ABSTRACT OF THE DISSERTATION
Virtually all video networking applications suffer from transmission errors. The problem is more severe for traditional predictive video coding, where the prediction loop propagates errors through its recursive structure. Yet this traditional scheme is left intact in most error-resilient video encoding frameworks.
In our work, we revise the conventional paradigm and ask how to optimize overall performance when the prediction mechanisms of both the encoder and the decoder can be redesigned. We propose a novel LP-DPCM framework, which we name Bimodal Leaky Prediction (BLP), in which the decoder operates in one of two modes depending on whether the channel delivers or loses the packet, and we adapt this framework to motion-compensated, prediction-based video coding.
We model the incoming signal as a first-order Markov process and use high-resolution quantization analysis. Unlike previous leaky-prediction-based schemes, the mismatch between the encoder and the decoder is inherently taken into account by explicitly considering the two different reconstruction modes.
The superior performance of Bimodal Leaky Prediction is demonstrated against standard predictive coding and traditional leaky-prediction-based coding in environments with moderate to heavy packet loss. The strength of the scheme stems mainly from the fact that it explicitly takes into account the two modes of operation at the decoder (packet loss or no packet loss) and jointly optimizes the corresponding reconstruction filters and the prediction filter at the encoder.
BIMODAL LEAKY PREDICTION
In these situations it is highly advantageous to transmit only the residual. With efficient prediction, the residual variance is smaller than the source variance, which improves rate-distortion (R-D) performance by allowing a quantizer with smaller decision regions and hence a higher SNR. As shown in Figure 2.1, a DPCM encoder-decoder pair is characterized by a predictor and a quantizer. The predictor uses a linear combination of past sample reconstructions.
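As a minimal sketch of the predictor-plus-quantizer structure described above, the following Python code runs a first-order DPCM loop over a lossless channel. The function names, the uniform mid-tread quantizer, and the single-tap predictor with coefficient `leak` are illustrative assumptions, not the configuration of Figure 2.1.

```python
import numpy as np

def uniform_quantize(x, step):
    """Mid-tread uniform quantizer with step size `step` (assumed here)."""
    return step * np.round(x / step)

def dpcm_encode_decode(signal, leak, step):
    """First-order DPCM over a lossless channel.

    The prediction of sample n is leak * (reconstruction of sample n-1),
    so encoder and decoder stay in sync as long as nothing is lost.
    Returns the quantized residuals and the decoder reconstruction.
    """
    residuals = np.empty_like(signal, dtype=float)
    recon = np.empty_like(signal, dtype=float)
    prev = 0.0                      # both sides start from the same state
    for n, x in enumerate(signal):
        pred = leak * prev          # predictor uses the past reconstruction
        residuals[n] = uniform_quantize(x - pred, step)
        recon[n] = pred + residuals[n]
        prev = recon[n]
    return residuals, recon
```

Because the encoder predicts from reconstructions rather than from original samples, the per-sample reconstruction error in this sketch equals the quantization error of the residual and therefore stays within half a quantizer step.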
While the DPCM structure is simple to comprehend, its mathematical analysis is complicated by the highly nonlinear nature of quantization. To circumvent this difficulty, it is often assumed that the quantization noise is white and additive. Although this assumption does not hold in general, it linearizes the whole system and makes it amenable to optimization.
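The additive white noise assumption can be checked numerically in a simple case. The sketch below quantizes an i.i.d. unit-variance Gaussian input with a fine uniform quantizer and compares the noise variance with the standard high-resolution value of step²/12. The input model, the step size, and the function name are assumptions chosen for illustration only.

```python
import numpy as np

def quantization_noise_stats(step, num_samples=200_000, seed=0):
    """Empirical variance and lag-1 correlation of uniform-quantizer noise.

    The input is i.i.d. unit-variance Gaussian (an assumption made for
    this illustration, not the signal model of the dissertation).
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(num_samples)
    noise = step * np.round(x / step) - x      # quantizer output minus input
    variance = np.var(noise)
    lag1 = np.corrcoef(noise[:-1], noise[1:])[0, 1]
    return variance, lag1

step = 0.1
variance, lag1 = quantization_noise_stats(step)
# Compare the measured variance with step**2 / 12; lag1 should be near zero.
print(variance, step**2 / 12, lag1)
```

For a step that is small relative to the input spread, the measured variance should lie close to step²/12 and the lag-1 correlation close to zero, which is the regime in which the linearized model is a reasonable approximation.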
BLP IN VIDEO CODING
Standard MPEG video data is structured into six hierarchical layers, as shown in Figure 3.1. The highest is the sequence layer, which constitutes a complete video. The next is the group of pictures (GOP) layer. In our setup, the GOP structure starts with an intra-frame (I-frame), which serves as an anchor for subsequent prediction, followed by inter-frames (P-frames) (Figure 3.2). Next is the picture layer, a single frame consisting of multiple slices, which form the slice layer.
For a channel with a packet-loss rate of 30% (i.e., q = 0.7), a unit-variance first-order Gauss-Markov signal of length 500,000 with correlation coefficient ρ = 0.99 is fed into the proposed scheme, which uses a high-resolution quantizer with quantization noise variance σ_Q^2 = 0.0013 (Figures 4.1 and 4.2).
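The test input described above can be generated along the following lines. This is a hedged sketch: the function names, the independent (Bernoulli) loss model, the reading of q as the delivery probability, and the uniform quantizer used to derive a step size are assumptions, since the text specifies only the signal statistics, the loss rate, and the noise variance.

```python
import numpy as np

def gauss_markov(length, rho, rng):
    """Unit-variance first-order Gauss-Markov (AR(1)) sequence."""
    # Innovation variance 1 - rho**2 keeps the process variance at 1.
    innovations = np.sqrt(1.0 - rho**2) * rng.standard_normal(length)
    x = np.empty(length)
    x[0] = rng.standard_normal()        # start in the stationary distribution
    for n in range(1, length):
        x[n] = rho * x[n - 1] + innovations[n]
    return x

def delivery_pattern(length, q, rng):
    """Boolean mask, True where a packet arrives (independent losses assumed)."""
    return rng.random(length) < q

rng = np.random.default_rng(0)
x = gauss_markov(500_000, rho=0.99, rng=rng)
received = delivery_pattern(x.size, q=0.7, rng=rng)

# Step of a uniform quantizer whose high-resolution noise variance
# step**2 / 12 equals 0.0013 (the uniform quantizer is an assumption).
step = np.sqrt(12 * 0.0013)
```

The arrays `x` and `received` can then drive any predictive coder under test, with `received[n]` selecting between the decoder's no-loss and loss reconstruction modes for sample n.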
The optimal parameters are estimated to be [a = 0.0270; b = 0.9400; c = 0.0855; d = 0.9900]. According to (2.22), these give a theoretical distortion, measured as MSE, of 0.0097 and, consequently, a coding rate of 6 (rounded from 5.9621) via the entropy-constrained high-resolution quantization approximation.
We proposed a novel LP-DPCM scheme called Bimodal Leaky Prediction for error-resilient source-channel coding and adapted it to the motion-compensated video coding framework. Our results demonstrate that BLP significantly outperforms the traditional leaky prediction scheme as well as the conventional prediction scheme. The strength of our technique stems mainly from the fact that we have two reconstruction modes, corresponding to loss and no loss, and that all four parameters involved are optimized simultaneously.
The optimal leak factor a found by our BLP optimization tends to be smaller than both the correlation coefficient ρ and the value used by standard video coding techniques (a = 1), thereby leaving most of the inter-signal correlation (or, in the case of video coding, inter-frame correlation) intact. The preserved correlation then becomes highly beneficial whenever a packet is dropped by the channel.
Tests with one-dimensional signals help show how accurate the theoretical foundation behind BLP is. Owing to the high-resolution quantization modeling and the related approximations, at high source coding rates the BLP output follows its first-order Gauss-Markov input signal almost perfectly and, furthermore, the expected distortion predicts the realized distortion with very high accuracy.
Source: University of California
Author: Ufuk Celikcan