Introduction: Data compression technologies, such as audio, image and video coding, are important and successful applications of timefrequency analysis theory. Traditionally, in applications of timefrequency analysis, the whole time domain of a signal is divided into intervals at first. Within one interval, one computes the local frequency spectrum of the signal and then makes analysis or processing with respect to this local frequency spectrum. After the work in one time interval is done, it goes to the next interval. The whole analysis progress continues while time interval changes. By this approach, the signal processing task is conducted with respect to time intervals. Therefore, it may be called as "timeoriented" approach. This approach has been followed in both scientific research and signal processing engineering for more than one century.
However, this traditional, timeoriented appraoch to timefrequency analysis neglects one important fact that, in many applications, there exist statistical dependency between the local frequency spectra within adjacent time intervals. I proposed a new approach to timefrequency analysis, which offers a great platform for exploiting the dependency between local spectra. In the new approach, after computing the local frequency spectra in all time intervals, the transform coefficients at the same frequency bands are rearranged into one data group. One then makes signal analysis or processing with respect to these data groups. The whole analysis progress goes on while the frequency changes. The coefficients within one group embody the distribution of signal content at a certain frequency band. Therefore, they may indicate the statistical dependency between neighboring local frequency spectra. By this approach, the signal analysis work is done with respect to frequency bands. Therefore, it may be called as "frequencyoriented" approach. This new approach can remedy the drawback of traditional, timeoriented approach and hence may help to improve technical performance in practice. In accordance, I designed a new entropy coding mechanism for high quality video coding based on this new approach to timefrequency analysis. As illustrated in following figure, in timeoriented approach, one looks at the signal content from the perspective of time axis. In contrast, in frequencyoriented approach, one looks at the signal content from the perspective of frequency axis.
Video coding: Video coding technologies such as the excellent and popular international standards MPEG1, MPEG2, MPEG4, H.264 and HEVC are all typical applications of timefrequency analysis. Specifically, these technologies are fulfilled in the traditional, timeoriented approach to timefrequency analysis. Generally speaking, in existing video coding technologies, the prediction residual data of a macroblock are divided into 8*8 (MPEG) or 4*4 (H.264) blocks or transform units in variable size (HEVC) at first. Within each block, the data are transformed by discrete cosine transform or its integer approximate. Then, the transform coefficients (i.e., local frequency spectrum) are quantized in order to eliminate the statistical redundancy and visual redundancy. Afterward, coefficients are scanned in a linear, zigzag pattern. This scan pattern exploits the fact that most energy concentrate on low frequency. The scanning results, including values and locations of nonzero coefficients, are recorded using an entropy coding engine. Block by block, image data are transformed, quantized and entropy encoded, until the works are done to all macroblocks and frames. To the contrary, in decoding process, image data are entropy decoded, dequantized (or say rescaled) and inverse transformed block by block. The coding and decoding works are conducted with respect to spatial (time) blocks. Therefore, it fulfills the traditional, timeoriented appraoch to timefrequency analysis.
HEVC entropy coding:
To date, Huffman coding and arithmetic coding are two major entropy coding methods. In engineering practice, variablelength codes such as ExpGolomb code and unary codes are both exemplary utilizations of Huffman coding. These codes are established on the assumption that data elements in small magnitude should appear in higher probability than larger ones. Binary arithmetic coder is the most popular arithmetic coder in applications. H.264 and HEVC offer two alternatives on entropy coding mechanism: CAVLC (contextadapted variable length codes) and CABAC (contextadapted binary arithmetic coding). CABAC reportedly may gain 515% in coding efficiency than CAVLC.
In essence, the CABAC mechanism combines Huffman coding and arithmetic coding together. Its key elements include binarization, context model and arithmetic coding. The binarization process in CABAC factually fulfills Huffman coding. Syntax elements, including magnitudes (absolute values) of quantized transform coefficients, are “translated” into variablelength codes (VLC): unary plus ExpGolomb (UEG) codes. Then, the VLC representations are further encoded bitwisely by binary arithmetic coding engine according to a very delicate context model. In fact, the coding efficiency of CABAC may be significantly improved.
On the one hand, for a large amount of transform coefficients, the length of UEG codes for their magnitudes may far surpass the length of the magnitudes themselves. For example, to a coefficient of which ‘coeff_abs_levelminus1’ = 29, the H.264 and HEVC binarization engine first shall write 14 bits of ‘1’. Then, it shall write another 4 bits as suffix of ExpGolomb code for 15. In addition, it need to write two more bits for significance map, ‘significant_coeff_flag’ and ‘last_significant_coeff_flag’. All together, the binarization engine shall generate a code of 20 bits for binary arithmetic coding engine to work on. But the number 29 (or say 30) itself only takes 5 bits! In high quality video coding, large coefficients (15 and larger) emerge at a high percentage in the total number of coefficients. Therefore, the coding efficiency of CABAC may be severely degraded. Meanwhile, the coding engine may become very "lazy". Both encoder and decoder become unnecessarily very slow.
On the other hand, H.264 and HEVC takes the timeoriented approach to timefrequency analysis. The scan pattern and context models for both CAVLC and CABAC only exploit the statistical dependency within local frequency spectrum, i.e., the coefficients inside a transform block. It neglects the dependency between local frequency spectra at neighboring blocks. In fact, there exists statistical dependency between local frequency spectra within one macroblock, especially in intra_16 or inter_16*16 coding modes. We may exploit this dependency and improve coding efficiency.
New mothod: For high quality video coding, we designed new entropy coding method, which exploits both statistical dependency within and between local frequency spectra in one macroblock. It may significantly improve coding efficiency compared to CABAC. Meanwhile, it may greatly reduce the algorithmic complexity and run much faster than CABAC. More importantly, it may provide a very convenient platform for bit plane coding, which is naturally suitable for quality scalable video coding (SVC) with the same high coding efficiency but no additional complexity than single layer video coding.
The new method is based on bandwise rearrangement of DCT coefficients. The effect of bandwise rearrangement may be illustrated as follows. (You may use the images posted here in your own research or educational works, but please cite this website as source.)
In comparison, we may have the distribution of 5layer wavelet coefficients and 8*8 DCT coefficients respectively as follows:
Within each band, the coefficients hold statistical dependency, which embodies the dependency between adjacent local frequency spectra within a macroblock. This dependency is determined by the statistical dependency between image signals within neighboring transform blocks. It is further strengthened by the fact that signal energy concentrates more at low frequency bands than high frequency bands. This interspectrum dependency may be well exploited by a 2dimensional scan pattern.
Click here to download the document which specifies the new method in more details, including the mathematical rationale, application methodology and prospects.
