Journal of Multimedia Information System

Korea Multimedia Society

J Multimed Inf Syst 4(4):179-188

eISSN: 2383-7632

DOI: https://doi.org/10.9717/JMIS.2017.4.4.179

Section A

Complexity Analysis of Internet Video Coding (IVC) Decoding

Sang-hyo Park¹, Tianyu Dong², Euee S. Jang²,*

¹Communications & Media R&D Division, Korea Electronics Technology Institute, Gyeonggi-do, South Korea, E-mail: sanghyo.park@keti.re.kr

²Department of Computer Science, Hanyang University, Seoul, South Korea, E-mail: dongtianyu@hanyang.ac.kr; esjang@hanyang.ac.kr

*Corresponding Author: Euee S. Jang, Department of Computer Science, Hanyang University, Seoul, South Korea, Tel: 82-2-2220-1086, E-mail: esjang@hanyang.ac.kr

© Copyright 2017 Korea Multimedia Society. This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Received: Nov 24, 2017 ; Revised: Dec 06, 2017 ; Accepted: Dec 09, 2017

Published Online: Dec 31, 2017

Abstract

The Internet Video Coding (IVC) standard is due to be published by the Moving Picture Experts Group (MPEG) for various Internet applications such as Internet broadcast streaming. IVC fundamentally aims at three things: 1) making IVC patents available under a free-of-charge license, 2) reaching compression performance comparable to AVC/H.264 constrained Baseline Profile (cBP), and 3) keeping computational complexity feasible for real-time encoding and decoding. MPEG experts have worked diligently on the intellectual property rights issues for IVC, and they reported that IVC has already achieved the second goal (compression performance) and even shows performance comparable to AVC/H.264 High Profile (HP). For the complexity issue, however, there has not been a thorough analysis of the IVC decoder. In this paper, we analyze the IVC decoder in terms of time complexity by evaluating running time. According to the experimental results, IVC is 3.6 times and 3.1 times more complex than AVC/H.264 cBP under constraint set (CS) 1 and CS2, respectively. Compared to AVC/H.264 HP, IVC is 2.8 times and 2.9 times slower in decoding time under CS1 and CS2, respectively. The most critical tool to be improved for a lightweight IVC decoder is the motion compensation process, which contains a resolution-adaptive interpolation filtering process.

Keywords: Decoder complexity; MPEG IVC; AVC/H.264; Complexity analysis; Type-1 codec

I. INTRODUCTION

As the demand for video quality has increased over many years, new video codec standards have been developed with improved compression performance. The most famous standards, such as MPEG-2, MPEG-4 AVC/H.264 and HEVC, were standardized by the ISO/IEC Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG) under a royalty-bearing intellectual property rights policy. Recently, the need for a royalty-free video codec has emerged from several developments: most patents on core technologies adopted in widely used standards (e.g., MPEG-2) have either expired or will expire soon; many codec-related companies would like to support a royalty-free codec; and studies of royalty-free codecs have recently been receiving attention in the literature [1-5].

Recognizing the diversified needs of the Internet, MPEG issued the Call for Proposals (CfP) for Internet Video Coding (IVC) technologies [6]. The IVC standard should achieve three goals: 1) the baseline profile will be granted in a free of charge license (i.e., Type-1 license) by patent owners according to the ISO/IEC Common Patent Policy [7], 2) the baseline profile will achieve better compression performance than MPEG-2 and be comparable to AVC Baseline Profile, and 3) the complexity will be feasible for real-time encoding/decoding on generally available personal computers and mobile devices [6]. Responding to the CfP, several influential industry leaders and universities proposed three codecs [8]: Web Video Coding (WVC), Video Coding for Browsers (VCB) and IVC.

So far, MPEG experts have investigated and verified that the coding efficiency of IVC is better than that of AVC Constrained Baseline Profile and is even comparable to AVC High Profile in terms of subjective quality [8], with additional results showing that IVC is mostly better than WVC and VCB. With these diligent efforts, the preliminary text of the final draft international standard (FDIS) of IVC was published in January 2017. There remains, however, one important issue that needs resolving: the decoding complexity problem for various real-time Internet applications (e.g., video chat and Internet streaming).

In this paper, we briefly review IVC technologies, focusing on their differences from conventional video codecs, and analyze the decoder modules in terms of computational complexity. We measure time complexity (i.e., running time) to investigate the complexity of IVC coding tools, just as other conventional video codecs have been investigated in the literature [9-11]. In addition, we evaluate how much IVC-specific tools affect decoding time by turning those tools on and off. Through the experimental results, we present how complex an IVC decoder is and which module is critical in IVC decoding. We also compare the decoding complexity of IVC to that of the AVC/H.264 Baseline and High Profiles, which are commonly used in many streaming applications. The results of this analysis should help the reader derive time complexity estimates for a variety of processors and optimize the decoding speed of IVC.

The remaining sections are organized as follows. Section 2 briefly describes the features of IVC decoding that differ from other existing video codecs. Section 3 identifies the time complexity of the IVC decoder through experimental analysis. Section 4 compares IVC with the widely used AVC/H.264 codec [12] in terms of time complexity. Finally, Section 5 concludes this paper.

II. DISTINCTIVE FEATURES OF INTERNET VIDEO CODING (IVC) DECODER

IVC is a codec with a coding structure similar to that of the MPEG-2 standard [13], but enhanced with several effective techniques. Some contributors declared their own patented techniques to be Type-1 for the IVC codec. Other contributors mined prior art techniques, that is, techniques whose patents have expired or that were disclosed in the literature without patents. In the big picture, as in conventional video codecs, an IVC bitstream is decoded through an inverse transform/quantization process (if needed according to the syntax of each macroblock), motion compensation and entropy decoding. Except for the group of pictures (GOP) layer, IVC has the same hierarchical layers as MPEG-2, consisting of sequences, pictures, slices and macroblocks. However, since many new and old (i.e., prior art) techniques have been adopted in IVC to further improve compression performance, the aspects that differ are described in the remaining subsections with emphasis on the basic principles. The details of how to parse the bitstream, how to interpret the coded symbols and how to reconstruct video are fully specified by the FDIS of IVC [14], as is usual for popular video coding standards.

2.1. Frame Type and MB Type

There are three picture types in IVC: intra-coded frame (I-frame), predictive-coded frame (P-frame) and bidirectional predictive-coded frame (B-frame). Since IVC adopted a prior art technique that uses multiple reference frames, blocks of a P-frame can refer not only to the most recent P-frame, but also to earlier P-frames or I-frames.

IVC has been developed with two coding configuration targets: random access and low-delay scenarios. For the random access scenario, IVC uses an IBBP coding structure. As described in Fig. 1, a B-frame can only refer to the nearest I- or P-frame, while a P-frame can refer to multiple previous P-frames or I-frames, provided they are stored in the reference frame buffer.

Fig. 1. An example of IBBP coding structure in IVC. Arrows represent where each frame can refer for inter-prediction.

For low-delay applications, an IVC bitstream can be encoded with an IPPP coding structure. In this case, a P-frame must refer to one of the previous frames whose distances from the current frame are pre-defined. IVC has an additional P-frame sub-type, called the non-reference P-frame, which is not stored in the reference frame buffer and is never used as a reference; it was introduced for coding efficiency.

A macroblock (MB) is partitioned by a quadtree-based approach within 16 x 16 pixels as shown in Fig. 2. Among those five partitions, an inter-predicted block can be coded with four partitions {16 x 16, 8 x 16, 16 x 8 and 8 x 8}, whereas an intra-predicted block can be coded with three square partitions {16 x 16, 8 x 8 and 4 x 4}. Depending on the MB partition type, each block can be predicted with various modes and transformed/quantized with different kernel sizes separately, as described in the following subsections.

Fig. 2. MB partition types and the block size of each type in IVC.

2.2. Inter-Prediction (Prediction Modes and Motion Accuracy)

A block in a partitioned MB can be encoded with several inter-prediction modes depending on the frame type. In a P-frame, three prediction modes are possible for inter-prediction: forward prediction, skip and multiple-hypotheses prediction modes. Multiple-hypotheses prediction mode builds a synthetic prediction block by combining two reference blocks in previous frames (the detailed process was presented in [15]). Thus, this last mode needs an additional motion compensation process, which can increase decoder complexity.

In a B-frame, the basic concepts of the forward prediction and skip modes are shared with the P-frame. In addition, backward and bidirectional prediction (also called symmetrical) modes are allowed in a B-frame. Backward mode predicts the current block from future reference frames. Bidirectional mode, on the other hand, refers to both past and future reference frames and makes two blocks, one which is suitable enough to predict the current block.

IVC also increased motion accuracy by adopting an interpolation filtering technique that generates half- and quarter-pel samples. The interpolation filter in IVC differs from those of other recent video codecs in that its filter tap size varies with video resolution. For the luma component the tap size is 4, 6 or 10, whereas a 4-tap filter is used for chroma. The larger the filter tap size, the heavier the burden on decoder complexity.
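
To give a rough feel for how the tap size drives interpolation cost, the following minimal sketch counts multiply-accumulate (MAC) operations for one block. It assumes the filter is applied separably (a horizontal pass followed by a vertical pass) at a position that is fractional in both directions. The function name `separable_macs` and the separable implementation are illustrative assumptions and are not taken from the ITM source; only the tap sizes 4, 6 and 10 come from the text above.

```python
def separable_macs(width, height, taps):
    """Estimate MACs for interpolating one width x height block.

    Assumption: a separable taps-tap filter, horizontal pass first,
    then a vertical pass over the horizontally filtered rows.
    """
    rows_needed = height + taps - 1          # extra rows feed the vertical pass
    horizontal = rows_needed * width * taps  # filter every row that is needed
    vertical = height * width * taps         # filter every output sample vertically
    return horizontal + vertical


if __name__ == "__main__":
    baseline = separable_macs(16, 16, 4)
    for taps in (4, 6, 10):
        total = separable_macs(16, 16, taps)
        print(f"{taps:2d}-tap: {total} MACs ({total / baseline:.2f}x the 4-tap cost)")
```

Under these assumptions, a 10-tap filter needs roughly 2.9 times as many MACs as a 4-tap filter for a 16 x 16 block, which illustrates why the tap size matters for decoder complexity.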

2.3. Intra-Prediction

The concept of intra-prediction (predicting pixels from information in the same frame) was already present in MPEG-2 [13]; in IVC, however, an intra-predicted block can also exist in P- or B-frames, as is common in recent video codecs. Instead of having only a DC mode as in MPEG-2, intra-prediction in IVC has several additional modes depending on MB partition and color component. For the luma component, there are one DC mode and four directional modes (i.e., horizontal, vertical, down left (↙) and down right (↘)), subject to the availability of upper and/or left neighbor samples. These five modes are supported in 16 x 16, 8 x 8 and 4 x 4 MB partitions.

On the other hand, there are four modes in total (i.e., DC, horizontal, vertical and plane) for chroma components, supported in the 8 x 8 MB partition only. Among the four modes, the DC, horizontal and vertical modes for chroma intra-prediction operate in the same way as those for luma, but the last mode is different. The plane mode takes neighbor samples from both directions (upper and left samples) and applies summation, shift and clipping operations to them. The plane mode might place a burden on decoder complexity because of those operations, since the other directional modes can directly assign neighbor pixels to the target pixels without them.
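
The difference between the copy-style modes and a mode that needs arithmetic can be illustrated with a small sketch of the DC, horizontal and vertical modes. The function name `predict_block`, the rounding rule in the DC mode and the assumption that both neighbor rows are available are illustrative choices; the normative derivation, including the down-left, down-right and plane modes, is given in the FDIS [14] and is not reproduced here.

```python
def predict_block(mode, top, left):
    """Build an n x n intra-predicted block from neighbor samples.

    top  -- the n reconstructed samples directly above the block
    left -- the n reconstructed samples directly to the left of the block
    Assumption: both neighbor sets are available and have length n.
    """
    n = len(top)
    if len(left) != n:
        raise ValueError("top and left must have the same length")
    if mode == "vertical":
        # Every row is a copy of the samples above the block.
        return [list(top) for _ in range(n)]
    if mode == "horizontal":
        # Every row is filled with the sample to its left.
        return [[left[row]] * n for row in range(n)]
    if mode == "dc":
        # Rounded mean of all neighbors (the rounding rule is an assumption).
        dc = (sum(top) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError(f"unsupported mode: {mode}")


if __name__ == "__main__":
    top, left = [10, 20, 30, 40], [1, 2, 3, 4]
    for mode in ("vertical", "horizontal", "dc"):
        print(mode, predict_block(mode, top, left))
```

The vertical and horizontal modes only copy neighbor samples, while the DC mode already needs a sum and a division; the plane mode adds further summation, shift and clipping steps on top of that.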

2.4. Transform and Quantization

An integer discrete cosine transform is used with a quadtree-based variable kernel size [16]; the supported sizes for IVC are 16 x 16, 8 x 8 and 4 x 4. Unless a block is encoded in skip mode, an inverse transform should be performed on the premise that the block has been quantized. In the current IVC test model (ITM), a butterfly structure supporting a 1-D 8-point forward transform is used, and the irrational numbers in this structure are approximated by rational numbers.

The order of the transform and quantization processes at the decoder is as follows. The input values should be scanned in a zigzag order and the scanned values should be transformed inversely. Afterwards, the inverse-transformed values are to be dequantized according to a given quantization parameter (QP) value. The dequantization table and the associated shift table are described in the FDIS of IVC [14].
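
The zigzag scanning step can be sketched as follows. This is the conventional zigzag pattern known from JPEG and MPEG-style codecs, used here as an assumption; the exact scan tables of IVC are defined in the FDIS [14]. The helper names `zigzag_order` and `inverse_scan` are hypothetical.

```python
def zigzag_order(n):
    """Return the (row, col) positions of an n x n block in zigzag order."""
    def key(position):
        row, col = position
        diagonal = row + col
        # Odd anti-diagonals are walked from top-right to bottom-left
        # (row ascending); even ones in the opposite direction (col ascending).
        return (diagonal, row if diagonal % 2 else col)

    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def inverse_scan(coefficients, n):
    """Place a 1-D list of n*n parsed values back into an n x n block."""
    block = [[0] * n for _ in range(n)]
    for value, (row, col) in zip(coefficients, zigzag_order(n)):
        block[row][col] = value
    return block


if __name__ == "__main__":
    for row in inverse_scan(list(range(16)), 4):
        print(row)
```

Printing the block for the input 0..15 shows at which position each parsed value lands, which makes the scan pattern easy to check by eye.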

2.5. Arithmetic Entropy Coding

For entropy coding, IVC uses logarithmic-domain arithmetic coding, which takes the following steps: 1) initialization of the context model, 2) binarization if the syntax element is non-binary, and 3) binary arithmetic decoding of the bin string (including context model selection if necessary). The arithmetic entropy coder in IVC is a logarithmic binary arithmetic coder (LBAC), which avoids multiplication operations and look-up tables. By using LBAC, the decoder can avoid the redundant memory costs and path delays incurred in context adaptive binary arithmetic coding [17].

2.6. Loop Filter

Within the decoding loop of IVC, a filter that conditionally filters boundaries between blocks can be applied, except at image boundaries and slice boundaries (the basic concept can be found in an expired patent [18]). This loop filter, called the deblocking filter, comes in three types (weak, normal and strong loop filtering) according to conditions that judge how much compensation is needed for subjective visual quality. In brief, weak loop filtering filters only two pixels per horizontal or vertical boundary line, normal loop filtering filters four pixels and strong filtering filters six pixels. The stronger the filtering, the more decoding time is needed. The detailed information on how to filter pixels is described in [8] and the associated parameters such as threshold values are presented in [19].

III. COMPLEXITY ANALYSIS OF INTERNET VIDEO CODING DECODER

In this section, the complexity of the IVC decoder is analyzed using a profiling tool. To analyze the complexity, IVC bitstream files are generated by the IVC test model (ITM) 14.0 and then decoded by the IVC decoder. The test material (i.e., video sequences) and the test environment used to decode the bitstreams are presented in the following subsection. In addition, specific coding conditions are described, including the parameters used in the encoding process to generate the IVC bitstream files. To analyze the complexity of the IVC decoder, a well-known profiling tool, the Intel VTune performance analyzer [20], is used in this paper. Finally, the results of the analysis are described according to the classification of major coding tools so that the tools critical to complexity can be identified.

3.1. Analysis Setup

We chose four test sequences from the recommended video sequences specified in the IVC exploration experiment document [21]. The detailed information on each video sequence is shown in Table 1, including the number of frames to be encoded. All the sequences were tested under both constraint set 1 (CS1) and constraint set 2 (CS2) conditions. CS1 and CS2 are similar to the random access and low-delay coding structures, respectively, which are commonly used configurations in recent video codecs. To evaluate the time complexity, the following development environment was employed: quad-core CPUs running at 2.40 GHz, 8 GB random-access memory (RAM) and a 64-bit Windows operating system (OS). Decoding of each bitstream file was carried out in a single thread and no parallelization techniques were used during decoding.

Table 1. Information on test sequences.

Sequence name	Resolution	Total frame number	FPS
Kimono	1920x1080	240	24
ParkScene	1920x1080	240	24
BasketballDrill	832x480	500	50
PartyScene	832x480	500	50

Describing the encoding conditions specifically is important, as the characteristics of bitstream files, including decoding complexity, can vary depending on encoding conditions such as the quantization parameter. Table 2 shows the general encoding parameters for CS1 and Table 3 shows sequence-specific encoding parameters for CS1. Similarly, Table 4 shows the general encoding parameters for CS2 and Table 5 shows sequence-specific encoding parameters for CS2. In general, the ITM encoder description [19] describes the encoding conditions and parameters, but a few parameters, such as QP, are set differently in this paper. Those parameters are set to fit the given target range of bitstream size, which was agreed by MPEG experts for conducting visual assessment of Type-1 codecs [22].

Table 2. General encoding conditions and parameters for CS1

Coding parameter	Used value	Description
QP Remaining Frame	QPI + 2	QP for P-frames (0-63)
QPB Picture	QPI + 5	QP for B-frames (0-63)
FME	1	Fast Motion Estimation
Number Reference Frames	5	Number of previous frames used for inter motion search
P SubType	0	Non-reference P-frame coding
RDO_Q	1	Rate distortion (RD) optimization on quantization
Multiple HP	1	Low cost multiple-hypothesis motion compensation
ABT Enable	1	16x16 transform and intra-prediction
IF TYPE	1	Adaptive tap
Loop Filter Disable	0	Disable loop filter in frame header

Table 3. Sequence-specific parameters for CS1.

Sequence	Intra Period	QP First Frame	Number B Frames
Kimono	8 (24)*	24	2
ParkScene	6 (24)*	27	3
BasketballDrill	13 (52)*	32	3
PartyScene	13 (52)*	35	3

* The value in parentheses is the frame number of the 2nd I-frame.

Table 4. General encoding conditions and parameters for CS2.

Coding parameter	Used value	Description
QP Remaining Frame	QPI + 2	QP for P-frames (0-63)
QPB Picture	QPI + 5	QP for B-frames (0-63)
FME	1	Fast Motion Estimation
Number Reference Frames	5	Number of previous frames used for inter motion search
P SubType	1	Non-reference P-frame coding
P SubType Non Adaptive	0	Non-adaptive non-reference P-frame coding
P Sub QP Delta0	7	QP for 3rd layer P-frames added from QP for P-frames
P Sub QP Delta1	3	QP for 2nd layer P-frames added from QP for P-frames
RDO_Q	1	RD optimization on quantization
Multiple HP	1	Low cost multiple-hypothesis motion compensation
ABT Enable	1	16x16 transform and intra-prediction
IF TYPE	1	Adaptive tap
Loop Filter Disable	0	Disable loop filter in frame header

Table 5. Sequence-specific parameters for CS2

Sequence	QP First Frame (QP for I-frame)
Kimono	23
ParkScene	23
BasketballDrill	29
PartyScene	33

3.2. Analysis Results and Observations

We measured the time consumed by each function using the performance analyzer. We classified the functions used in decoding into six categories: motion compensation (MC), entropy decoding (ED), intra-prediction (IP), loop filtering (LF), inverse transform/quantization (T/Q) and others. This classification is common in research on the decoding complexity of recent video codecs, including the analyses of HEVC [10] and AVC/H.264 [9]. Under the CS1 condition, Fig. 3 shows the decoding time ratio of the six categories of functions for the two video resolutions, 1920 x 1080 and 832 x 480. The most time-consuming category is MC. This trend has also been seen in other recent video codecs [9-10] because of the highly complex interpolation filtering. The reason that MC consumes most of the decoding time can be explained as follows. Firstly, all the motion vectors in a B-frame are derived by multiplying by the frame distances. Thus, a motion vector can point to a half-pel or quarter-pel position depending not only on the motion vector difference (MVD) value, but also on the distance. Secondly, the multiple-hypotheses prediction mode in a P-frame must use interpolation filtering, as this mode takes the average value of two motion vectors. Finally, because the filter tap size adapts to the video resolution, the percentage of MC can increase at low video resolution. If the frame height is less than 720, the filter size for interpolation filtering is 10-tap, which is larger than the filter tap size of HEVC. Note that IVC uses the same filter tap size for the half-pel and quarter-pel interpolation processes.

Fig. 3. Decoding time ratio of six categories in CS1 condition: (a) is for 1920x1080 sequences and (b) is for 832x480 sequences.

Under the CS2 condition, Fig. 4 shows the decoding time ratio of the six categories of functions for the two video resolutions, 1920 x 1080 and 832 x 480. MC is still the most time-consuming category under CS2. One difference from the CS1 results is that the percentage of MC is lower under CS2. One possible explanation is that there are no B-frames in CS2. The other noticeable difference from CS1 is that the percentage of LF is slightly higher. Since CS2 has a special P-frame type, the non-reference P-frame, which is usually encoded with a much higher QP value than other frames, we conjecture that those frames tend to need deblocking filtering to compensate for coding errors.

Fig. 4. Decoding time ratio of six categories in CS2 condition: (a) is for 1920x1080 sequences and (b) is for 832x480 sequences.

As shown in Fig. 3 and Fig. 4, MC was the most time-consuming category. Thus, we believe that interpolation filtering should be considered the main target for reducing IVC decoding complexity. In addition, inverse transform/quantization and loop filtering should be targeted as well. Possible solutions include decoder-side optimization, such as a software-based coefficient-aware fast algorithm [23], or hardware-based acceleration. A different approach is an encoder-side filtering restriction: an encoder may choose not to use deblocking filtering and/or interpolation filtering, although this may compromise bitrate and frame quality. A similar approach exists in the restriction method for the adaptive loop filter (ALF) that was tried in HEVC [24].

IV. COMPARISON RESULTS OF TIME COMPLEXITY

To compare the time complexity of IVC decoding with that of other codecs, we selected AVC/H.264 as an anchor, since it has been widely used in many video applications such as video streaming. Specifically, two profiles of AVC/H.264 were chosen: High Profile (HP), which shows the best coding efficiency among all the AVC profiles, and constrained Baseline Profile (cBP), whose compression performance is one of the goals of the IVC project. Since decoding complexity can vary depending on encoding configurations, we generated the bitstreams of the codecs according to encoding conditions agreed by MPEG experts [22]. Table 6 describes the test materials, including frames per second (FPS), and the target rate points. To satisfy the rate points as closely as possible, the video codecs used in this paper were allowed to increase the QP by one after a certain frame number during encoding. With this allowance, all bitstream files satisfied the rate points in Table 6 within the range of -3% to +3%. To evaluate the decoding time, the following development environment was employed: quad-core CPUs running at 4.00 GHz, more than 16 GB random-access memory (RAM) and a 64-bit Windows operating system (OS).
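
The rate-point criterion can be expressed as a small check. The sketch below is only an illustration of the -3% to +3% rule: the function names and the example file size are hypothetical, and the frame count and frame rate in the example are those listed for Kimono in Table 1, which may differ from the number of frames actually encoded for this comparison.

```python
def average_bitrate(file_size_bytes, frame_count, fps):
    """Average bitrate in bit/s of a bitstream covering frame_count frames."""
    duration_s = frame_count / fps
    return file_size_bytes * 8 / duration_s


def within_rate_tolerance(actual_bps, target_bps, tolerance=0.03):
    """True if the actual bitrate deviates from the target by at most 3%."""
    return abs(actual_bps - target_bps) <= tolerance * target_bps


if __name__ == "__main__":
    target = 1.6e6  # rate point R1 for the 24 fps 1080p sequences, in bit/s
    # Hypothetical file size, chosen only to exercise the check.
    actual = average_bitrate(file_size_bytes=2_000_000, frame_count=240, fps=24)
    print(actual, within_rate_tolerance(actual, target))
```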

Table 6. Test sequences and target rate points

Sequence (frame per second)	Rate 1 (R1)	Rate 2 (R2)	Rate 3 (R3)	Rate 4 (R4)
1920x1080p
S03: Kimono 24fps	1.6 Mbit/s	2.5 Mbit/s	4.0 Mbit/s	6.0 Mbit/s
S04: ParkScene 24fps	1.6 Mbit/s	2.5 Mbit/s	4.0 Mbit/s	6.0 Mbit/s
S05: Cactus 50fps	3.0 Mbit/s	4.5 Mbit/s	7.0 Mbit/s	10.0 Mbit/s
S06: BasketballDrive 50fps	3.0 Mbit/s	4.5 Mbit/s	7.0 Mbit/s	10.0 Mbit/s
832x480p (WVGA)
S08: BasketballDrill 50fps	512 kbit/s	768 kbit/s	1.2 Mbit/s	2.0 Mbit/s
S09: BQMall 60fps	512 kbit/s	768 kbit/s	1.2 Mbit/s	2.0 Mbit/s
S10: PartyScene 50fps	512 kbit/s	768 kbit/s	1.2 Mbit/s	2.0 Mbit/s
S11: RaceHorses 30fps	512 kbit/s	768 kbit/s	1.2 Mbit/s	2.0 Mbit/s

The decoding time results of IVC and AVC/H.264 (cBP and HP) are shown in Table 7 and Table 8. Here, the sequence names are abbreviated as SXX (XX is a two-digit number denoting each sequence) and the target rate points are abbreviated as RX (X is a one-digit number denoting each rate point). The notation DT_m stands for the decoding time of codec m. On average, IVC showed slower decoding times than AVC cBP and AVC HP. Under CS1, IVC was 3.65 times slower than AVC cBP and 2.84 times slower than AVC HP, on average. Note that as IVC uses the smallest tap size for interpolation in high resolution, the percentage difference in decoding times of IVC and AVC cBP could be up to 194% under CS1. However, for the bitstreams of 832 x 480 resolution, IVC had a much longer decoding time than the AVC codec, with a time difference of almost 400%. Under CS2, IVC showed results similar to those under CS1. On average, IVC was 3.13 times slower than AVC cBP and 2.9 times slower than AVC HP, as shown in Table 8. Table 8 also shows that the difference in decoding time between IVC and the others could be small at high resolution, whereas it could be large at low resolution. In conclusion, IVC showed higher decoding complexity than the two profiles of AVC/H.264, and this complexity should be reduced significantly for real-time video decoding applications. In particular, in the low-resolution case, the interpolation filtering process should be the focus in order to substantially decrease the overall decoding complexity.

Table 7. Decoding time results of IVC, AVC cBP and AVC HP under CS1

Resolution	Bitstream Name	DT_IVC (s)	DT_IVC / DT_cBP	DT_IVC / DT_HP
1920 x 1080
	S03R1	28.288	221%	184%
	S03R2	28.9997	216%	180%
	S03R3	29.3073	206%	172%
	S03R4	30.1726	199%	167%
	S04R1	29.6153	244%	195%
	S04R2	30.2735	235%	188%
	S04R3	30.6121	225%	181%
	S04R4	31.5733	217%	178%
	S05R1	43.8724	180%	158%
	S05R2	45.3512	178%	159%
	S05R3	47.1867	178%	158%
	S05R4	50.1795	178%	159%
	S06R1	55.3512	209%	173%
	S06R2	57.1086	206%	172%
	S06R3	58.6198	201%	169%
	S06R4	60.5893	198%	168%
832 x 480
	S08R1	8.3055	455%	342%
	S08R2	8.8931	438%	342%
	S08R3	9.7029	420%	342%
	S08R4	10.634	389%	322%
	S09R1	14.7076	593%	422%
	S09R2	15.2224	566%	403%
	S09R3	15.7313	536%	394%
	S09R4	16.3092	495%	374%
	S10R1	15.7121	792%	556%
	S10R2	16.0396	700%	511%
	S10R3	16.3169	620%	475%
	S10R4	16.6684	546%	433%
	S11R1	8.3648	534%	395%
	S11R2	8.4657	480%	366%
	S11R3	8.5305	433%	336%
	S11R4	8.703	381%	301%
Average		26.731	365%	284%
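
The averages in the last row of Table 7 can be reproduced from the individual rows. The short sketch below takes the DT_IVC / DT_cBP column as printed, computes the arithmetic mean per resolution and overall, and also recovers the anchor decoding time implied by one row. The variable and function names are illustrative, and because the table ratios are rounded to whole percentages the recovered time is approximate.

```python
# DT_IVC / DT_cBP column of Table 7 (CS1), in percent, in table order.
RATIO_CBP_1080P = [221, 216, 206, 199, 244, 235, 225, 217,
                   180, 178, 178, 178, 209, 206, 201, 198]
RATIO_CBP_480P = [455, 438, 420, 389, 593, 566, 536, 495,
                  792, 700, 620, 546, 534, 480, 433, 381]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def anchor_time(dt_ivc_seconds, ratio_percent):
    """Anchor decoding time implied by DT_IVC and the ratio DT_IVC / DT_m."""
    return dt_ivc_seconds / (ratio_percent / 100)


if __name__ == "__main__":
    print(f"1920 x 1080 mean: {mean(RATIO_CBP_1080P):.1f}%")
    print(f"832 x 480 mean:   {mean(RATIO_CBP_480P):.1f}%")
    print(f"overall mean:     {mean(RATIO_CBP_1080P + RATIO_CBP_480P):.1f}%")
    print(f"implied DT_cBP for S03R1: {anchor_time(28.288, 221):.2f} s")
```

The overall mean rounds to the 365% reported in the Average row, while the per-resolution means (about 206% for 1920 x 1080 and about 524% for 832 x 480) make the resolution dependence discussed in the text explicit.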

Table 8. Decoding time results of IVC, AVC cBP and AVC HP under CS2

Resolution	Bitstream Name	DT_IVC (s)	DT_IVC / DT_cBP	DT_IVC / DT_HP
1920 x 1080
	S03R1	23.6373	182%	178%
	S03R2	27.8701	203%	197%
	S03R3	30.5472	209%	201%
	S03R4	32.248	206%	196%
	S04R1	23.3655	184%	179%
	S04R2	26.0193	194%	187%
	S04R3	28.6855	201%	190%
	S04R4	30.8116	200%	189%
	S05R1	38.665	157%	152%
	S05R2	41.239	161%	154%
	S05R3	44.4967	164%	157%
	S05R4	46.8943	163%	155%
	S06R1	42.2896	159%	152%
	S06R2	47.3484	170%	162%
	S06R3	52.5857	178%	170%
	S06R4	56.7286	183%	175%
832 x 480
	S08R1	7.5591	406%	386%
	S08R2	8.26	401%	374%
	S08R3	9.389	392%	371%
	S08R4	10.9266	380%	343%
	S09R1	10.8476	439%	404%
	S09R2	12.0024	445%	407%
	S09R3	13.4765	452%	408%
	S09R4	15.0105	440%	392%
	S10R1	10.0222	476%	434%
	S10R2	11.91	495%	449%
	S10R3	14.2475	515%	466%
	S10R4	15.9867	502%	442%
	S11R1	7.2072	445%	418%
	S11R2	8.1485	450%	415%
	S11R3	9.0931	443%	402%
	S11R4	9.9837	417%	372%
Average		23.984	313%	290%

V. CONCLUSION

In this paper, we briefly presented IVC coding techniques, focusing on computational time complexity. The relative importance of each coding tool in terms of decoding time was investigated using profiling software, and the experimental results showed that the motion compensation and transform/quantization processes consume most of the decoding time. In particular, one IVC-specific coding tool (i.e., resolution-adaptive interpolation filtering) has a critical impact at low video resolution because of its large filter tap size, which should be addressed to reduce the decoding complexity. In addition to the complexity analysis of IVC itself, we provided comparison results of the decoding time with those of AVC/H.264 cBP and HP, two widely used profiles. As demonstrated in the experiments, the decoding complexity of IVC should be significantly reduced for real-time video decoding applications. Possible solutions for reducing the decoding complexity of IVC bitstreams could be 1) parallelization techniques for the motion compensation and transform/quantization processes, 2) decoding complexity-aware RD optimization during encoding and 3) hardware-based decoder acceleration.

REFERENCES

[1] J. Chen, F. Xu, Y. He, J. Villasenor, Y. Han, Y. Xu, Y. Rong, C. Reader and J. Wen, “Efficient Video Coding Using Legacy Algorithmic Approaches,” IEEE Trans. Multimedia, vol. 14, no. 1, pp. 111-120, Feb. 2012.

[2] K. Choi and E. Jang, “Royalty-free video coding standards in MPEG,” IEEE Signal Process. Mag., vol. 31, no. 1, pp. 145-148, 155, Jan. 2014.

[3] J. Bankoski, P. Wilkins and Y. Xu, “Technical overview of VP8, an open source video codec for the web,” in Proc. IEEE ICME, 2011, pp. 1-6.

[4] J. Bankoski, R. S. Bultje, A. Grange, Q. Gu, J. Han, J. Koleszar, D. Mukherjee, P. Wilkins and Y. Xu, “Towards a next generation open-source video codec,” Proc. SPIE 8666, Feb. 2013.

[5] I. K. Kim, S. Lee, Y. Piao and J. Chen, “Coding efficiency comparison of new video coding standards: HEVC vs VP9 vs AVS2 video,” in Proc. IEEE ICMEW, Jul. 2014.

[6] Call for Proposals (CfP) for Internet Video Coding Technologies, ISO/IEC JTC1/SC29/WG11, document N12204, Jul. 2011.

[7] Guidelines for Implementation of the Common Patent Policy for ITU-T/ITU-R/ISO/IEC, ISO/IEC/ITU, 2nd rev., 2015 [Online]. Available: http://www.iso.org/iso/standards_development/patents

[8] R. Wang, T. Huang, S. Park, J.-G. Kim, E. S. Jang, C. Reader and W. Gao, “The MPEG Internet Video Coding Standard,” IEEE Signal Process. Mag., vol. 33, no. 5, Sep. 2016.

[9] M. Horowitz, A. Joch and F. Kossentini, “H.264/AVC baseline profile decoder complexity analysis,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 704-716, Jul. 2003.

[10] F. Bossen, B. Bross, K. Sühring and D. Flynn, “HEVC complexity and implementation analysis,” IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1685-1696, Dec. 2012.

[11] J. Vanne, M. Viitanen, T. D. Hämäläinen and A. Hallapuro, “Comparative rate-distortion-complexity analysis of HEVC and AVC video codecs,” IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1885-1898, Dec. 2012.

[12] T. Wiegand, G. J. Sullivan, G. Bjøntegaard and A. Luthra, “Overview of the H.264/AVC video coding standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 560-576, Jul. 2003.

[13] Generic Coding of Moving Pictures and Associated Audio Information - Part 2: Video, ISO/IEC 13818-2 (MPEG-2) | ITU-T Recommendation H.262, 1994.

[14] Preliminary Text of ISO/IEC FDIS 14496-33 Internet Video Coding, ISO/IEC JTC1/SC29/WG11 N16679, Jan. 2017.

[15] L. Chen, S. Dong, R. Wang, Z. Wang, S. Ma, W. Wang and W. Gao, “Low-cost Multi-hypothesis Motion Compensation for Video Coding,” Proc. SPIE 9029, 2014.

[16] C.-T. Chen, “Adaptive transform coding via quadtree-based variable block size DCT,” in Proc. IEEE ICASSP’89, vol. 3, Glasgow, Scotland, U.K., May 1989.

[17] Q. Yu, W. Yu, P. Yang, J. Zheng, X. Zheng and Y. He, “An Efficient Adaptive Binary Arithmetic Coder Based on Logarithmic Domain,” IEEE Trans. Image Process., vol. 24, no. 11, pp. 4225-4239, Nov. 2015.

[18] M. Honjo, “Method of correcting an image signal decoded in block units,” U.S. Patent 5337088 A, Aug. 9, 1994.

[19] S. Park, R. Wang and J.-G. Kim, “Internet Video Coding Test Model (ITM) v 14.1,” ISO/IEC JTC1/SC29/WG11, document N16035, 2016.

[20] Intel VTune™ Amplifier XE 2011 Release Notes for Windows OS [Online]. Available: https://software.intel.com/sites/default/files/m/d/4/1/d/8/release_notes_amplifier_xe_windows.pdf

[21] R. Wang, J.-G. Kim and S. Park, “Description of IVC Exploration Experiments,” ISO/IEC JTC1/SC29/WG11, document N15761, Oct. 2015.

[22] “Conditions for visual comparison of VCB, IVC and WVC codecs,” ISO/IEC JTC1/SC29/WG11 MPEG, N13943, Nov. 2013.

[23] S. Park, K. Choi and E. S. Jang, “Zero coefficient-aware fast butterfly-based inverse discrete cosine transform algorithm,” IET Image Processing, vol. 10, no. 2, pp. 89-100, Jul. 2016.

[24] S. Park, K. Choi, G. Noh and E. S. Jang, “Frame-based Adaptive Selection of ALF for Fast HEVC Decoding,” Proc. IEEE BMSB, pp. 1-4, 2012.