HEVC (High Efficiency Video Coding) is the most recent video coding standard, finalized in January 2013. It was developed by the Joint Collaborative Team on Video Coding (JCT-VC), formed by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). Many proposals to improve coding efficiency were submitted, and various coding tools based on these proposals were adopted. HEVC RExt, the second version of HEVC, was published in July 2014. The main purpose of HEVC RExt standardization was to enable the standard to handle high-quality video data. For this reason, it supports a wider variety of color sampling formats and bit depths than HEVC version 1. HEVC RExt is therefore suitable for high-quality, large-volume data such as medical images or studio images.
In this paper, an efficient studio video editing method using HEVC RExt is proposed. To adopt HEVC RExt for studio video editing, the following requirements should be considered:
1) Low-delay coding and decoding (1 frame or less).
2) Low-loss compression providing visually near-perfect reproduction.
3) Multi-generation compression with negligible concatenation errors and additional loss.
4) A symmetrical encode/decode algorithm that is relatively easy to implement in both hardware and software.
5) Support for the full range of images (image sampling, resolution, frame rate, bit depth, and color gamut).
6) Compression ratios from 2:1 to 20:1.
7) Use of the codec to decrease file sizes, improving storage efficiency and download times during production.
Clauses 2) to 7) describe the basic functions a video codec needs in order to handle high-quality data, and HEVC RExt was designed to satisfy them during standardization. Clause 1) is the main feature that characterizes a studio codec: if the coding delay increases during studio editing, the efficiency of the editing process may decrease. Most video codecs support a low-delay coding structure, and one typical structure is the “intra only” structure. The “intra only” structure supports frame-level editing without decoding any other frame, because no reference frames are needed. As is well known, there is a trade-off between coding efficiency and delay. In most cases, coding efficiency increases as the number of reference frames for the current frame increases, but the delay for editing the current frame also increases because the reference frames must be decoded first. This is illustrated in Fig. 1.
Fig. 1 shows an example of the general low-delay (LD) coding structure used in the HEVC common test conditions. Although picture #6 has only two reference pictures, all pictures before picture #6 must be decoded before it can be edited, because each reference picture in turn depends on earlier pictures. This violates clause 1) and causes considerable inconvenience during studio editing.
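The chain effect described above can be made concrete by computing the transitive closure of the reference relation. The sketch below is illustrative only: the LD-like structure is a hypothetical simplification in which every inter picture references its two immediate predecessors (it is not the exact HM low-delay GOP), and `decode_set` is a helper name introduced here.

```python
def decode_set(target, refs):
    """Return every picture that must be decoded to reconstruct `target`.

    `refs` maps a picture number to the pictures it directly references;
    pictures missing from `refs` reference nothing (intra pictures).
    """
    needed, stack = set(), [target]
    while stack:
        pic = stack.pop()
        if pic not in needed:
            needed.add(pic)
            stack.extend(refs.get(pic, []))  # follow references transitively
    return sorted(needed)


# Hypothetical LD-like structure: picture 0 is intra, every later picture
# references its two immediate predecessors (clipped at picture 0).
ld_refs = {n: [p for p in (n - 1, n - 2) if p >= 0] for n in range(1, 9)}
print(decode_set(6, ld_refs))  # [0, 1, 2, 3, 4, 5, 6]

# "Intra only": no picture references another.
print(decode_set(6, {}))  # [6]
```

Even though picture 6 has only two direct references, the closure reaches back to the intra picture, which is why editing latency grows with the reference chain.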
One of the main motivations of this paper is to design a new coding structure that minimizes unnecessary referencing in order to reduce editing delay and re-encoding. Even though the reference structure is simplified, the proposed method retains a coding gain from temporal referencing. In this paper, we propose a new low-delay coding structure that includes referencing without violating any of the clauses above. The proposed coding structure satisfies clause 1) and also shows better coding performance than the “intra only” coding structure.
The proposed coding structure is described in detail in Section II-A.
Many coding tools were adopted during standardization, which also increased computational complexity. Another main motivation of the proposed method is to reduce the computational complexity of the decoding and re-encoding processes during editing. Many efficient complexity-reduction methods have been reported [4-8]. Kim et al. proposed a fast intra prediction method based on the SATD (sum of absolute transformed differences) cost: the available prediction modes are determined from the SATD costs of some predefined prediction modes. Morta et al. also proposed an intra mode decision algorithm, which selects candidate prediction modes using the direction information of neighboring blocks. An inter prediction complexity reduction method was proposed by Kim et al., which decides whether to perform bi-prediction according to the SAD (sum of absolute differences) cost of the largest block. A SIMD-based fast coding method was proposed by Jeon et al., in which the main coding functions such as transform, intra prediction, and motion estimation/compensation are implemented with SIMD instructions. Most recent encoder optimization methods are based on fast algorithms; very few fast coding approaches are based on parallel coding using multiple cores and multiple servers.
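To show what the SATD cost used by such fast intra decisions measures, here is a minimal sketch of a 4x4 SATD: the prediction residual is passed through a 2-D Hadamard transform and the absolute coefficients are summed. This is only the cost function, not Kim et al.'s mode-decision algorithm, and it is left unnormalized; real encoders typically apply an additional scaling factor.

```python
def hadamard4(rows):
    """Apply the 4-point Hadamard transform to each row of a 4x4 block."""
    out = []
    for a, b, c, d in rows:
        s0, s1, d0, d1 = a + b, c + d, a - b, c - d
        out.append([s0 + s1, d0 + d1, s0 - s1, d0 - d1])
    return out


def transpose(m):
    return [list(col) for col in zip(*m)]


def satd4x4(orig, pred):
    """Unnormalized SATD of a 4x4 block: sum |H * D * H^T| with D = orig - pred."""
    diff = [[o - p for o, p in zip(ro, rp)] for ro, rp in zip(orig, pred)]
    # Row transform, transpose, row transform again (H is symmetric).
    coeffs = transpose(hadamard4(transpose(hadamard4(diff))))
    return sum(abs(v) for row in coeffs for v in row)


def sad4x4(orig, pred):
    return sum(abs(o - p) for ro, rp in zip(orig, pred) for o, p in zip(ro, rp))


zeros = [[0] * 4 for _ in range(4)]
flat = [[1] * 4 for _ in range(4)]        # smooth residual
impulse = [[1, 0, 0, 0]] + [[0] * 4 for _ in range(3)]  # isolated error
print(sad4x4(flat, zeros), satd4x4(flat, zeros))        # 16 16
print(sad4x4(impulse, zeros), satd4x4(impulse, zeros))  # 1 16
```

A smooth residual compacts into a single DC coefficient, while an isolated error spreads across all coefficients, so SATD tracks the post-transform coding cost more closely than SAD does.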
In this paper, we also apply several fast coding methods to reduce computational complexity. First, we remove coding tools that have high complexity but little coding gain. Second, we adopt various parallel processing methods, such as frame/tile-level parallel processing and parallel processing using SIMD. Last, we apply distributed coding based on multiple servers.
The remainder of this paper is organized as follows. Section II explains the proposed ultra-low delay coding structure. The proposed fast coding approaches are described in Section III. Finally, experimental results and conclusions are given in Sections IV and V.
II. PROPOSED VIDEO CODING STRUCTURE FOR EFFICIENT EDITING
In this section, we explain the new coding structure, which we call “ultra-low delay (ULD)” in this paper. As mentioned in the previous section, the main problem in studio video editing is the editing delay caused by complex referencing structures, which also leads to frequent decoding and re-encoding. In the proposed structure, the IDR picture is the only reference picture, so only one frame of delay is needed for editing within one GOP (group of pictures). Compared with the “intra only” structure, the proposed structure also obtains a coding gain by keeping temporal referencing.
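A minimal sketch of the ULD reference relation, assuming (for illustration) that picture n is an IDR picture whenever n is a multiple of the intra period. The helper names `uld_refs` and `pictures_to_decode` are hypothetical.

```python
def uld_refs(num_pictures, intra_period):
    """Reference lists under the ULD structure (illustrative sketch).

    Picture n is intra when n % intra_period == 0; every other picture
    references only the intra picture that starts its intra period.
    """
    refs = {}
    for n in range(num_pictures):
        if n % intra_period == 0:
            refs[n] = []                      # intra (IDR) picture
        else:
            refs[n] = [n - n % intra_period]  # the period's intra picture
    return refs


def pictures_to_decode(n, refs):
    # Under ULD the dependency depth is at most one, so the target plus its
    # direct references is already the complete decoding set.
    return sorted({n, *refs[n]})


refs = uld_refs(12, intra_period=6)
print(refs[8])                      # [6]
print(pictures_to_decode(8, refs))  # [6, 8]
```

Whatever picture the editor seeks to, at most one extra picture (the intra picture of its period) has to be decoded.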
The basic purpose of the proposed coding structure is both to satisfy clause 1) and to retain the advantage of a referencing structure.
The proposed coding structure basically follows the LD coding structure of the HEVC common test conditions. The main differences from the LD structure are as follows.
While the LD structure consists of four hierarchy layers, the ULD structure supports just two.
Only the previously coded intra picture is used as the reference picture for the following pictures within the same intra period.
In the LD structure, the quantization parameter (QP) changes according to the hierarchy layer. The base QP is the QP of the intra picture (QPI); as the hierarchy layer increases, the QP increases from QPI+1 to QPI+3. Because the ULD structure supports only two hierarchy layers, the QP of non-intra pictures is set to QPI+1. The non-intra pictures in ULD are non-reference pictures, and each refers only to the previous intra picture in the same group of pictures (GOP). The advantage of the ULD structure comes from this referencing structure: at most one other picture needs to be decoded to access the picture the user wants to edit. The coding efficiency of ULD is also better than that of “intra only”, which uses no referencing.
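The QP rules above can be summarized as follows. This sketch assumes that LD hierarchy layer k (k = 1, 2, 3) simply receives QPI+k, which is one reading of the "QPI+1 to QPI+3" description rather than a copy of the HM configuration files.

```python
def ld_qp(qp_i, layer):
    """LD: layer 0 is the intra picture; layers 1-3 get QP_I+1 .. QP_I+3.

    Assumption: the QP offset equals the hierarchy layer index.
    """
    if not 0 <= layer <= 3:
        raise ValueError("LD uses four hierarchy layers (0-3)")
    return qp_i + layer


def uld_qp(qp_i, is_intra):
    """ULD: only two layers, so every non-intra picture uses QP_I + 1."""
    return qp_i if is_intra else qp_i + 1


print([ld_qp(32, k) for k in range(4)])     # [32, 33, 34, 35]
print(uld_qp(32, True), uld_qp(32, False))  # 32 33
```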
Generally, the editing result is produced after all editing procedures are finished. If the coding structure of the result is based on referencing (as in LD), a re-encoding procedure is required, as shown in Fig. 3.
In the case of Fig. 3, pictures are extracted between picture #2 and picture #8. In the ULD case, three pictures are decoded and only two pictures are re-encoded. In the LD case, all pictures before picture #8 must be decoded and re-encoded. Reducing the number of pictures that need decoding and re-encoding is another main advantage of ULD.
III. PROPOSED FAST CODING METHODS
In this section, fast coding methods are proposed to reduce the computational complexity of the re-encoding process during editing. The main idea of the proposed method is parallel processing using multiple cores and distributed coding on a multi-server platform. By adopting these fast coding techniques, 4K-UHD video can be coded in real time, which greatly eases studio-quality video editing.
As mentioned in the previous section, a low-delay coding structure is essential for editing studio video. Random access is also an essential requirement, so in this paper an IDR frame is inserted every 0.1 seconds. The complexity analysis of HEVC RExt is conducted under the JCT-VC low-delay common test conditions. The complexity of the main tools is given in Tables 1 and 2.
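Inserting an IDR frame every 0.1 seconds fixes the intra period in pictures once the frame rate is known. The text does not state the frame rate, so the values below are examples only; at 60 fps, for instance, each intra period holds 6 pictures (1 intra and 5 non-intra).

```python
def intra_period_frames(frame_rate, idr_interval_s=0.1):
    """Pictures per intra period when an IDR is inserted every
    `idr_interval_s` seconds, rounded half-up to a whole picture."""
    period = int(frame_rate * idr_interval_s + 0.5)
    return max(1, period)  # at least the IDR picture itself


for fps in (50, 59.94, 60, 120):
    print(fps, intra_period_frames(fps))  # 50 5, 59.94 6, 60 6, 120 12
```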
Inter prediction, which includes motion estimation/compensation and interpolation, has the highest complexity. Quantization is second, and most of its complexity is caused by rate-distortion optimized quantization.
In this paper, tool-level optimization is adopted. Tool-level optimization is the procedure of deciding which tools are used, based on the trade-off between complexity and coding gain. We measured the complexity and coding gain of each tool and then decided whether to turn each tool on or off; the decisions are given in Table 3.
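The on/off decision can be expressed as a simple rule over per-tool measurements. The sketch below is hypothetical: the criterion (BD-rate loss per percent of time saved), the threshold `max_loss_per_saving`, and the tool names and numbers are all illustrative assumptions, not the values behind Table 3.

```python
from dataclasses import dataclass


@dataclass
class ToolStats:
    name: str
    bd_rate_loss_off: float  # BD-rate increase (%) when the tool is disabled
    time_saving_off: float   # encoding-time saving (%) when the tool is disabled


def decide_tools(stats, max_loss_per_saving=0.05):
    """Return {tool name: True (keep on) / False (turn off)}.

    A tool is turned off when disabling it costs at most
    `max_loss_per_saving` % BD-rate per % of encoding time saved.
    """
    decisions = {}
    for s in stats:
        if s.time_saving_off <= 0:
            decisions[s.name] = True  # disabling saves nothing: keep it
        else:
            cost = s.bd_rate_loss_off / s.time_saving_off
            decisions[s.name] = cost > max_loss_per_saving
    return decisions


# Hypothetical measurements for three unnamed tools.
stats = [ToolStats("ToolA", 0.1, 10.0),  # cheap to drop -> off
         ToolStats("ToolB", 2.0, 5.0),   # large gain for its cost -> on
         ToolStats("ToolC", 0.5, 0.0)]   # no time saving -> on
print(decide_tools(stats))  # {'ToolA': False, 'ToolB': True, 'ToolC': True}
```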
In this paper, a parallel encoding method is proposed to accelerate the re-encoding process during editing. The proposed parallel processing consists of two levels. The first is tile-level parallel processing: the picture is divided into several rectangular regions called “tiles”, and the tiles are coded at the same time. The second is picture-level parallel processing: the non-intra pictures that refer to the same intra picture can be encoded in parallel, because in ULD there is no referencing relation between non-intra pictures. The details are shown in Fig. 4. After the intra picture is coded, the following non-intra pictures that refer to it are encoded in parallel, so all non-intra pictures in the same GOP are coded at the same time. In addition, the 16 tiles of each picture are coded in parallel. These parallel coding concepts can be applied to decoding as well as to re-encoding.
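The two-stage schedule above (intra picture first, then every non-intra picture and every tile at once) can be sketched with a thread pool. `encode_tile` is a hypothetical placeholder that only returns a label; a real implementation would call the encoder's tile-coding routine.

```python
from concurrent.futures import ThreadPoolExecutor

TILES_PER_PICTURE = 16  # tiles per picture, as stated in the text


def encode_tile(picture, tile, ref):
    """Hypothetical stand-in for a real tile encoder; returns a label."""
    return f"pic{picture}-tile{tile}-ref{ref}"


def encode_intra_period(intra, non_intra, max_workers=8):
    """Encode one ULD intra period with picture- and tile-level parallelism."""
    tiles = range(TILES_PER_PICTURE)
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Stage 1: tiles of the intra picture are independent of each other.
        intra_jobs = [pool.submit(encode_tile, intra, t, None) for t in tiles]
        results[intra] = [job.result() for job in intra_jobs]
        # Stage 2: non-intra pictures reference only the intra picture, so
        # every (picture, tile) pair can be encoded concurrently.
        jobs = {p: [pool.submit(encode_tile, p, t, intra) for t in tiles]
                for p in non_intra}
        for p, tile_jobs in jobs.items():
            results[p] = [job.result() for job in tile_jobs]
    return results


out = encode_intra_period(0, [1, 2, 3, 4, 5])
print(out[3][0])  # pic3-tile0-ref0
```

The only synchronization point is the completion of the intra picture; after that, the number of independent jobs is (pictures per period - 1) x 16. Because of CPython's global interpreter lock, threads would not speed up CPU-bound pure-Python work, so a real encoder would use native threads or processes.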
The overall performance of the proposed coding structure and parallel processing is given in Tables 4 and 5. The test is conducted on the HEVC test model (HM) version 15.0+RExt-8.1, which serves as the anchor for comparison. To evaluate its coding performance, the proposed coding structure is compared with the “intra only” configuration, which is one of the JCT-VC common test conditions.
The test sequences have a bit depth of 10 bits and are divided into two categories by color sampling format. For 4:2:2 sequences, the average BD-rate reduction and time saving were 24.54% and 59.68%, respectively. For 4:4:4 sequences, the average reductions in BD-rate and time were 18.33% and 56.14%. Even though the BD-rate increased for some sequences, these sequences still show a meaningful time reduction.
Single instruction, multiple data (SIMD) is a class of parallel computers in Flynn's taxonomy. It describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously. Such machines exploit data-level parallelism, but not concurrency. Put simply, a SIMD unit performs multiple calculations with one instruction. A typical SIMD register is 128 bits wide, so when the data type is ‘short’ (16 bits), one SIMD instruction handles eight operations at a time. The main advantage of SIMD is that it reduces computation time effectively without any loss of quality, because it produces the same results as the scalar code. The implementation history is shown in Table 6.
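The lane arithmetic above can be illustrated with a pure-Python emulation. This is a conceptual model, not real SIMD: each loop iteration stands for one vector instruction over `lanes` samples, and the result is identical to the scalar computation, which is why SIMD causes no quality loss.

```python
REGISTER_BITS = 128  # register width assumed in the text


def lanes(element_bits, register_bits=REGISTER_BITS):
    """Number of elements one SIMD instruction processes at once."""
    return register_bits // element_bits


def sad_simd_emulated(a, b, element_bits=16):
    """SAD computed one lane-block at a time; returns (sad, vector_ops).

    Each iteration models a single vector instruction over `lanes`
    samples; the arithmetic itself is ordinary Python.
    """
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    width = lanes(element_bits)
    sad, ops = 0, 0
    for start in range(0, len(a), width):
        chunk_a, chunk_b = a[start:start + width], b[start:start + width]
        sad += sum(abs(x - y) for x, y in zip(chunk_a, chunk_b))
        ops += 1
    return sad, ops


row_a, row_b = list(range(64)), [0] * 64
print(lanes(16))                        # 8
print(sad_simd_emulated(row_a, row_b))  # (2016, 8)
```

With 16-bit samples, a 64-sample row needs 8 vector operations instead of 64 scalar ones, and the SAD value matches the scalar result.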
For YUV 4:2:2 10-bit sequences, the overall encoding time is reduced by 31.36% on average without any loss of quality.
For YUV 4:4:4 10-bit sequences, the overall encoding time is reduced by 38.81% on average without any loss of quality. The trend of complexity reduction with QP variation is similar for both test sets.
In this paper, we propose multi-server distributed coding using the Message Passing Interface (MPI). Sixteen servers are used for distributed encoding. The input sequence is divided into random access units, and each server encodes 10 random access units in parallel. The overall parallel structure of the proposed method is shown in Fig. 5.
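With 16 servers each encoding 10 random access units at once, 16 x 10 = 160 units are in flight per round; if each unit is one 0.1-second intra period (an assumption), one round covers 16 seconds of video. The sketch below is a hypothetical static schedule that assigns consecutive units to servers. It does not show the MPI messaging itself; in an MPI program each process would take the unit list of its own rank.

```python
def assign_ra_units(num_units, servers=16, concurrent_per_server=10):
    """Hypothetical static schedule of random access (RA) units.

    Returns a list of rounds; each round maps server id -> the RA unit
    indices that server encodes concurrently in that round.
    """
    per_round = servers * concurrent_per_server
    rounds = []
    for base in range(0, num_units, per_round):
        batch = range(base, min(base + per_round, num_units))
        rnd = {}
        for i, unit in enumerate(batch):
            # Consecutive units go to the same server, 10 at a time.
            rnd.setdefault(i // concurrent_per_server, []).append(unit)
        rounds.append(rnd)
    return rounds


plan = assign_ra_units(200)
print(len(plan))      # 2
print(plan[0][15])    # [150, 151, ..., 159]
print(sorted(plan[1]))  # [0, 1, 2, 3]
```

Because ULD random access units share no references, any unit can be encoded on any server without exchanging reconstructed pictures.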
IV. EXPERIMENTAL RESULTS
The proposed coding structure is implemented and tested on the HEVC test model (HM) version 15.0+RExt-8.1, which serves as the anchor for comparison.
To evaluate its coding performance, the proposed coding structure is compared with the “intra only” configuration, which is one of the JCT-VC common test conditions. The quantization parameter (QP) range defined for the main tier is used in the experiment. The Bjøntegaard delta bit rate (BD-rate) and time saving are used as the performance measures. The proposed algorithm fully achieves real-time encoding for Full-HD sequences, so UHD (Ultra High Definition) sequences are used in the final evaluation. The overall results are shown in Table 8.
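For reference, the BD-rate metric can be computed as follows: fit log-rate as a cubic function of PSNR for each codec, integrate both fits over the overlapping PSNR range, and convert the average log-rate difference back to a percentage. This sketch assumes the common four-point case, where the cubic fit is an exact interpolation; it integrates the interpolant with Simpson's rule, which is exact for cubics. It is a generic implementation of the metric, not the tool used to produce Table 8.

```python
import math


def _lagrange(xs, ys, x):
    """Evaluate the polynomial interpolating (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def _integrate_cubic(xs, ys, lo, hi):
    # Simpson's rule is exact for polynomials of degree <= 3.
    mid = (lo + hi) / 2
    return (hi - lo) / 6 * (_lagrange(xs, ys, lo)
                            + 4 * _lagrange(xs, ys, mid)
                            + _lagrange(xs, ys, hi))


def bd_rate(anchor_rates, anchor_psnr, test_rates, test_psnr):
    """BD-rate (%) of the test codec relative to the anchor (4 RD points each)."""
    if not (len(anchor_rates) == len(anchor_psnr) == len(test_rates)
            == len(test_psnr) == 4):
        raise ValueError("this sketch expects exactly four RD points per curve")
    log_a = [math.log(r) for r in anchor_rates]
    log_t = [math.log(r) for r in test_rates]
    lo = max(min(anchor_psnr), min(test_psnr))
    hi = min(max(anchor_psnr), max(test_psnr))
    if hi <= lo:
        raise ValueError("PSNR ranges do not overlap")
    avg_diff = (_integrate_cubic(test_psnr, log_t, lo, hi)
                - _integrate_cubic(anchor_psnr, log_a, lo, hi)) / (hi - lo)
    return (math.exp(avg_diff) - 1) * 100


psnr = [34.0, 36.5, 39.0, 41.2]
anchor = [1000.0, 2000.0, 4000.0, 8000.0]
test = [0.9 * r for r in anchor]  # same quality at 10% lower rate
print(round(bd_rate(anchor, psnr, test, psnr), 6))  # -10.0
```

A negative BD-rate, such as the -7.69% reported for the proposed method, means fewer bits are needed for the same PSNR than with the anchor.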
A typical real-time broadcast encoder covers data rates below 50 Mbps when encoding 4K-UHD video. The test sequences are coded at an average of 200 Mbps with a PSNR of approximately 42 dB to 50 dB, which is studio quality, considerably higher than broadcast data. Even though the data rates are high, the average encoding speed of the proposed method is 85 fps, which means it can encode UHD sequences faster than real time. The proposed method is approximately 9,500 times faster than the “intra only” structure. The average BD-rate of the proposed method is -7.69% relative to the “intra only” structure, meaning that it achieves better compression than “intra only” at similar video quality.
V. CONCLUSION
A new coding structure was proposed for efficient studio data editing. The proposed ULD structure minimizes the number of pictures that must be re-encoded. The coding time is also greatly reduced by picture/tile-level parallel processing and multi-server distributed coding. In terms of coding efficiency, the proposed method reduces the BD-rate by 7.69% on average while coding approximately 9,500 times faster than the “intra only” coding structure.