Volcengine Retains Championship in Video Track of International Deep Learning Image Compression Challenge

Author: Multimedia Laboratory

The results of the 6th Deep Learning Image Compression Challenge (CLIC) were recently announced. b-2, a joint entry from Volcengine Multimedia Laboratory and Peking University, took first place on both objective and subjective evaluation metrics in both the high-bitrate and low-bitrate video compression tracks. This is the second consecutive edition of the challenge in which Volcengine Multimedia Laboratory has won the video track.

CLIC is hosted by the Institute of Electrical and Electronics Engineers (IEEE) and has received widespread attention from both academia and industry since its inception. The challenge was not held in 2023; this year's event resumed in conjunction with the Data Compression Conference (DCC), a top-tier conference in the data compression field. Volcengine Multimedia Laboratory also had 8 papers accepted at this year's DCC.

As artificial intelligence technologies, led by deep learning, continue to make breakthroughs, academia and industry have come to recognize the potential of AI in image and video compression. Deep learning-based image and video compression is widely seen as a promising way to push past the limits of traditional compression technologies. Building on deep learning, the b-2 team from Volcengine and Peking University proposed an intelligent hybrid solution.

Deep Learning-Based Intelligent Hybrid Solution

The b-2 entry builds on the principles of both traditional compression and deep learning-based compression, and combines the strengths of the two technical routes into a single traditional-intelligent hybrid solution. The traditional coding module adds new techniques, such as asymmetric quad-tree partitioning, on top of an existing industry-standard coding framework. The intelligent coding module introduces deep learning-based loop filtering and related technologies.

Figure 1 Asymmetric quad-tree partitioning structure; (a) H1-type horizontal UQT, (b) H2-type horizontal UQT, (c) V1-type vertical UQT, (d) V2-type vertical UQT.

Coding unit partitioning is the foundation of hybrid video coding frameworks: it determines the basic shape and size of coding units. Flexible partitioning can represent the rich textures and motion characteristics of video more effectively and plays a critical role in improving coding performance. The team proposed an asymmetric quad-tree (UQT) partitioning structure to improve video coding efficiency. Compared with the existing quad-tree (QT), binary tree (BT), and ternary tree (TT) partitioning structures, the sub-coding units produced by a single UQT split can reach a deeper partitioning depth, which captures fine detail in video more effectively. In addition, the sub-block shapes produced by UQT cannot be obtained by combining QT, BT, and TT splits, so UQT partly compensates for the shortcomings of existing partitioning methods and enriches the set of shapes a partition can express.
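The article does not give the exact split ratios of the four UQT types. As a rough illustration only, the Python sketch below assumes that each UQT split divides one side of the parent block in eighths, 1:4:2:1 for H1/V1 and 1:2:4:1 for H2/V2. These ratios, the `UQT_RATIOS` table, and the `uqt_split` function are assumptions made for this example, not details confirmed by the team.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)

# ASSUMPTION: split ratios in eighths of the parent side.
# The article does not state them; they are illustrative only.
UQT_RATIOS = {
    "H1": (1, 4, 2, 1),
    "H2": (1, 2, 4, 1),
    "V1": (1, 4, 2, 1),
    "V2": (1, 2, 4, 1),
}


def uqt_split(x: int, y: int, w: int, h: int, mode: str) -> List[Rect]:
    """Split a w x h block at (x, y) into four sub-blocks (hypothetical helper)."""
    ratios = UQT_RATIOS[mode]
    horizontal = mode.startswith("H")
    side = h if horizontal else w
    if side % 8 != 0:
        raise ValueError("the split side must be a multiple of 8")
    unit = side // 8

    blocks: List[Rect] = []
    offset = 0
    for ratio in ratios:
        size = ratio * unit
        if horizontal:
            # Horizontal split lines: full-width sub-blocks stacked top to bottom.
            blocks.append((x, y + offset, w, size))
        else:
            # Vertical split lines: full-height sub-blocks placed left to right.
            blocks.append((x + offset, y, size, h))
        offset += size
    return blocks


if __name__ == "__main__":
    for rect in uqt_split(0, 0, 64, 64, "H1"):
        print(rect)
```

Under these assumed ratios, a single split of a 64x64 block already produces sub-blocks only 8 samples thick, a size that binary splitting alone would need three successive splits to reach.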

Figure 2 Schematic diagram of the loop filtering network structure, including the network's input, filtering, and output modules

Traditional video coding uses in-loop filters, such as the classic deblocking filter, sample adaptive offset, and adaptive loop filter, to remove coding distortion and narrow the gap between the original and reconstructed images. The b-2 team proposed an enhanced loop filter based on a residual convolutional network that combines loop filtering with deep learning and exploits prior information from traditional video coding in both the network structure and model training.

On the input side, in addition to reconstructed pixels, the network receives prediction information, partitioning information, boundary strength, and quantization parameters from the coding process. This richer prior knowledge helps the network perceive compression distortion. On the training side, because a frame in a hierarchical reference coding structure references previously reconstructed high-quality frames, the team trained the filters for different temporal layers iteratively, so that the training data closely matches what the filters see during actual encoding and filtering performance improves. At encoding time, each slice and each largest coding unit can adaptively select, from multiple filtering models, the network model with the best rate-distortion performance, and the selection is signaled to the decoder.
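The per-block model selection described above can be sketched as a rate-distortion comparison. The following Python example is a minimal sketch under stated assumptions: distortion is measured as sum of squared errors, the cost is D + lambda * R, and the signaling cost (one on/off flag plus a fixed-length model index) is hypothetical. The toy filters stand in for trained network models, and the names `select_filter`, `add_one`, and `flatten_to_mean` are invented for illustration; none of this is taken from the team's encoder.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Filter = Callable[[Sequence[float]], List[float]]


def sse(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared errors between two equally sized sample lists."""
    return float(sum((p - q) ** 2 for p, q in zip(a, b)))


def select_filter(
    original: Sequence[float],
    reconstructed: Sequence[float],
    filters: Dict[str, Filter],
    lam: float,
) -> Tuple[Optional[str], float]:
    """Return (best model name or None for 'filter off', its RD cost)."""
    # ASSUMPTION: 1 flag bit for on/off, plus a fixed-length model index.
    off_bits = 1
    model_bits = 1 + math.ceil(math.log2(max(1, len(filters))))

    best_name: Optional[str] = None
    best_cost = sse(original, reconstructed) + lam * off_bits
    for name, apply_filter in filters.items():
        cost = sse(original, apply_filter(reconstructed)) + lam * model_bits
        if cost < best_cost:
            best_name, best_cost = name, cost
    return best_name, best_cost


# Toy stand-ins for trained network models (hypothetical).
def add_one(block: Sequence[float]) -> List[float]:
    return [v + 1 for v in block]


def flatten_to_mean(block: Sequence[float]) -> List[float]:
    mean = sum(block) / len(block)
    return [mean] * len(block)


toy_filters: Dict[str, Filter] = {"model_a": add_one, "model_b": flatten_to_mean}

if __name__ == "__main__":
    original = [10, 10, 10, 10]
    reconstructed = [8, 12, 8, 12]
    print(select_filter(original, reconstructed, toy_filters, lam=1.0))
```

With a small lambda the model that removes the most distortion wins; with a large lambda the extra index bits can outweigh the gain and the block is left unfiltered, which is why the choice has to be signaled to the decoder rather than inferred.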

Figure 3 MOS-based leaderboard for the CLIC video compression track

In 2022, Volcengine Multimedia Laboratory took part in CLIC for the first time. Its entry, Neutron Star, won both the high-bitrate and low-bitrate video compression tracks, with a significant lead on both objective and subjective evaluation metrics.

This time, Volcengine and Peking University won the championship together, combining PKU's strengths in research and talent with Volcengine's strengths in technology and industry. The result marks an important piece of academic exploration into deep learning for video compression.

Volcengine Multimedia Laboratory is a research team under ByteDance, dedicated to exploring cutting-edge multimedia technologies and participating in international standardization work. Its innovative algorithms and hardware and software solutions are widely used in the multimedia services of products such as Douyin and Xigua Video, and it provides technical services to Volcengine's enterprise customers. Since its establishment, the laboratory has had multiple papers accepted at top international conferences and flagship journals, and has won several international technical competitions, industry innovation awards, and best paper awards.


This is a discussion topic split from the original post at https://juejin.cn/post/7351301965031161890