CNN based bi-directional prediction for complexity reduction of high efficiency video coding
De Silva, Tharuki Rangana
Real-time video streaming has become the largest portion of internet traffic in recent years, so improving the efficiency of video coding remains an important research problem. Beyond the level of compression, two other factors determine the efficiency of a real-time video codec: decoded video quality and the computational complexity of the encoding and decoding processes. Modern video codecs rely on inter-frame prediction for efficient coding, yet it is one of their most computationally expensive and time-consuming operations. Convolutional neural networks (CNNs) have been used in recent research for inter-frame prediction tasks, but the CNN architectures in previous work were used without regard to model complexity or computational efficiency. The objective of this thesis is to develop a low-complexity, CNN-based bi-prediction algorithm for video coding. The contribution of this thesis consists of three parts. In the first part, a simple floating-point CNN architecture is developed that performs the bi-prediction operation in video coding with an accuracy comparable to that of the motion estimation and compensation used in modern video encoders. In the second part, this architecture is quantized to derive an integer-arithmetic-only CNN that further reduces the computational complexity. The encoding time with the integer CNN is shown to be considerably lower than with the floating-point CNN, and the experimental results show that the conversion causes only a minor loss of prediction accuracy. In the final part, it is shown experimentally that the proposed integer-arithmetic CNN bi-prediction algorithm has a lower computational cost and better video quality than conventional motion-estimation-based bi-prediction. Further, it is shown that CNN-based bi-prediction can contribute to a rate-distortion performance improvement in video coding.
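As a rough illustration of the quantization step described in the abstract, the following sketch compares a floating-point bi-prediction layer with an integer-arithmetic version of the same layer. It is not the architecture from the thesis: the single 3x3 convolution per reference block, the symmetric 8-bit per-tensor quantization scheme, and all names (`quantize`, `conv2d_valid`, `ref0`, `ref1`, `w0`, `w1`) are assumptions chosen only to show the idea.

```python
import numpy as np

def quantize(x, num_bits=8):
    """Symmetric per-tensor quantization (assumed scheme): x ~ scale * q."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)
    return q, scale

def conv2d_valid(x, w):
    """Plain 'valid' 2-D cross-correlation; works on float or integer arrays."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=np.result_type(x, w))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

# Hypothetical data: two reference blocks and one 3x3 kernel per reference.
rng = np.random.default_rng(0)
refs = rng.random((2, 16, 16))              # ref0, ref1 with samples in [0, 1)
weights = rng.normal(0.0, 0.1, (2, 3, 3))   # w0, w1 (stand-ins for trained weights)

# Floating-point bi-prediction: combine the filtered reference blocks.
float_pred = conv2d_valid(refs[0], weights[0]) + conv2d_valid(refs[1], weights[1])

# Integer version: one shared scale for inputs and one for weights, so the
# two integer accumulators can be added directly.
q_refs, s_x = quantize(refs)
q_w, s_w = quantize(weights)
int_acc = conv2d_valid(q_refs[0], q_w[0]) + conv2d_valid(q_refs[1], q_w[1])

# Rescale once at the end. A float multiply is used here for clarity; an
# integer-only pipeline would use a fixed-point multiplier and a shift.
int_pred = int_acc * (s_x * s_w)

max_abs_err = float(np.max(np.abs(float_pred - int_pred)))
```

Under these assumptions all multiply-accumulate work runs on integers, and `max_abs_err` measures how far the quantized prediction drifts from the floating-point one. The figures reported in the thesis come from its own architecture and test sequences, not from this sketch.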