Efficient deep learning models for video abstraction

Date
2020-08
Authors
Rochan, Mrigank
Abstract
With the revolution in digital video technology, video data are ubiquitous and growing explosively. There is a compelling need for efficient automated techniques to manage them, which makes video abstraction of significant interest to the computer vision research community. The objective of video abstraction is to automatically create a short visual summary of a long input video so that a user can grasp key aspects of the video without watching it in its entirety. Such a mechanism would allow users to easily preview, categorize, search, and edit the huge amounts of video data. In this thesis, we push the state of the art in video abstraction in several ways. First, we develop fully convolutional sequence deep learning models that address the computational limitations of previous deep learning models for video abstraction. Second, we propose a new formulation of unpaired training data for model learning, which reduces the need for expensive labeled training data in supervised learning. Third, since video abstraction is inherently subjective, we design a model that yields personalized, user-specific predictions by referring to a user's previously created summaries. Lastly, we extend this user-adaptive model so that it can handle natural language textual queries from users and make predictions that are semantically related to the queries. Although we focus on video abstraction in this thesis, we believe that our models can potentially be applied to other video understanding problems (e.g., video classification, action recognition, and video captioning).
Keywords
Video abstraction, Deep learning
Citation
Rochan, M., Ye, L., & Wang, Y. (2018). Video Summarization Using Fully Convolutional Sequence Networks. In European Conference on Computer Vision (pp. 358-374). Springer, Cham.
Rochan, M., & Wang, Y. (2019). Video Summarization by Learning From Unpaired Data. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 7894-7903). IEEE.
Rochan, M., Reddy, M. K. K., Ye, L., & Wang, Y. (2020). Adaptive Video Highlight Detection by Learning From User History. In European Conference on Computer Vision (ECCV). Springer, Cham, forthcoming.
Rochan, M., Reddy, M. K. K., & Wang, Y. (2020). Sentence Guided Temporal Modulation for Dynamic Video Thumbnail Generation. In British Machine Vision Conference (BMVC), forthcoming.