Efficient deep learning models for video abstraction

dc.contributor.author: Rochan, Mrigank
dc.contributor.examiningcommittee: Livi, Lorenzo (Computer Science and Mathematics); Hossain, Ekram (Electrical and Computer Engineering); Little, James J. (University of British Columbia)
dc.contributor.supervisor: Wang, Yang (Computer Science)
dc.date.accessioned: 2020-09-07T20:09:47Z
dc.date.available: 2020-09-07T20:09:47Z
dc.date.copyright: 2020-08-20
dc.date.issued: 2020-08
dc.date.submitted: 2020-08-21T01:24:44Z
dc.degree.discipline: Computer Science
dc.degree.level: Doctor of Philosophy (Ph.D.)
dc.description.abstract: With the revolution in digital video technology, video data are ubiquitous and growing explosively, creating a compelling need for efficient automated techniques to manage them. Video abstraction is therefore of significant interest to the computer vision research community. The objective of video abstraction is to automatically create a short visual summary of a long input video so that a user can grasp its essential content without watching it in its entirety. Such a mechanism would allow users to easily preview, categorize, search, and edit the enormous amount of video data. In this thesis, we advance the state of the art in video abstraction in several ways. First, we develop fully convolutional sequence deep learning models that address the computational limitations of previous deep learning models for video abstraction. Second, we propose a new formulation that learns from unpaired training data, reducing the need for the expensive labeled training data required by supervised learning. Third, since video abstraction is inherently subjective, we build a model that yields personalized, user-specific predictions by referring to a user's previously created summaries. Lastly, we extend this user-adaptive model to handle natural language textual queries from users and make predictions that are semantically related to the queries. Although this thesis focuses on video abstraction, we believe our models can potentially be applied to other video understanding problems (e.g., video classification, action recognition, and video captioning).
dc.description.note: October 2020
dc.identifier.citation: Rochan, M., Ye, L., & Wang, Y. (2018). Video Summarization Using Fully Convolutional Sequence Networks. In European Conference on Computer Vision (ECCV) (pp. 358-374). Springer, Cham.
dc.identifier.citation: Rochan, M., & Wang, Y. (2019). Video Summarization by Learning From Unpaired Data. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 7894-7903). IEEE.
dc.identifier.citation: Rochan, M., Reddy, M. K. K., Ye, L., & Wang, Y. (2020). Adaptive Video Highlight Detection by Learning from User History. In European Conference on Computer Vision (ECCV). Springer, Cham, forthcoming.
dc.identifier.citation: Rochan, M., Reddy, M. K. K., & Wang, Y. (2020). Sentence Guided Temporal Modulation for Dynamic Video Thumbnail Generation. In British Machine Vision Conference (BMVC), forthcoming.
dc.identifier.uri: http://hdl.handle.net/1993/34958
dc.language.iso: eng
dc.rights: open access
dc.subject: Video abstraction; Deep learning
dc.title: Efficient deep learning models for video abstraction
dc.type: doctoral thesis
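
To make the first contribution concrete, the following is a minimal sketch of a fully convolutional sequence network for frame-level importance scoring, in the spirit of the FCSN paper cited above: temporal 1D convolutions downsample the frame sequence to build context, and transposed convolutions restore the original length to score every frame. All layer widths, kernel sizes, and names here are illustrative assumptions, not the thesis architecture.

```python
# Illustrative sketch only; layer sizes are assumptions, not the thesis model.
import torch
import torch.nn as nn

class FCSNSketch(nn.Module):
    def __init__(self, in_dim=1024):
        super().__init__()
        # Temporal encoder: 1D convolutions over the frame axis,
        # downsampling the sequence (T -> T/4) to gather context.
        self.encoder = nn.Sequential(
            nn.Conv1d(in_dim, 256, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),   # T -> T/2
            nn.Conv1d(256, 256, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),   # T/2 -> T/4
        )
        # Temporal decoder: transposed convolutions restore the original
        # length, yielding one importance score per frame.
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(256, 128, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose1d(128, 1, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, feats):
        # feats: (batch, T, in_dim) pre-extracted per-frame CNN features;
        # T is assumed divisible by 4 so the decoder restores it exactly.
        x = feats.transpose(1, 2)                 # -> (batch, in_dim, T)
        scores = self.decoder(self.encoder(x))    # -> (batch, 1, T)
        return torch.sigmoid(scores).squeeze(1)   # (batch, T) in [0, 1]

# Usage: score a 2-minute clip sampled at 2 fps (240 frames).
model = FCSNSketch()
video = torch.randn(1, 240, 1024)   # placeholder frame features
print(model(video).shape)           # torch.Size([1, 240])
```

Because the model is fully convolutional over time (no recurrence), it processes variable-length videos in a single parallel pass, which is the computational advantage the abstract alludes to.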
Files
Original bundle: rochan_mrigank.pdf (21.73 MB, Adobe Portable Document Format)
License bundle: license.txt (2.2 KB, item-specific license agreed to upon submission)