Efficient deep learning models for video abstraction

dc.contributor.author: Rochan, Mrigank
dc.contributor.examiningcommittee: Livi, Lorenzo (Computer Science and Mathematics); Hossain, Ekram (Electrical and Computer Engineering); Little, James J. (University of British Columbia)
dc.contributor.supervisor: Wang, Yang (Computer Science)
dc.date.accessioned: 2020-09-07T20:09:47Z
dc.date.available: 2020-09-07T20:09:47Z
dc.date.copyright: 2020-08-20
dc.date.issued: 2020-08
dc.date.submitted: 2020-08-21T01:24:44Z
dc.degree.discipline: Computer Science
dc.degree.level: Doctor of Philosophy (Ph.D.)
dc.description.abstract: With the revolution in digital video technology, video data are ubiquitous and growing explosively, creating a compelling need for efficient automated techniques to manage them. Video abstraction is therefore of significant interest to the computer vision research community. The objective of video abstraction is to automatically create a short visual summary of a long input video so that a user can grasp its essential content without watching it in its entirety. Such a mechanism would allow users to easily preview, categorize, search, and edit the enormous amount of video data. In this thesis, we advance the state of the art in video abstraction in several ways. First, we develop fully convolutional sequence deep learning models that address the computational limitations of previous deep learning models for video abstraction. Second, we propose a new formulation that learns from unpaired training data, reducing the need for the expensive labeled training data required by supervised learning. Third, since video abstraction is inherently subjective, we build a model that yields personalized, user-specific predictions by referring to a user's previously created summaries. Lastly, we extend this user-adaptive model to handle natural language textual queries from users and make predictions that are semantically related to the queries. Although this thesis focuses on video abstraction, we believe our models can potentially be applied to other video understanding problems (e.g., video classification, action recognition, and video captioning).
dc.description.note: October 2020
dc.identifier.citation: Rochan, M., Ye, L., & Wang, Y. (2018). Video Summarization Using Fully Convolutional Sequence Networks. In European Conference on Computer Vision (ECCV) (pp. 358-374). Springer, Cham.
dc.identifier.citation: Rochan, M., & Wang, Y. (2019). Video Summarization by Learning From Unpaired Data. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 7894-7903). IEEE.
dc.identifier.citation: Rochan, M., Reddy, M. K. K., Ye, L., & Wang, Y. (2020). Adaptive Video Highlight Detection by Learning from User History. In European Conference on Computer Vision (ECCV). Springer, Cham, forthcoming.
dc.identifier.citation: Rochan, M., Reddy, M. K. K., & Wang, Y. (2020). Sentence Guided Temporal Modulation for Dynamic Video Thumbnail Generation. In British Machine Vision Conference (BMVC), forthcoming.
dc.identifier.uri: http://hdl.handle.net/1993/34958
dc.language.iso: eng
dc.rights: open access
dc.subject: Video abstraction; Deep learning
dc.title: Efficient deep learning models for video abstraction
dc.type: doctoral thesis
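
To make the first contribution concrete, the following is a minimal sketch of a fully convolutional sequence network for frame-level importance scoring, in the spirit of the FCSN paper cited above: temporal 1D convolutions downsample the frame sequence to build context, and transposed convolutions restore the original length to score every frame. All layer widths, kernel sizes, and names here are illustrative assumptions, not the thesis architecture.

```python
# Illustrative sketch only; layer sizes are assumptions, not the thesis model.
import torch
import torch.nn as nn

class FCSNSketch(nn.Module):
    def __init__(self, in_dim=1024):
        super().__init__()
        # Temporal encoder: 1D convolutions over the frame axis,
        # downsampling the sequence (T -> T/4) to gather context.
        self.encoder = nn.Sequential(
            nn.Conv1d(in_dim, 256, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),   # T -> T/2
            nn.Conv1d(256, 256, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),   # T/2 -> T/4
        )
        # Temporal decoder: transposed convolutions restore the original
        # length, yielding one importance score per frame.
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(256, 128, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose1d(128, 1, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, feats):
        # feats: (batch, T, in_dim) pre-extracted per-frame CNN features;
        # T is assumed divisible by 4 so the decoder restores it exactly.
        x = feats.transpose(1, 2)                 # -> (batch, in_dim, T)
        scores = self.decoder(self.encoder(x))    # -> (batch, 1, T)
        return torch.sigmoid(scores).squeeze(1)   # (batch, T) in [0, 1]

# Usage: score a 2-minute clip sampled at 2 fps (240 frames).
model = FCSNSketch()
video = torch.randn(1, 240, 1024)   # placeholder frame features
print(model(video).shape)           # torch.Size([1, 240])
```

Because the model is fully convolutional over time (no recurrence), it processes variable-length videos in a single parallel pass, which is the computational advantage the abstract alludes to.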
Files
Original bundle: rochan_mrigank.pdf (21.73 MB, Adobe Portable Document Format)
License bundle: license.txt (2.2 KB, item-specific license agreed to upon submission)