Efficient deep learning models for video abstraction
dc.contributor.author | Rochan, Mrigank | |
dc.contributor.examiningcommittee | Livi, Lorenzo (Computer Science and Mathematics); Hossain, Ekram (Electrical and Computer Engineering); Little, James J. (University of British Columbia) | en_US |
dc.contributor.supervisor | Wang, Yang (Computer Science) | en_US |
dc.date.accessioned | 2020-09-07T20:09:47Z | |
dc.date.available | 2020-09-07T20:09:47Z | |
dc.date.copyright | 2020-08-20 | |
dc.date.issued | 2020-08 | en_US |
dc.date.submitted | 2020-08-21T01:24:44Z | en_US |
dc.degree.discipline | Computer Science | en_US |
dc.degree.level | Doctor of Philosophy (Ph.D.) | en_US |
dc.description.abstract | With the revolution in digital video technology, video data are ubiquitous and growing explosively. There is a compelling need for efficient automated techniques to manage this data, which makes video abstraction of significant interest to the computer vision research community. The objective of video abstraction is to automatically create a short visual summary of a long input video so that a user can grasp its essential content without watching it in its entirety. Such a mechanism would make it easy to preview, categorize, search, and edit vast amounts of video data. In this thesis, we advance the state of the art in video abstraction in several ways. First, we develop fully convolutional sequence deep learning models that address the computational limitations of previous deep learning models for video abstraction. Second, we propose a new formulation that learns from unpaired training data, reducing the need for the expensive labeled data required by supervised learning. Third, since video abstraction is inherently subjective, we design a model that produces personalized, user-specific predictions by referring to summaries the user has previously created. Finally, we extend this user-adaptive model to handle natural language queries from users and make predictions that are semantically related to those queries. Although we focus on video abstraction in this thesis, we believe our models can potentially be applied to other video understanding problems (e.g., video classification, action recognition, and video captioning). | en_US |
dc.description.note | October 2020 | en_US |
dc.identifier.citation | Rochan, M., Ye, L., & Wang, Y. (2018). Video Summarization Using Fully Convolutional Sequence Networks. In European Conference on Computer Vision (ECCV) (pp. 358-374). Springer, Cham. | en_US |
dc.identifier.citation | Rochan, M., & Wang, Y. (2019). Video Summarization by Learning From Unpaired Data. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 7894-7903). IEEE. | en_US |
dc.identifier.citation | Rochan, M., Reddy, M. K. K., Ye, L., & Wang, Y. (2020). Adaptive Video Highlight Detection by Learning from User History. In European Conference on Computer Vision (ECCV). Springer, Cham, forthcoming. | en_US |
dc.identifier.citation | Rochan, M., Reddy, M. K. K., & Wang, Y. (2020). Sentence Guided Temporal Modulation for Dynamic Video Thumbnail Generation. In British Machine Vision Conference (BMVC), forthcoming. | en_US |
dc.identifier.uri | http://hdl.handle.net/1993/34958 | |
dc.language.iso | eng | en_US |
dc.rights | open access | en_US |
dc.subject | Video abstraction, Deep learning | en_US |
dc.title | Efficient deep learning models for video abstraction | en_US |
dc.type | doctoral thesis | en_US |