Recurrent Neural Network for Learning Spatial and Temporal Information from Videos

Date
2019
Authors
Nabavi, Seyed shahabeddin
Abstract
Recurrent neural networks (RNNs) are a well-established tool for sequence modelling. They encompass a variety of techniques and models for extracting temporal information from sequential data (e.g., the frames of a video). This thesis presents novel end-to-end, recurrent-based deep learning architectures for two computer vision problems: semantic segmentation prediction and camera pose estimation. First, we investigate the problem of extracting temporal information in the context of semantic segmentation prediction. We demonstrate the capability of recurrent architectures in feature prediction by presenting a novel encoder-decoder convolutional LSTM architecture, and we extend this work with a bidirectional convolutional LSTM. Second, we explore step-by-step extraction of spatial information in monocular camera pose estimation, using an end-to-end unsupervised training scheme that relies on a recurrent-based pose estimator. We illustrate the contribution of recurrent (i.e., step-by-step) estimation to the estimation of large displacements and complex transformations. We also show the impact of this process on monocular depth estimation.
Keywords
Future semantic segmentation, Recurrent neural network, Unsupervised camera pose estimation, Spatial information, Deep learning, Computer vision, Temporal information, Video prediction
Citation
Nabavi, Seyed shahabeddin; Rochan, Mrigank; Wang, Yang. (2018). Future Semantic Segmentation with Convolutional LSTM. British Machine Vision Conference.