Weakly supervised object localization using attention-based neural networks
We consider the problem of weakly supervised learning for object localization. Given a collection of images with image-level annotations indicating the presence or absence of an object, our goal is to localize the object in each image. We propose a neural network architecture, called the attention network, for this problem. In addition to the attention network, we propose three extensions. First, we propose an approach to regularize the attention scores so that they mimic the scoring distribution of a strong fully supervised object detector. Second, we propose an approach to iteratively refine the output of our attention network. Third, we combine the first and second extensions into a single network to obtain the benefits of both. We demonstrate that all of our approaches achieve superior performance on several benchmark datasets.
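To make the idea concrete, the following is a minimal sketch of how attention over candidate regions could turn image-level labels into a localization signal. It is an illustration under stated assumptions, not the architecture proposed here: the use of region features, the linear scoring vectors `w_att` and `w_cls`, the softmax normalization, and the KL-divergence form of the regularizer in `attention_kl` are all hypothetical choices made for this example.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

def attention_forward(region_feats, w_att, w_cls):
    """Attention-weighted pooling over candidate regions (assumed design).

    region_feats: (R, D) features for R candidate regions of one image.
    w_att, w_cls: (D,) hypothetical linear attention and classifier weights.
    Returns the attention distribution over regions and the image-level
    probability that the object is present.
    """
    scores = region_feats @ w_att        # (R,) unnormalized attention scores
    attention = softmax(scores)          # (R,) sums to 1
    pooled = attention @ region_feats    # (D,) attention-weighted image feature
    logit = pooled @ w_cls
    prob = 1.0 / (1.0 + np.exp(-logit))  # trained with image-level labels only
    return attention, prob

def attention_kl(attention, detector_scores, eps=1e-12):
    """KL(target || attention), where target is the softmax of detector scores.

    An assumed form for a regularizer that pulls the attention distribution
    toward the score distribution of a separately trained detector.
    """
    target = softmax(detector_scores)
    return float(np.sum(target * (np.log(target + eps) - np.log(attention + eps))))

# Toy usage with random data: 5 candidate regions, 8-dimensional features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 8))
w_att = rng.normal(size=8)
w_cls = rng.normal(size=8)
attention, prob = attention_forward(feats, w_att, w_cls)
best_region = int(np.argmax(attention))  # region taken as the predicted location
```

In this sketch the only supervision reaching `prob` is the image-level label, while the predicted location is read off as the region with the highest attention weight.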