In this present day and age of smart phones, ipads and other smart devices a digital camera has become an integral part of our daily experience. This combined with our desire to capture moments in images has lead to a vast repository of myriad images on the internet.

In this project we want to leverage the rich structured information hidden in the images to learn generic and common sense concepts through developing a powerful semantic embedding space. For handling this challenge, we aim to develop novel neural encoder-decoder architectures focusing on novelties in both the components. Moreover, we also want to explore the problem under the Semi-supervised and Unsupervised framework using Active Learning.

Active Learning provides us a new window to explore the model capacity of neural architectures for generalizing to novel concepts, generate increasingly fluid, informative responses and the opportunity to perform natural pervasive learning. More specifically, we would like to explore active learning strategies for automatically identifying batches of instances which offer highest as well as lowest entropy for the concerned neural model while also paying attention to maximum error reduction for the task at hand.

Neural Image Caption Generation Pipeline
Active Learning Scheme for Caption Generation Problem