I did my internship under Prof. Ian D. Reid at the University of Adelaide, where I was supposed to work on a ConvNet architecture that gives us depth and pose at the same time! In essence, a full-fledged vSLAM system built with a deep learning framework. It should be possible, shouldn't it? Just imagine: as a kid, you weren't taught geometry or the principles of optics and vision before you started moving around; you just moved, bumped, fell, and eventually learned how to walk! Can we train a DeepNet with a similar idea?
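To make the idea a bit more concrete, here is a minimal sketch of what a joint depth-and-pose network could look like in PyTorch. This is only an illustration of the general shape of such a system (one sub-network predicting a depth map from a single frame, another predicting the 6-DoF relative pose between two frames); the module names (DepthNet, PoseNet) and layer sizes are my own assumptions, not the actual architecture from the internship.

```python
# Illustrative sketch of a joint depth-and-pose ConvNet, in the spirit of
# self-supervised structure-from-motion networks. Module names and layer
# sizes are assumptions for demonstration only.
import torch
import torch.nn as nn

class DepthNet(nn.Module):
    """Predicts a per-pixel depth map from a single RGB frame."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 7, stride=2, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Softplus(),  # keep depth positive
        )

    def forward(self, img):
        return self.decoder(self.encoder(img))

class PoseNet(nn.Module):
    """Predicts the 6-DoF relative pose (3 translation + 3 rotation) between two frames."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 7, stride=2, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 6),
        )

    def forward(self, img_t, img_t1):
        # Stack the two frames along the channel dimension and regress relative motion.
        return self.net(torch.cat([img_t, img_t1], dim=1))

if __name__ == "__main__":
    frames = torch.randn(2, 3, 128, 416)        # two consecutive frames
    depth = DepthNet()(frames[:1])              # depth map for the first frame
    pose = PoseNet()(frames[:1], frames[1:])    # relative camera motion between them
    print(depth.shape, pose.shape)              # [1, 1, 128, 416] and [1, 6]
```

The point of pairing the two networks is that, at training time, the predicted depth and pose can be used to warp one frame onto the other, so a photometric reconstruction error can supervise both networks without any ground-truth labels, much like the "learning to walk by bumping into things" analogy above.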