The inceptionNet-based model produces similar results to the ResNet-based model with nearly half the number of FLOPs. Thus, we went ahead with InceptionNet architecture itself to train on the Smith Hall dataset collected using Aria glasses.
The metrics used here are RMSE (Root Mean Square Error) over position and orientation error in terms of the difference in degrees between predicted and ground truth. As we can see, the position and orientation error using Visual Odometry is pretty low.
The model has been tested on 2 completely different datasets. The King’s College dataset is an outdoor dataset with a variety of difficult scenarios like motion blur, a clutter of pedestrians, and varying lighting conditions. On the other hand, the Smith Hall dataset is an indoor dataset that has very few keypoint features in certain areas and dim lighting as well. Thus, the Visual only odometry module performs robustly.