Related Works - Towards Universal State Estimation and Reconstruction in the Wild

1. SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM

· Demonstrates that 3D Gaussians are a useful representation for dense SLAM
· Limitations: Low speed; Needs accurate depth; Memory and compute requirements grow quickly with scene size

2. AnyLoc: Towards Universal Visual Place Recognition

· Self-supervised Visual Features for Zero-shot Localization
· Showcases the Semantic Consistency of DINOv2 features
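
Zero-shot localization of this kind reduces, at query time, to nearest-neighbour retrieval over per-image descriptors. The sketch below illustrates only that retrieval step with NumPy. It assumes the global descriptors have already been computed (for example by aggregating DINOv2 features); the `retrieve` helper and the random toy descriptors are hypothetical and not taken from the AnyLoc codebase.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)

def retrieve(query_desc, db_descs, top_k=1):
    """Return indices and cosine similarities of the top_k database matches."""
    q = l2_normalize(np.asarray(query_desc, dtype=np.float64))
    db = l2_normalize(np.asarray(db_descs, dtype=np.float64))
    sims = db @ q                      # cosine similarity to every database image
    order = np.argsort(-sims)[:top_k]  # highest similarity first
    return order, sims[order]

# Toy data (assumption): 5 database descriptors of dimension 16,
# and a query that is a slightly perturbed copy of database entry 3.
rng = np.random.default_rng(0)
db_descs = rng.normal(size=(5, 16))
query_desc = db_descs[3] + 0.01 * rng.normal(size=16)

idx, scores = retrieve(query_desc, db_descs, top_k=2)
print("best match:", idx[0])
```

Cosine similarity is used here because it ignores descriptor magnitude; a real system would likely swap the brute-force product for an approximate nearest-neighbour index once the database grows.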

3. DUSt3R: Geometric 3D Vision Made Easy

· Makes strong use of learned priors in geometric vision
· Beats SOTA on sparse two-view registration
· Limitations: Degrades on out-of-distribution data; Cannot handle long-term sequences
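
For context on what two-view registration computes: given corresponding 3D points expressed in two frames, the rigid transform between them has a closed-form least-squares solution (the Kabsch, or orthogonal Procrustes, algorithm). The sketch below is a generic NumPy illustration of that step, not DUSt3R's own pipeline. The `kabsch` helper and the synthetic points are assumptions made for the example.

```python
import numpy as np

def kabsch(P, Q):
    """Least-squares rigid transform (R, t) such that R @ p + t ≈ q
    for corresponding rows p of P and q of Q (both of shape (N, 3))."""
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_mean).T @ (Q - q_mean)        # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t

# Synthetic check (assumption): random points moved by a known rigid transform.
rng = np.random.default_rng(0)
R_true, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(R_true) < 0:
    R_true[:, 0] *= -1.0                     # make it a proper rotation
t_true = rng.normal(size=3)

P = rng.normal(size=(100, 3))
Q = P @ R_true.T + t_true

R_est, t_est = kabsch(P, Q)
print("rotation error:", np.abs(R_est - R_true).max())
```

With noisy or partially wrong correspondences, this closed-form step is typically wrapped in a robust estimator such as RANSAC.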

4. MASt3R: Grounding Image Matching in 3D

· Extends DUSt3R with a dense local feature head trained with a matching loss
· Casts image matching as a 3D task, improving robustness to large viewpoint changes
· Limitations: Struggles with fine-grained details; Requires large-scale pretraining datasets

5. DINOv2: Learning Robust Visual Features without Supervision

· Introduces a discriminative self-supervised method for robust visual feature learning
· Maintains high performance across varied scenarios without fine-tuning
· Limitations: Slow runtime may not meet the real-time requirements of a SLAM system

6. EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction

· Multi-scale linear attention mechanism for high-resolution dense prediction tasks
· Significantly improves computational efficiency over softmax attention
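
The efficiency gain of linear attention comes from reordering the matrix products so that the N x N attention matrix is never formed. The NumPy sketch below shows this for a single head with a ReLU feature map, and checks that the reordered form matches the quadratic form. It is a minimal illustration under those assumptions, not EfficientViT's implementation: the multi-scale aggregation is omitted, and the function names and the `eps` stabilizer are choices made for this example.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def relu_attention_quadratic(Q, K, V, eps=1e-9):
    """Reference form: builds the full (N, N) similarity matrix, O(N^2 d)."""
    scores = relu(Q) @ relu(K).T
    return (scores @ V) / (scores.sum(axis=1, keepdims=True) + eps)

def relu_attention_linear(Q, K, V, eps=1e-9):
    """Same result via associativity: cost is O(N d^2), linear in N."""
    q, k = relu(Q), relu(K)
    kv = k.T @ V               # (d, d_v) summary of all keys and values
    k_sum = k.sum(axis=0)      # (d,) summary used by the normalizer
    return (q @ kv) / ((q @ k_sum)[:, None] + eps)

# Toy inputs (assumption): N = 64 tokens, d = 8 channels.
rng = np.random.default_rng(0)
N, d = 64, 8
Q, K, V = (rng.normal(size=(N, d)) for _ in range(3))

out_quadratic = relu_attention_quadratic(Q, K, V)
out_linear = relu_attention_linear(Q, K, V)
print("max difference:", np.abs(out_quadratic - out_linear).max())
```

Because `kv` and `k_sum` do not depend on the number of queries, the cost grows linearly with token count, which is what matters for high-resolution inputs.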