Motivation
- Manipulation tasks benefit from accurate depth and shape understanding.
- 3D sensors (LiDAR, RGB-D) can be costly, hard to calibrate, and prone to occlusions and noise.
- Recent 3D vision foundation models enable fast, calibration-free, multi-view 3D reconstruction from standard RGB cameras.
Problem
- Investigate how different 3D vision foundation models can be leveraged to learn better manipulation policies.