Motivation
- Manipulation tasks benefit from accurate depth and shape understanding.
- New 3D vision foundation models enable fast, calibration-free, multi-view 3D reconstruction using standard RGB cameras.
Problem Statements
- Speed up global alignment to achieve robust, real-time multi-view 3D reconstruction for manipulation.
- Leverage geometric foundation models (VGGT) to provide stronger geometric features that improve scene understanding and downstream manipulation tasks.
- Use learning-based shape completion to turn partial RGB-D observations into complete point clouds, enabling more stable and reliable grasp planning.
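
To make the "partial RGB-D observation" in the last item concrete, here is a minimal sketch of how one depth image could be back-projected into a partial point cloud, the kind of input a shape-completion model would take. It assumes a pinhole camera with known intrinsics; the function name `backproject_depth` and the parameters `fx`, `fy`, `cx`, `cy` are illustrative and not taken from the project. It is written in plain Python for clarity; a real pipeline would likely vectorize this.

```python
import math


def backproject_depth(depth, fx, fy, cx, cy):
    """Back-project a depth image into camera-frame 3D points.

    depth: list of rows, each a list of depth values (e.g. in meters).
    fx, fy, cx, cy: assumed pinhole intrinsics (focal lengths and
    principal point, in pixels). Hypothetical names for illustration.
    Returns a list of (x, y, z) tuples, one per valid pixel.
    """
    points = []
    for v, row in enumerate(depth):      # v: pixel row
        for u, z in enumerate(row):      # u: pixel column
            # Skip missing or invalid depth readings (zero, negative, NaN, inf).
            if not math.isfinite(z) or z <= 0:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Tiny 2x2 depth image with one missing reading (0.0).
partial_cloud = backproject_depth(
    [[1.0, 0.0], [2.0, 2.0]], fx=1.0, fy=1.0, cx=0.0, cy=0.0
)
```

The result covers only the surfaces visible from this one viewpoint, which is why a completion step is needed before grasp planning.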