Overview

Dynamic scene reconstruction from sparse views is extremely challenging, so we rely on two key insights to initialize plausible geometry and motion:
- Initializing consistent scene geometry via confidence-aware spatio-temporal alignment
- Initializing motion trajectories by clustering per-point 3D semantic features distilled from 2D foundation models

Space-time consistent depth
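As an illustration of what a confidence-aware alignment step could look like, the sketch below fits a scale and shift that map one depth map onto a reference depth, weighting each pixel by a confidence value. This is a minimal sketch under stated assumptions, not the method used here: the scale-and-shift model, the function name `fit_scale_shift`, and its arguments are all hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_scale_shift(depth, ref_depth, conf):
    """Hypothetical helper: find s, t minimizing sum(conf * (s*depth + t - ref_depth)**2).

    depth, ref_depth, conf: arrays of identical shape; conf >= 0 per pixel.
    """
    d = np.asarray(depth, dtype=np.float64).ravel()
    r = np.asarray(ref_depth, dtype=np.float64).ravel()
    w = np.asarray(conf, dtype=np.float64).ravel()

    # Design matrix for the linear model r ~ s*d + t.
    A = np.stack([d, np.ones_like(d)], axis=1)

    # Weighted normal equations: (A^T W A) x = A^T W r.
    AtWA = A.T @ (A * w[:, None])
    AtWr = A.T @ (w * r)
    s, t = np.linalg.solve(AtWA, AtWr)
    return float(s), float(t)
```

Pixels with zero confidence drop out of the fit entirely, so unreliable regions of the reference do not bias the estimated scale and shift. Applying the same fit to every view and time step against a shared reference is one simple way to bring per-frame depths into a common scale.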


Feature-based motion bases
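The idea of clustering per-point features to initialize motion can be sketched as follows. This is only an illustrative sketch: it uses plain k-means with a deterministic farthest-point initialization, and it assumes that each cluster's motion basis is initialized as the mean of its members' 3D trajectories. The function names, the array shapes, and the averaging step are assumptions made for the example, not details taken from the text above.

```python
import numpy as np

def kmeans(features, k, iters=50):
    """Plain k-means on (N, D) features with farthest-point initialization."""
    features = np.asarray(features, dtype=np.float64)

    # Deterministic init: start from the first point, then repeatedly
    # add the point farthest from all centers chosen so far.
    centers = [features[0]]
    for _ in range(1, k):
        d2 = np.min([((features - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(features[d2.argmax()])
    centers = np.array(centers, dtype=np.float64)

    for _ in range(iters):
        # Squared distance from every point to every center, shape (N, k).
        d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def init_motion_bases(trajectories, labels, k):
    """Assumed initialization: mean (T, 3) trajectory of each cluster's points."""
    trajectories = np.asarray(trajectories, dtype=np.float64)  # (N, T, 3)
    bases = np.zeros((k,) + trajectories.shape[1:])
    for j in range(k):
        members = trajectories[labels == j]
        if len(members) > 0:
            bases[j] = members.mean(axis=0)
    return bases
```

Here `features` would hold one feature vector per 3D point and `trajectories` one 3D position per point and time step; points with similar features end up sharing a motion basis.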

