What is Structure from Motion (SfM)?
SfM aims to reconstruct both the 3D structure of a scene and the camera poses from a set of input images. It has broad applications—from robotics to Augmented Reality.
Related Work
In recent years, deep learning-based SfM methods have emerged. For example, DiffusionSfM from CVPR 2025 uses a diffusion model to jointly predict camera poses and 3D points in an unordered setting.

Another method, VGGT, follows a regression-based approach and uses transformers to directly output the 3D scene from multi-view inputs. While these methods improve robustness and scalability, they still implicitly define the coordinate frame using the first image.
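
To make the "first image defines the coordinate frame" convention concrete, here is a minimal sketch of what anchoring to the first view means geometrically. It is not taken from DiffusionSfM or VGGT. It assumes poses are given as 4x4 world-to-camera rigid transforms stored as nested lists, and the names `anchor_to_first`, `rigid_inverse`, `matmul`, and `rot_z` are hypothetical helpers introduced only for illustration.

```python
import math

def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def rigid_inverse(T):
    """Invert a rigid transform [R | t; 0 0 0 1] as [R^T | -R^T t]."""
    R_t = [[T[j][i] for j in range(3)] for i in range(3)]  # transpose of R
    t = [T[i][3] for i in range(3)]
    new_t = [-sum(R_t[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [R_t[i] + [new_t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def anchor_to_first(world_to_cam):
    """Re-express world-to-camera poses so that camera 0 becomes the identity.

    If X_ci = T_i X_w, then choosing camera 0 as the new world frame gives
    X_ci = (T_i T_0^{-1}) X_c0, so each pose is right-multiplied by T_0^{-1}.
    """
    first_inv = rigid_inverse(world_to_cam[0])
    return [matmul(T, first_inv) for T in world_to_cam]

def rot_z(theta, t):
    """Hypothetical helper: rotation about the z-axis by theta, translation t."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0, t[0]],
            [s, c, 0.0, t[1]],
            [0.0, 0.0, 1.0, t[2]],
            [0.0, 0.0, 0.0, 1.0]]

# Two made-up camera poses in an arbitrary world frame (illustrative values only).
poses = [rot_z(0.3, (1.0, 2.0, 0.5)), rot_z(0.8, (0.0, -1.0, 2.0))]
anchored = anchor_to_first(poses)
```

After anchoring, `anchored[0]` is the identity and every other pose is expressed relative to the first camera. The relative geometry between views is unchanged, but the numerical values of all poses and points now depend on which image happens to come first.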
