Related Work

Recent work in computer vision has produced significant developments in learning-based methods for Structure from Motion (SfM). Some methods target individual components of the pipeline, such as feature detection [3] and matching [4], while others address SfM end to end [6]. These advances underpin numerous applications, including augmented reality, 3D reconstruction, and autonomous navigation.

  1. Neural Face Rendering Dataset: Wuu et al. (2022) introduce Multiface, the dataset used in our project, which is designed specifically for training and evaluating neural face rendering algorithms. The dataset helps address challenges related to facial diversity and rendering under variable lighting conditions, making it a valuable resource for developing more realistic and adaptable face-rendering technologies.
  2. Structure-from-Motion Revisited (aka COLMAP): Schönberger and Frahm (2016) revisit the well-established area of SfM. They propose enhancements to traditional SfM techniques that improve the accuracy and efficiency of 3D reconstruction from 2D images. This work provides a robust framework that has influenced subsequent research and remains widely used.
  3. Interest Point Detection and Description: DeTone et al. (2018) introduce SuperPoint, an approach for detecting and describing interest points in images using self-supervised learning. As one of the first learning-based methods for this task, it advances feature detection, which is essential for tasks such as object recognition and scene understanding.
  4. Graph Neural Networks for Feature Matching: Sarlin et al. (2020) propose SuperGlue, which leverages graph neural networks to improve the accuracy and robustness of feature matching across different views of the same scene. When paired with SuperPoint, this learning-based approach yields a reliable set of correspondences, improving the SfM pipeline's ability to recover accurate 3D reconstructions and camera parameters.
  5. Featuremetric Refinement in SfM: Lindenberger et al. (2021) further refine the SfM pipeline in their work Pixel-Perfect SfM. They integrate featuremetric refinement into SfM to improve reconstruction accuracy, which is crucial for precise mapping and modeling applications.
  6. Deep Learning in SfM: Wang et al. (2024) present VGGSfM, a deep-learning-based SfM method. The approach is fully differentiable and trained end-to-end, unlike previous methods, which enhanced only individual parts of the SfM pipeline incrementally. The authors report improvements in accuracy, scalability, and robustness.
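
To make the matching stage discussed in items 3 and 4 concrete, the following is a minimal sketch of mutual nearest-neighbor descriptor matching, a classical baseline for the step that learned matchers such as SuperGlue replace. It is not the method of any of the cited papers. The function name `mutual_nearest_neighbors` and the toy two-dimensional descriptors are assumptions made purely for illustration; real descriptors are much higher-dimensional.

```python
import math


def mutual_nearest_neighbors(desc_a, desc_b):
    """Return index pairs (i, j) such that desc_a[i] and desc_b[j]
    are each other's nearest neighbor in Euclidean distance."""

    def nearest(query, candidates):
        # Index of the candidate descriptor closest to the query.
        return min(range(len(candidates)),
                   key=lambda k: math.dist(query, candidates[k]))

    a_to_b = [nearest(d, desc_b) for d in desc_a]
    b_to_a = [nearest(d, desc_a) for d in desc_b]
    # Keep a pair only if the match agrees in both directions.
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]


# Hypothetical toy descriptors for two images (illustrative values only).
desc_a = [(0.0, 1.0), (1.0, 0.0), (0.6, 0.5)]
desc_b = [(0.9, 0.1), (0.1, 0.9)]
matches = mutual_nearest_neighbors(desc_a, desc_b)
print(matches)
```

The mutual check discards the third descriptor of the first image, whose nearest neighbor in the second image prefers a different point. Learned matchers aim to make this kind of decision using context from all keypoints rather than pairwise distances alone.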


[1] Wuu, C.-h., Zheng, N. et al., 2022. “Multiface: A Dataset for Neural Face Rendering.” Available at:

[2] Schönberger, J. L., & Frahm, J.-M., 2016. “Structure-from-Motion Revisited.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4104-4113.

[3] DeTone, D. et al., 2018. “SuperPoint: Self-Supervised Interest Point Detection and Description.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPR Workshops).

[4] Sarlin, P. et al., 2020. “SuperGlue: Learning Feature Matching with Graph Neural Networks.” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5] Lindenberger, P., Sarlin, P., Larsson, V., & Pollefeys, M., 2021. “Pixel-Perfect Structure-from-Motion with Featuremetric Refinement.” In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 5967-5977.

[6] Wang, J., Karaev, N., Rupprecht, C., & Novotny, D., 2024. “VGGSfM: Visual Geometry Grounded Deep Structure from Motion.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).