Pipeline Overview

Our project develops an adaptive data-collection and reconstruction pipeline that uses a single 360° video stream to produce high-fidelity 3D Gaussian Splatting (3DGS) models. By leveraging the full omnidirectional coverage of 360° cameras, we enable easier capture, more stable Structure-from-Motion (SfM), and improved robustness in challenging environments with distractors, lighting variations, or scene changes.

SuperPoint & LightGlue

The first two stages of the reconstruction pipeline are feature detection and matching. We improve accuracy by using SuperPoint for feature detection and description, and LightGlue, an attention-based successor to SuperGlue, for matching. Each equirectangular frame is decomposed into six perspective views, and the matched features are then fed into rig-based bundle adjustment.
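The cubemap decomposition underlying this step can be sketched as pure geometry: each pixel of a 90°-FOV cube face corresponds to a viewing direction, which maps to a longitude/latitude pair and hence to a pixel in the equirectangular image. The face naming, face size, and equirectangular resolution below are illustrative assumptions, not the project's exact parameters.

```python
import math

# Unit direction for face pixel (u, v) in [-1, 1] on each of six cube faces.
# The axis conventions here are one common choice, assumed for illustration.
FACE_DIRS = {
    "front":  lambda u, v: ( u,   -v,    1.0),
    "back":   lambda u, v: (-u,   -v,   -1.0),
    "right":  lambda u, v: ( 1.0, -v,   -u),
    "left":   lambda u, v: (-1.0, -v,    u),
    "up":     lambda u, v: ( u,    1.0,  v),
    "down":   lambda u, v: ( u,   -1.0, -v),
}

def face_pixel_to_equirect(face, i, j, face_size, eq_w, eq_h):
    """Return the (x, y) equirectangular pixel that cube-face pixel
    (i = column, j = row) samples from, for a 90-degree-FOV face."""
    # Normalized face coordinates in [-1, 1], taken at pixel centers.
    u = 2.0 * (i + 0.5) / face_size - 1.0
    v = 2.0 * (j + 0.5) / face_size - 1.0
    x, y, z = FACE_DIRS[face](u, v)
    # Viewing direction -> spherical angles.
    lon = math.atan2(x, z)                                # [-pi, pi]
    lat = math.asin(y / math.sqrt(x * x + y * y + z * z))  # [-pi/2, pi/2]
    # Spherical angles -> equirectangular pixel coordinates.
    ex = (lon / (2.0 * math.pi) + 0.5) * eq_w
    ey = (0.5 - lat / math.pi) * eq_h
    return ex, ey
```

Rendering all six faces with this mapping (plus bilinear sampling) yields six pinhole images that any standard feature detector and matcher can consume.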

Rig-Based SfM

An equirectangular image can be decomposed into a cubemap, producing six perspective images compatible with existing bundle adjustment and SfM pipelines. While this increases the number of images sixfold, the six views are physically constrained to one another, allowing them to be treated as a single rig. With known camera-to-rig extrinsics, the optimization only needs to estimate a single 6-DoF pose per frame: the rig's position and orientation. Bundle adjustment then minimizes the squared reprojection error of the 3D points observed in each camera.
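The rig objective can be sketched as follows: only the rig pose per frame is a free variable, while each face camera's pose is derived from fixed camera-to-rig extrinsics. The matrices, intrinsics, and observation format below are illustrative, not the project's exact parameterization.

```python
# Sketch of the rig reprojection cost. A world point is transformed into
# the rig frame (the optimized pose), then into a face camera's frame via
# fixed camera-to-rig extrinsics, and finally projected with a pinhole model.

def matvec(R, p):
    """Apply a 3x3 rotation (nested lists) to a 3-vector."""
    return [sum(R[r][c] * p[c] for c in range(3)) for r in range(3)]

def project(K, R_cam_rig, t_cam_rig, R_rig_world, t_rig_world, X_world):
    """Project a world point through the rig pose and one face camera."""
    # World -> rig frame (the only per-frame unknowns in rig-based BA).
    X_rig = [a + b for a, b in zip(matvec(R_rig_world, X_world), t_rig_world)]
    # Rig -> camera frame (fixed extrinsics per cube face).
    X_cam = [a + b for a, b in zip(matvec(R_cam_rig, X_rig), t_cam_rig)]
    # Pinhole projection.
    fx, fy, cx, cy = K
    return fx * X_cam[0] / X_cam[2] + cx, fy * X_cam[1] / X_cam[2] + cy

def reprojection_cost(observations, params):
    """Sum of squared reprojection errors over observations, each given
    as (cam_idx, point_idx, measured_uv)."""
    K, cams, R_rig, t_rig, points = params
    cost = 0.0
    for cam_idx, pt_idx, (u, v) in observations:
        R_cr, t_cr = cams[cam_idx]
        pu, pv = project(K, R_cr, t_cr, R_rig, t_rig, points[pt_idx])
        cost += (pu - u) ** 2 + (pv - v) ** 2
    return cost
```

A solver such as Ceres or COLMAP's rig bundle adjuster minimizes this cost over the rig poses and 3D points while holding the camera-to-rig extrinsics fixed (or softly constrained).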

Monocular Depth Estimation with Scale Alignment

This module predicts a dense depth map from a single image and aligns it to the reconstruction's global scale using known reference points or camera poses. It enables detailed scene understanding even where multi-view coverage is limited.
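Because monocular predictions are only defined up to scale, a simple alignment is a closed-form least-squares fit of a single scale factor against sparse reference depths. This is a minimal sketch assuming scale-only alignment (some pipelines also fit a shift); the sample values are illustrative.

```python
# Align a scale-ambiguous monocular depth prediction to sparse metric
# reference depths (e.g. triangulated SfM points) by least squares:
#   s* = argmin_s sum_i (s * pred_i - ref_i)^2
#      = sum_i pred_i * ref_i / sum_i pred_i^2

def align_scale(pred, ref):
    """Return the scale s minimizing sum((s * pred - ref)^2)."""
    num = sum(p * r for p, r in zip(pred, ref))
    den = sum(p * p for p in pred)
    return num / den

pred = [1.0, 2.0, 4.0]   # monocular prediction (arbitrary scale)
ref  = [2.5, 5.0, 10.0]  # sparse metric depths at the same pixels
s = align_scale(pred, ref)
aligned = [s * p for p in pred]
```

In practice the reference depths come from the sparse SfM reconstruction, and robust variants (e.g. discarding outlier pairs) guard against bad correspondences.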

GS for Unconstrained Photo Collections

The final module applies Gaussian Splatting (GS) to reconstruct high-fidelity 3D geometry from unconstrained photo collections. It fuses information from multiple views while handling noise, lighting variation, and sparse viewpoints to produce a coherent 3D model.
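One common way such methods tolerate distractors and lighting changes is to reweight per-pixel photometric residuals with a robust kernel, so pixels inconsistent with the static scene (e.g. a passer-by) stop dominating the loss. The Geman-McClure-style weight below (normalized to 1 at zero residual) and its parameter are an illustrative assumption, not the project's exact loss.

```python
# Robust reweighting of photometric residuals: inlier pixels keep weight
# near 1, while large residuals (likely transients) are driven toward 0.

def geman_mcclure_weight(residual, sigma=0.1):
    """Influence weight in (0, 1]: ~1 for small residuals, -> 0 for outliers."""
    r2 = residual * residual
    s2 = sigma * sigma
    return (s2 / (s2 + r2)) ** 2

def robust_photometric_loss(rendered, observed, sigma=0.1):
    """Weighted sum of squared per-pixel color residuals."""
    loss = 0.0
    for y_hat, y in zip(rendered, observed):
        r = y_hat - y
        loss += geman_mcclure_weight(r, sigma) * r * r
    return loss
```

Under this weighting an outlier pixel's contribution is bounded, so optimizing the Gaussians against many views converges toward the static, photometrically consistent scene.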