Data Collection Format: 360° Cameras
To support high-fidelity reconstruction and robust SLAM, we use a monocular 360° camera for outdoor data collection. Unlike traditional narrow field-of-view cameras (e.g., handheld devices), which require careful path planning to ensure sufficient coverage, 360° cameras capture the entire surrounding scene in every frame. This dramatically reduces blind spots and eliminates the need for complex multi-pass recordings, making them ideal for unstructured outdoor environments.
In the context of Gaussian Splatting, where reconstruction quality depends heavily on view diversity and scene coverage, 360° imagery offers dense multi-view supervision from a single trajectory. For SLAM, the panoramic input significantly improves feature persistence and loop closure detection, especially in textureless or repetitive regions. The result is a more efficient and robust pipeline, with fewer artifacts due to occlusions, missing views, or tracking drift.
As part of our pipeline, users are free to walk naturally through a scene without following strict paths or capture protocols. Running SLAM directly on 360° video enables reliable feature tracking even in narrow corridors or low-texture regions, resulting in improved robustness and trajectory accuracy.
Since the current implementation runs offline, our pipeline supports any 360° video (in equirectangular or dual-fisheye format), making it compatible with a wide range of commercially available cameras. Furthermore, handheld devices with pinhole cameras can be used to record supplementary footage that bolsters coverage in under-constrained regions.
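Because the equirectangular format maps pixel coordinates linearly to spherical angles, every pixel corresponds to a known viewing ray, which is what makes arbitrary 360° footage straightforward to consume. The sketch below (a hypothetical NumPy helper, not part of the pipeline code) recovers unit ray directions from equirectangular pixel coordinates; dual-fisheye footage is typically stitched into this representation first.

```python
import numpy as np

def equirect_pixel_to_ray(u, v, width, height):
    """Map equirectangular pixel coordinates to unit ray directions.

    Longitude spans [-pi, pi] across the image width and latitude
    spans [pi/2, -pi/2] down the image height, so the mapping from
    pixel to viewing direction is a simple linear change of scale.
    """
    lon = (u + 0.5) / width * 2.0 * np.pi - np.pi   # azimuth
    lat = np.pi / 2.0 - (v + 0.5) / height * np.pi  # elevation
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return np.stack([x, y, z], axis=-1)  # unit-length rays

# Example: a ray for every pixel of a 1920x960 frame
u, v = np.meshgrid(np.arange(1920), np.arange(960))
rays = equirect_pixel_to_ray(u, v, 1920, 960)  # shape (960, 1920, 3)
```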
1. Preprocessing: SLAM
A core requirement for Gaussian Splatting is access to a precise camera pose for each image. We therefore begin with a SLAM-based preprocessing stage that outputs both the camera trajectory and the scene's geometric structure.
Input:
- 360° monocular video (equirectangular or dual-fisheye)
Output:
- Sparse point cloud from feature-based SLAM
- Dense point cloud reconstructed via PatchMatch Stereo
- Estimated camera poses for all extracted frames
- Six perspective (pinhole) cubemap faces per 360° frame (see the sketch below)
Details of this stage are provided in the OpenVSLAM section, which outlines the mapping, tracking, and optimization procedures used.
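For intuition about the cubemap output listed above, the sketch below shows one way to render a single perspective face (90° field of view) from an equirectangular frame: build a pinhole ray for each face pixel, rotate it toward the face direction, and sample the panorama with cv2.remap. Helper names, the face ordering, and the input filename are illustrative assumptions rather than the pipeline's actual implementation.

```python
import cv2
import numpy as np

def render_cubemap_face(equi, face_rot, face_size=512):
    """Render one perspective cubemap face (90 deg FoV) from an
    equirectangular image by inverse-mapping face pixels to rays."""
    h, w = equi.shape[:2]
    f = face_size / 2.0  # focal length that yields a 90 deg FoV
    j, i = np.meshgrid(np.arange(face_size), np.arange(face_size),
                       indexing="ij")
    # Pinhole rays in the face's frame (x right, y up, z forward)
    x = (i + 0.5 - face_size / 2.0) / f
    y = (face_size / 2.0 - (j + 0.5)) / f
    rays = np.stack([x, y, np.ones_like(x)], axis=-1) @ face_rot.T
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)
    # Ray direction -> longitude/latitude -> equirectangular pixel
    lon = np.arctan2(rays[..., 0], rays[..., 2])
    lat = np.arcsin(np.clip(rays[..., 1], -1.0, 1.0))
    map_u = (((lon + np.pi) / (2.0 * np.pi)) * w - 0.5) % w
    map_v = np.clip((np.pi / 2.0 - lat) / np.pi * h - 0.5, 0, h - 1)
    return cv2.remap(equi, map_u.astype(np.float32),
                     map_v.astype(np.float32), cv2.INTER_LINEAR)

def rot_y(a):  # rotation about the vertical (up) axis
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rot_x(a):  # rotation about the horizontal (right) axis
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

# Six faces: front, right, back, left, up, down (illustrative order)
equi = cv2.imread("frame_000000.png")  # hypothetical extracted frame
rotations = [rot_y(0), rot_y(np.pi / 2), rot_y(np.pi), rot_y(-np.pi / 2),
             rot_x(-np.pi / 2), rot_x(np.pi / 2)]
faces = [render_cubemap_face(equi, R) for R in rotations]
```

Each face is an ordinary pinhole image, which is what allows the perspective-based Gaussian Splatting stage described next to consume the panoramic capture.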
2. Postprocessing: Gaussian Splatting
After preprocessing, the cubemap views are passed into our Gaussian Splatting pipeline for 3D scene reconstruction. Outdoor data, captured under natural lighting, often contains significant photometric variation, which poses challenges for models trained with photometric consistency losses.
To address this, we adopt a lighting-aware variant of Gaussian Splatting that includes:
- Per-Gaussian neural color features
- Per-image appearance embeddings
- A spherical harmonics-based background model to account for global lighting variation and sky/background consistency
This enhanced formulation improves robustness to environmental changes and enables photorealistic reconstructions from real-world, unconstrained imagery.
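To make the lighting-aware formulation concrete, here is a minimal sketch of the color computation (in PyTorch; module and dimension names are illustrative assumptions, see Splatfacto-W for the actual architecture). Each Gaussian carries a learned feature vector and each training image a learned appearance embedding; a small MLP fuses the two into an RGB color, so the same geometry can be rendered under the lighting of any captured image. The spherical harmonics background model, which predicts a per-ray sky color, is omitted for brevity.

```python
import torch
import torch.nn as nn

class AppearanceColorHead(nn.Module):
    """Hypothetical lighting-aware color head for Gaussian Splatting.

    Combines per-Gaussian neural color features with a per-image
    appearance embedding so that photometric variation across the
    capture (sun angle, exposure, white balance) can be absorbed by
    the embedding instead of corrupting the geometry.
    """
    def __init__(self, num_images, feat_dim=32, embed_dim=16):
        super().__init__()
        self.image_embeds = nn.Embedding(num_images, embed_dim)
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + embed_dim, 64), nn.ReLU(),
            nn.Linear(64, 3), nn.Sigmoid(),  # RGB in [0, 1]
        )

    def forward(self, gauss_feats, image_idx):
        # gauss_feats: (N, feat_dim) learned feature per Gaussian
        embed = self.image_embeds(image_idx)             # (embed_dim,)
        embed = embed.expand(gauss_feats.shape[0], -1)   # share across N
        return self.mlp(torch.cat([gauss_feats, embed], dim=-1))

# Example: colors for 10,000 Gaussians as seen in training image 7
head = AppearanceColorHead(num_images=500)
rgb = head(torch.randn(10_000, 32), torch.tensor(7))  # (10000, 3)
```

In approaches of this kind, the embedding for a novel view can be fixed to one training image's embedding, or interpolated between two, to control the global lighting of the rendered scene.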
Additional architectural details and training specifics are available in the “GS in the Wild” section.
References
Hartmut Surmann, Marc Thurow, and Dominik Slomma. PatchMatch-Stereo-Panorama: A Fast Dense Reconstruction from 360° Video Images. arXiv preprint arXiv:2211.16266, 2022. Available at: https://arxiv.org/abs/2211.16266
Congrong Xu, Justin Kerr, and Angjoo Kanazawa. Splatfacto-W: A Nerfstudio Implementation of Gaussian Splatting for Unconstrained Photo Collections. arXiv preprint arXiv:2407.12306, 2024. Available at: https://arxiv.org/abs/2407.12306