Introduction

We present an end-to-end pipeline for outdoor scene reconstruction using monocular 360° video, designed to reduce data collection overhead and improve robustness under real-world conditions. The pipeline is divided into two core components: pre-processing and post-processing, each addressing a distinct set of challenges.

In the pre-processing stage, we leverage a 360° SLAM system tailored for monocular panoramic input. This module performs bundle adjustment and loop closure, and supports multi-sequence mapping that aligns multiple traversals of the same scene into a single map. It outputs both sparse and dense reconstructions, along with accurate camera pose estimates for each frame. While the current system runs offline, we are actively developing a real-time extension to provide live feedback, allowing users to visualize their position and monitor coverage during data collection.
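To make the per-frame pose output concrete, the sketch below loads a trajectory into rotation matrices and translation vectors. It assumes a TUM-style text format (one line per frame: timestamp, translation, unit quaternion); the actual file format emitted by our system may differ, and the names here are illustrative.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class FramePose:
    """World-from-camera pose for one panoramic frame."""
    timestamp: float
    R: np.ndarray  # 3x3 rotation matrix
    t: np.ndarray  # translation vector, shape (3,)

def quat_to_rot(qx: float, qy: float, qz: float, qw: float) -> np.ndarray:
    """Convert a unit quaternion (Hamilton convention) to a rotation matrix."""
    return np.array([
        [1 - 2*(qy*qy + qz*qz), 2*(qx*qy - qz*qw),     2*(qx*qz + qy*qw)],
        [2*(qx*qy + qz*qw),     1 - 2*(qx*qx + qz*qz), 2*(qy*qz - qx*qw)],
        [2*(qx*qz - qy*qw),     2*(qy*qz + qx*qw),     1 - 2*(qx*qx + qy*qy)],
    ])

def load_trajectory(path: str) -> list[FramePose]:
    """Parse a TUM-style file: 'timestamp tx ty tz qx qy qz qw' per line."""
    poses = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):  # skip comments and blanks
                continue
            ts, tx, ty, tz, qx, qy, qz, qw = map(float, line.split())
            poses.append(FramePose(ts, quat_to_rot(qx, qy, qz, qw),
                                   np.array([tx, ty, tz])))
    return poses
```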

In the post-processing stage, we convert equirectangular frames into perspective cubemap views and sample viewpoints based on spatial coverage to ensure uniform training supervision. To handle the photometric variability inherent to outdoor scenes, we adopt a lighting-aware variant of Gaussian Splatting (GS), incorporating per-image appearance embeddings, neural color fields, and a spherical harmonics-based background model. These enhancements significantly improve reconstruction fidelity under the dynamic lighting and appearance conditions typical of outdoor captures.
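For reference, a minimal version of the equirectangular-to-cubemap reprojection is sketched below, using nearest-neighbor sampling and a 90° field of view per face. The axis convention (x right, y down, z forward) and the face names are assumptions made for illustration; our pipeline's exact conventions and interpolation scheme may differ.

```python
import numpy as np

def equirect_to_cubeface(equi: np.ndarray, face: str, size: int) -> np.ndarray:
    """Render one 90°-FoV perspective cube face from an equirectangular image.

    equi: (H, W, C) panorama covering 360° x 180°.
    face: one of 'front', 'right', 'back', 'left', 'up', 'down'.
    Uses nearest-neighbor lookup for brevity; a real pipeline would interpolate.
    """
    H, W = equi.shape[:2]
    # Pixel grid on the face plane, in [-1, 1].
    u, v = np.meshgrid(np.linspace(-1, 1, size), np.linspace(-1, 1, size))
    o = np.ones_like(u)
    # Ray direction per pixel; camera axes: x right, y down, z forward.
    x, y, z = {
        "front": ( u,  v,  o),
        "back":  (-u,  v, -o),
        "right": ( o,  v, -u),
        "left":  (-o,  v,  u),
        "up":    ( u, -o,  v),
        "down":  ( u,  o, -v),
    }[face]
    # Spherical coordinates: longitude measured from +z, latitude positive downward.
    lon = np.arctan2(x, z)                         # in [-pi, pi]
    lat = np.arcsin(y / np.sqrt(x*x + y*y + z*z))  # in [-pi/2, pi/2]
    # Map to equirectangular pixel indices and sample.
    col = ((lon / (2 * np.pi) + 0.5) * (W - 1)).round().astype(int) % W
    row = np.clip(((lat / np.pi + 0.5) * (H - 1)).round().astype(int), 0, H - 1)
    return equi[row, col]
```

Rendering six 90° faces per frame yields standard pinhole views, which is what makes the panoramic capture compatible with conventional perspective-camera GS training.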

Our system enables the creation of high-fidelity GS scenes from unconstrained, real-world environments using only a single pass of 360° video capture.

Looking ahead, we plan to integrate semantic filtering to remove dynamic distractors such as pedestrians and vehicles, further improving reconstruction quality. Additionally, we aim to extend the data collection interface with real-time feedback on data quality and under-constrained regions, guiding users toward more complete and efficient scene coverage.