Introduction

High-quality 3D reconstruction depends heavily on precise camera poses, especially for methods such as Gaussian Splatting and NeRF that rely on photometric supervision. Even small errors in pose estimation can lead to misalignments, distortions, or unstable training.

At the same time, 360° cameras are becoming increasingly popular for mapping and spatial capture because they provide full omnidirectional coverage and far more visual overlap than conventional cameras. However, existing SLAM systems such as OpenVSLAM and Stella VSLAM are built for real-time navigation rather than dense photometric accuracy, and modern consumer 360° devices, despite higher resolutions and improved optics, still lack depth sensors and high-grade IMUs, limiting the precision of their pose estimates.

Our project fills this gap by introducing a complete offline pose-optimization pipeline tailored to 360° video. By combining learning-based feature detection, robust matching, and a physically constrained rig-based Structure-from-Motion formulation, our system refines coarse SLAM trajectories into poses with subpixel reprojection accuracy, suitable for Gaussian Splatting. This enables 360° capture to reach its full potential as a practical and accessible method for generating high-quality 3D reconstructions.
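To make the rig constraint concrete, the following minimal Python sketch illustrates the core idea; the names (rotation_y, CAM_FROM_RIG, camera_poses) and the four-face horizontal layout are illustrative assumptions, not an actual library API. Every virtual pinhole view rendered from one 360° frame shares a single rig pose, so optimization refines one pose per timestamp while the relative face geometry stays physically fixed.

    # Illustrative sketch of the rig constraint (hypothetical names,
    # not a real SfM library API).
    import numpy as np

    def rotation_y(deg: float) -> np.ndarray:
        """4x4 homogeneous rotation about the y-axis."""
        a = np.deg2rad(deg)
        T = np.eye(4)
        T[0, 0] = T[2, 2] = np.cos(a)
        T[0, 2] = np.sin(a)
        T[2, 0] = -np.sin(a)
        return T

    # Fixed cam_from_rig extrinsics for four horizontal cubemap faces
    # (assumed layout: front/right/back/left at 90-degree steps).
    CAM_FROM_RIG = {f"face_{i}": rotation_y(90.0 * i) for i in range(4)}

    def camera_poses(rig_from_world: np.ndarray) -> dict:
        """Compose per-view camera poses from a single rig pose."""
        return {name: cam_from_rig @ rig_from_world
                for name, cam_from_rig in CAM_FROM_RIG.items()}

    # One rig pose per timestamp is the only free variable during
    # optimization; the face extrinsics above never change.
    rig_from_world = np.eye(4)
    rig_from_world[:3, 3] = [0.0, 0.0, -2.0]  # example rig translation
    poses = camera_poses(rig_from_world)
    print(poses["face_1"][:3, 3])  # translation of the 'right' face view

In a rig-based bundle adjustment, only rig_from_world at each timestamp is updated against the feature matches, which shrinks the parameter space and prevents the views of a single frame from drifting apart.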

Furthermore, we enhance the Gaussian Splatting pipeline by incorporating learned image embeddings to better handle lighting and exposure variations, and by introducing depth priors to improve robustness under sparse input imagery. These additions improve reconstruction quality, as reflected in standard photometric evaluation metrics.
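As a rough sketch of how these two additions enter training (the symbols and weights below are our assumptions, not a final formulation), the standard Gaussian Splatting objective, a weighted blend of L1 and D-SSIM terms, can be extended so that rendered color is conditioned on a per-image embedding $\mathbf{e}_i$ and rendered depth is pulled toward a prior $\bar{D}_i$:

$$\mathcal{L} = (1-\lambda)\,\lVert \hat{C}_i(\mathbf{e}_i) - C_i \rVert_1 + \lambda\,\mathcal{L}_{\mathrm{D\text{-}SSIM}} + \lambda_{\mathrm{depth}}\,\lVert \hat{D}_i - \bar{D}_i \rVert_1$$

where $\hat{C}_i$ and $\hat{D}_i$ are the color and depth rendered for training image $i$, $C_i$ is the captured image, and $\lambda$, $\lambda_{\mathrm{depth}}$ weight the terms.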

Across reconstructions of diverse real-world captures, we demonstrate these quality gains in practice, establishing our pipeline as a crucial step toward reliable 360°-based scene modeling.