SLAM & Depth Preprocessing
High-quality Gaussian Splatting requires precise camera poses and sufficient 3D structure of the scene. Our pipeline therefore begins with a preprocessing stage that uses OpenVSLAM for camera tracking and mapping, followed by PatchMatch Stereo for dense depth estimation.
Why OpenVSLAM?
We adopt OpenVSLAM (now maintained as stella_vslam) as our SLAM backbone because of its flexibility, real-time performance, and strong support for 360° environments:
Multi-Camera and Multi-Format Support
Supports monocular, stereo, and RGB-D setups with perspective, fisheye, and equirectangular camera models, making it well suited to 360° cameras that record dual-fisheye or equirectangular video.
Global Map Optimization
Performs loop closure detection, pose graph optimization (via g2o), and bundle adjustment to refine both camera poses and sparse map structure.
Real-Time Performance
Efficient enough to run on modest hardware, enabling rapid processing of long capture sequences.
360° SLAM Advantage
Tracking in the full panoramic field of view keeps many features visible at once, which improves robustness in outdoor environments with textureless surfaces or repetitive structures such as roads or building facades; a sketch of the underlying projection model follows below.
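To make that observation model concrete, here is a minimal sketch of how a 3D point projects into an equirectangular frame and how the reprojection residual minimized by bundle adjustment is formed. The function names, axis conventions (x right, y down, z forward), and pose parameterization are illustrative assumptions, not code from OpenVSLAM/stella_vslam.

```python
import numpy as np

def project_equirectangular(point_world, R_cw, t_cw, width, height):
    """Project a 3D world point into an equirectangular image.

    R_cw, t_cw map world coordinates into the camera frame
    (x right, y down, z forward). Returns (u, v) pixel coordinates.
    Illustrative convention only, not the library's actual code.
    """
    p_cam = R_cw @ point_world + t_cw
    x, y, z = p_cam / np.linalg.norm(p_cam)   # direction on the unit sphere

    lon = np.arctan2(x, z)                    # longitude in [-pi, pi]
    lat = -np.arcsin(y)                       # latitude  in [-pi/2, pi/2]

    u = (lon / (2.0 * np.pi) + 0.5) * width   # 360 deg maps to image width
    v = (0.5 - lat / np.pi) * height          # 180 deg maps to image height
    return u, v

def reprojection_residual(point_world, R_cw, t_cw, observed_uv, width, height):
    """Residual whose squared norm bundle adjustment minimizes for one observation."""
    u, v = project_equirectangular(point_world, R_cw, t_cw, width, height)
    return np.array([u - observed_uv[0], v - observed_uv[1]])
```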
Dense Geometry with PatchMatch Stereo
While SLAM provides only a sparse point cloud, we recover fine-grained scene geometry with PatchMatch Stereo, applied to perspective cube map faces derived from the 360° video.
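Before stereo matching, each equirectangular frame is resampled into perspective cube-map faces. The sketch below shows one way to render a single 90° face with NumPy and OpenCV; the function name, the per-face field of view, and the axis conventions are assumptions chosen for illustration rather than the pipeline's exact code.

```python
import cv2
import numpy as np

def equirect_to_cube_face(equi_img, face_size, face_rotation=np.eye(3)):
    """Render one perspective cube-map face (90 deg FOV) from an
    equirectangular image. `face_rotation` selects which face to render
    (identity = the forward-looking face)."""
    h, w = equi_img.shape[:2]
    f = face_size / 2.0  # focal length for a 90 deg field of view

    # Pixel grid of the output face, converted to camera-frame rays.
    xs, ys = np.meshgrid(np.arange(face_size), np.arange(face_size))
    dirs = np.stack([(xs + 0.5 - face_size / 2.0) / f,
                     (ys + 0.5 - face_size / 2.0) / f,
                     np.ones_like(xs, dtype=np.float64)], axis=-1)
    dirs = dirs @ face_rotation.T                       # rotate rays to the chosen face
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

    # Ray direction -> spherical coordinates -> source pixel in the equirect image.
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])
    lat = -np.arcsin(dirs[..., 1])
    map_x = (((lon / (2.0 * np.pi) + 0.5) * w) % w).astype(np.float32)
    map_y = ((0.5 - lat / np.pi) * h).astype(np.float32)

    return cv2.remap(equi_img, map_x, map_y, cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_REPLICATE)
```

Rendering six faces with the appropriate rotations (front, back, left, right, up, down) yields a full cube map per extracted frame.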
What It Does
PatchMatch Stereo computes dense per-pixel depth maps by iteratively matching local patches between adjacent frames using randomized search and propagation strategies.
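To illustrate that random-initialization / propagation / refinement cycle, here is a deliberately simplified, fronto-parallel PatchMatch sketch on a rectified image pair. Real PatchMatch Stereo fits a slanted support plane per pixel and runs on the calibrated cube-map views, so treat this as an educational toy rather than the method itself.

```python
import numpy as np

def patchmatch_disparity(left, right, max_disp=64, patch=5, iters=3, rng=None):
    """Toy PatchMatch-style disparity estimation on a rectified pair.

    left, right: rectified grayscale images as float arrays of equal shape.
    Estimates a single fronto-parallel disparity per pixel to show the
    random initialization / propagation / refinement loop only.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = left.shape
    half = patch // 2
    disp = rng.uniform(0, max_disp, size=(h, w))      # random initialization

    def cost(y, x, d):
        xr = int(round(x - d))
        if (xr - half < 0 or xr + half >= w or x - half < 0 or x + half >= w
                or y - half < 0 or y + half >= h):
            return np.inf
        pl = left[y - half:y + half + 1, x - half:x + half + 1]
        pr = right[y - half:y + half + 1, xr - half:xr + half + 1]
        return np.abs(pl - pr).mean()                  # SAD matching cost

    for it in range(iters):
        # Alternate scan direction so good estimates propagate both ways.
        ys = range(h) if it % 2 == 0 else range(h - 1, -1, -1)
        xs = range(w) if it % 2 == 0 else range(w - 1, -1, -1)
        step = 1 if it % 2 == 0 else -1
        for y in ys:
            for x in xs:
                best = cost(y, x, disp[y, x])
                # Spatial propagation from already-visited neighbors.
                for ny, nx in ((y, x - step), (y - step, x)):
                    if 0 <= ny < h and 0 <= nx < w:
                        c = cost(y, x, disp[ny, nx])
                        if c < best:
                            best, disp[y, x] = c, disp[ny, nx]
                # Random refinement with a shrinking search radius.
                radius = max_disp / 2.0
                while radius > 0.5:
                    d = disp[y, x] + rng.uniform(-radius, radius)
                    if 0 <= d <= max_disp:
                        c = cost(y, x, d)
                        if c < best:
                            best, disp[y, x] = c, d
                    radius /= 2.0
    return disp
```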
Why It Matters
- Produces high-resolution depth maps, filling in structure where SLAM lacks points.
- Maintains accuracy in low-texture or ambiguous regions thanks to random initialization followed by spatial propagation and refinement.
- Efficient enough to scale to long video sequences.
The resulting dense point cloud enhances the geometric initialization for Gaussian Splatting and enables more complete and detailed reconstructions.
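To see how the depth maps become a dense point cloud: each per-pixel depth on a cube face is back-projected through the face's pinhole intrinsics and transformed into world space with the SLAM pose, and the per-frame points are then merged (and typically filtered for consistency). The sketch below covers the back-projection step; the function name, the metric-depth assumption, and the camera-to-world pose convention are illustrative assumptions.

```python
import numpy as np

def depth_to_points(depth, K, T_wc):
    """Back-project a dense depth map of one cube face into world-space points.

    depth : (H, W) depth along the optical axis (metric depth assumed).
    K     : 3x3 pinhole intrinsics of the cube face.
    T_wc  : 4x4 camera-to-world pose from SLAM.
    Returns an (N, 3) array of world points for valid (depth > 0) pixels.
    """
    h, w = depth.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0

    # Pixel -> normalized camera ray -> 3D point in the camera frame.
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    z = depth[valid]
    x = (xs[valid] + 0.5 - cx) / fx * z
    y = (ys[valid] + 0.5 - cy) / fy * z
    pts_cam = np.stack([x, y, z], axis=-1)

    # Camera frame -> world frame using the SLAM pose.
    R, t = T_wc[:3, :3], T_wc[:3, 3]
    return pts_cam @ R.T + t
```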

Outputs from this stage include:
- Sparse point cloud (from SLAM)
- Dense point cloud (from PatchMatch Stereo)
- Estimated camera poses for each extracted frame
- Perspective cube map faces from 360° input
Together, these components form the geometric and positional foundation for the subsequent splatting stage.
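As a rough picture of the hand-off, a hypothetical container for these outputs might look like the following; all field names, array shapes, and the camera-to-world pose convention are assumptions for illustration, not the pipeline's actual interface.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class PreprocessingOutput:
    """Hypothetical bundle of the outputs handed to the splatting stage."""
    sparse_points: np.ndarray          # (N, 3) SLAM landmarks
    dense_points: np.ndarray           # (M, 3) fused PatchMatch Stereo points
    poses_world_from_cam: np.ndarray   # (F, 4, 4) per-frame camera poses
    cube_faces: np.ndarray             # (F, 6, H, W, 3) perspective face images
```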
