Prior Work & Preliminaries

Prior Work

VR-NeRF: High-Fidelity Virtualized Walkable Spaces (Xu et al. SIGGRAPH 2023)

  1. VR Rendering: Renders a neural radiance field (NeRF) at 2K x 2K resolution and 72 FPS, but only by using roughly 20 GPUs
  2. Eyeful Tower: A capture rig with 22 cameras distributed across 7 levels
  3. Dataset: The paper also releases a high-quality dataset of scenes such as office rooms and apartments, each captured with over 1,500 images

Figure 1: The custom Eyeful Tower capture rig, comprising 22 HDR cameras.

Figure 2: Frame rate (FPS) achieved as a function of the number of GPUs. VR-NeRF needs around 20 GPUs to render a scene smoothly at 36 FPS, making it impractical for many use cases.


Preliminaries

3D Gaussian Splatting for Real-Time Radiance Field Rendering (Kerbl et al. SIGGRAPH 2023)

  • Represents a 3D scene using learnable 3D Gaussians as primitives.
  • Using a differentiable rasterizer guided by a reconstruction loss, the properties of these Gaussians (position, scale, orientation, opacity, and color) are optimized to fit the 3D scene.
  • Takeaway: A strong alternative to NeRFs, since Gaussians can be rasterized and rendered at 100+ FPS (a simplified compositing sketch follows this list).
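
To make the rasterization step concrete, the following is a minimal sketch of how one pixel's color is obtained by front-to-back alpha blending of depth-sorted, already-projected Gaussians: each Gaussian's color is weighted by its opacity and by the transmittance accumulated from nearer Gaussians. This is only an illustration under simplifying assumptions (the function name and early-termination threshold are our own), not the paper's tile-based CUDA rasterizer.

```python
import numpy as np

def composite_pixel(colors: np.ndarray, alphas: np.ndarray) -> np.ndarray:
    """Front-to-back alpha compositing for a single pixel.

    colors: (N, 3) RGB of the Gaussians covering this pixel,
    alphas: (N,) opacities after the 2D Gaussian falloff has been applied;
    both are sorted from nearest to farthest.
    """
    pixel = np.zeros(3)
    transmittance = 1.0  # fraction of light not yet absorbed by nearer Gaussians
    for c, a in zip(colors, alphas):
        pixel += transmittance * a * c
        transmittance *= 1.0 - a
        if transmittance < 1e-4:  # early termination once the pixel is opaque
            break
    return pixel

# Example: a red Gaussian in front of a blue one.
print(composite_pixel(np.array([[1.0, 0.0, 0.0],
                                [0.0, 0.0, 1.0]]),
                      np.array([0.6, 0.8])))
```

The actual implementation evaluates this blend in parallel over 16x16 pixel tiles on the GPU, which is what makes 100+ FPS rendering possible.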

Figure: Gaussians rendered as soft blobs that blend together to represent a 3D scene.


Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering (Lu et al. CVPR 2024)

  • Vanilla 3D Gaussian Splatting (3D-GS) tends to overfit to the training views and generates a large number of redundant Gaussians.
  • Hence, instead of letting Gaussians split and drift freely, Scaffold-GS constrains their distribution around a sparse set of “anchor points”.
  • Properties of the Gaussians (color, opacity, scale, rotation) are generated on the fly from the viewing direction using tiny MLPs (a minimal decoder sketch follows this list).
  • Takeaway: These structural constraints let Scaffold-GS significantly outperform vanilla 3D-GS, so we adopt it as our baseline 3D reconstruction pipeline.
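
To illustrate the anchor-based decoding, below is a hedged PyTorch sketch of the idea: each anchor stores a learned feature, and tiny MLPs decode the opacity, color, scale, and rotation of a small set of Gaussians from that feature, conditioned on the viewing direction and distance to the camera. The module name, feature size, and number of Gaussians per anchor are illustrative assumptions, and the learnable per-Gaussian position offsets of Scaffold-GS are omitted for brevity.

```python
import torch
import torch.nn as nn

K = 10          # assumed number of Gaussians spawned per anchor
FEAT_DIM = 32   # assumed anchor feature size

class AnchorDecoder(nn.Module):
    """Decodes view-dependent Gaussian properties from an anchor feature."""

    def __init__(self):
        super().__init__()
        in_dim = FEAT_DIM + 3 + 1  # anchor feature + view direction + distance
        def tiny_mlp(out_dim):
            return nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(),
                                 nn.Linear(32, out_dim))
        self.opacity = tiny_mlp(K)        # one opacity per spawned Gaussian
        self.color = tiny_mlp(K * 3)      # RGB per Gaussian
        self.scale_rot = tiny_mlp(K * 7)  # 3 scale + 4 quaternion components

    def forward(self, anchor_feat, anchor_xyz, cam_xyz):
        # View-dependent conditioning: direction and distance to the camera.
        delta = cam_xyz - anchor_xyz
        dist = delta.norm(dim=-1, keepdim=True)
        view_dir = delta / dist
        x = torch.cat([anchor_feat, view_dir, dist], dim=-1)
        opacity = torch.sigmoid(self.opacity(x))             # (B, K)
        color = torch.sigmoid(self.color(x)).view(-1, K, 3)  # (B, K, 3)
        scale_rot = self.scale_rot(x).view(-1, K, 7)         # (B, K, 7)
        return opacity, color, scale_rot

# Example: decode Gaussians for a batch of 4 anchors seen from the origin.
decoder = AnchorDecoder()
opacity, color, scale_rot = decoder(torch.randn(4, FEAT_DIM),
                                    torch.randn(4, 3),
                                    torch.zeros(4, 3))
print(opacity.shape, color.shape, scale_rot.shape)
```

Because properties are decoded on demand from compact anchor features, storage grows with the number of anchors rather than with the number of free Gaussians, which is how Scaffold-GS curbs the redundancy of vanilla 3D-GS.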

Figure: Scaffold-GS methodology: a small, fixed number of Gaussians tethered to each anchor point.

Table: Comparison of VR-NeRF, vanilla 3D-GS, and Scaffold-GS on the ‘office1b’ and ‘office_view2’ scenes from the Eyeful Tower dataset.