Related Work


Novel view synthesis (NVS) is a central problem in 3D vision, with applications spanning virtual reality (VR), augmented reality (AR), and film production. It generates images of a scene from arbitrary viewpoints or timestamps, enabling immersive experiences and richer visual storytelling. NVS for dynamic scenes is particularly challenging: object motion is complex and unpredictable, so algorithms must handle non-rigid deformation while preserving temporal coherence.

DynamicFusion [1] is the first dense SLAM system capable of reconstructing non-rigidly deforming scenes in real time. It reconstructs a rigid canonical space and estimates a per-frame warp field that transforms the canonical frame into the current observation frame. The method adapts the warp-field structure to capture newly observed regions and continuously updates the canonical model as new depth data arrives, enabling robust tracking of non-rigid deformation.
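The warp-field idea can be sketched as blending per-node rigid transforms to deform a canonical point, weighted by each node's distance to the point. This is a simplified illustration: DynamicFusion actually uses dual-quaternion blending rather than the naive linear blend below, and all names and parameter values here are invented for the sketch.

```python
import numpy as np

def warp_point(x, node_pos, node_rot, node_trans, sigma=0.1):
    """Warp a canonical-space point into the observation frame by
    blending per-node rigid transforms.
    Simplification: linear blending instead of dual-quaternion blending."""
    # Gaussian RBF weights from squared distance to each deformation node.
    d2 = np.sum((node_pos - x) ** 2, axis=1)
    w = np.exp(-d2 / (2 * sigma ** 2))
    w /= w.sum()
    # Apply each node's rigid transform to x, then blend the results.
    warped = np.einsum('nij,j->ni', node_rot, x) + node_trans
    return (w[:, None] * warped).sum(axis=0)

# Toy example: two nodes with identity rotations and opposite translations.
nodes = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
rots = np.stack([np.eye(3)] * 2)
trans = np.array([[0.0, 0.1, 0.0], [0.0, -0.1, 0.0]])
y = warp_point(np.array([0.5, 0.0, 0.0]), nodes, rots, trans)
# Midway between the nodes, the opposite translations cancel out.
```

In the full system these node transforms are optimized per frame against the incoming depth map, and new nodes are inserted as previously unseen surface regions appear.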

The field of novel-view synthesis gained widespread attention following the introduction of NeRF [2]. Numerous "dynamic NeRF" papers have since extended NeRF to handle moving objects. Nerfies [3] handles non-rigidly deforming objects by optimizing an additional continuous volumetric deformation field that warps each observed point into a canonical 5D NeRF. The authors also propose a coarse-to-fine optimization strategy and an elastic regularization of the deformation field to improve robustness.
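The core idea of such a deformation field can be sketched as a small network that maps an observed point, together with a per-frame latent deformation code, to an additive offset that carries the point into the canonical frame. The tiny random-weight MLP below is purely illustrative (Nerfies uses a much larger positionally-encoded MLP); all sizes and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny deformation MLP: (3D point + 8D frame code) -> 3D offset.
W1 = rng.normal(scale=0.1, size=(3 + 8, 32))
W2 = rng.normal(scale=0.1, size=(32, 3))

def deform_to_canonical(x, frame_code):
    """Warp an observed point into the canonical frame by predicting an
    additive offset from the point and a per-frame latent code (sketch)."""
    h = np.maximum(np.concatenate([x, frame_code]) @ W1, 0.0)  # ReLU layer
    return x + h @ W2

x_obs = np.array([0.2, -0.1, 0.5])   # point sampled along a camera ray
code = rng.normal(size=8)            # latent deformation code for one frame
x_can = deform_to_canonical(x_obs, code)
```

The canonical point `x_can` would then be fed to the shared NeRF; because the offset is a smooth function of position, the elastic regularization mentioned above can penalize non-rigid distortion of its Jacobian.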

3D Gaussian Splatting [4] has recently emerged as a compelling method for representing 3D scenes, offering real-time rendering and substantially shorter training times than NeRF. 4D Gaussian Splatting [5] extends it to dynamic scenes. Inspired by the HexPlane approach, it encodes Gaussian features with a decomposed spatio-temporal neural voxel representation, and a lightweight multi-layer perceptron (MLP) then predicts Gaussian deformations at novel timestamps.
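The decomposed spatio-temporal encoding can be sketched as feature lookups on six 2D planes spanning the (x, y, z, t) axes, fused into one feature vector that a small head maps to a deformation. This is a deliberately simplified illustration: it uses nearest-neighbor lookup and a single linear head where HexPlane-style methods use bilinear/multi-resolution sampling and an MLP, and the resolutions and names are invented.

```python
import numpy as np

rng = np.random.default_rng(1)
R, F = 16, 4  # grid resolution and feature channels (illustrative values)
# One 2D feature grid per axis pair covering space (xy, xz, yz)
# and space-time (xt, yt, zt).
planes = {p: rng.normal(size=(R, R, F)) for p in
          ['xy', 'xz', 'yz', 'xt', 'yt', 'zt']}

def hexplane_features(x, y, z, t):
    """Look up features on all six planes for normalized (x, y, z, t) in
    [0, 1] and fuse them by elementwise product (nearest-neighbor sketch)."""
    coords = {'x': x, 'y': y, 'z': z, 't': t}
    feat = np.ones(F)
    for name, grid in planes.items():
        i = int(coords[name[0]] * (R - 1))
        j = int(coords[name[1]] * (R - 1))
        feat *= grid[i, j]
    return feat

# Hypothetical deformation head: fused features -> position offset
# for one Gaussian at the queried timestamp.
W = rng.normal(scale=0.1, size=(F, 3))
offset = hexplane_features(0.3, 0.7, 0.5, 0.25) @ W
```

Because the planes factorize the 4D volume into 2D grids, memory grows quadratically rather than quartically with resolution, which is what makes per-timestamp deformation queries cheap enough for real-time rendering.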

References

[1] Newcombe, Richard A., Dieter Fox, and Steven M. Seitz. "DynamicFusion: Reconstruction and tracking of non-rigid scenes in real-time." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
[2] Mildenhall, Ben, et al. "NeRF: Representing scenes as neural radiance fields for view synthesis." Communications of the ACM 65.1 (2021).
[3] Park, Keunhong, et al. "Nerfies: Deformable neural radiance fields." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
[4] Kerbl, Bernhard, et al. "3D Gaussian splatting for real-time radiance field rendering." ACM Transactions on Graphics 42.4 (2023).
[5] Wu, Guanjun, et al. "4D Gaussian splatting for real-time dynamic scene rendering." arXiv preprint arXiv:2310.08528 (2023).