Abstract - 4D Scene Reconstruction from Casual Monocular Camera Capture

This study introduces a novel approach to 3D reconstruction of non-rigid scenes using low-rank factorization and neural radiance fields (NeRF). Our method addresses the challenges inherent in capturing the dynamic nature of deformable objects, providing fast and robust results. We leverage tensor decomposition techniques, specifically the CANDECOMP/PARAFAC (CP) decomposition, to efficiently process and reconstruct complex scenes.

Introduction

3D reconstruction of non-rigid scenes is a challenging yet vital task in the field of computer vision. The accurate depiction of objects exhibiting shape variability, such as those in motion or under transformation, is crucial for various applications. Traditional methods often struggle with the complexity and dynamism of such scenes. Our work is driven by the need for efficient algorithms capable of managing deformable objects and synthesizing new views from sparse image sets captured by monocular cameras.

Method

Our approach utilizes a combination of low-rank factorization and neural radiance fields (NeRF) for efficient 3D scene reconstruction. We employ tensor decomposition techniques, specifically CP decomposition, to break down complex deformations into manageable components. This process involves integrating explicit and implicit representations of the scene, allowing for effective handling of non-rigid movements. The decomposition facilitates the reconstruction from a sparse set of images, overcoming limitations of traditional methods.
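As a rough illustration of why a CP model is compact, the sketch below rebuilds a 3-way tensor from three factor matrices and compares the number of stored values with the dense tensor. The grid size, the rank, and the function names are hypothetical choices made for this example; they are not taken from our implementation, where the factors would be learned rather than drawn at random.

```python
import numpy as np

def cp_reconstruct(A, B, C):
    """Sum of R rank-one terms: T[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def cp_num_params(shape, rank):
    """Number of values stored by the factor matrices of a rank-`rank` CP model."""
    return rank * sum(shape)

# Hypothetical sizes: a 32 x 32 x 16 grid approximated with rank 4.
rng = np.random.default_rng(0)
shape, rank = (32, 32, 16), 4
A, B, C = (rng.standard_normal((n, rank)) for n in shape)

T = cp_reconstruct(A, B, C)
print("dense entries:", T.size)
print("CP parameters:", cp_num_params(shape, rank))
```

The dense tensor grows with the product of its dimensions, while the factor matrices grow only with their sum times the rank, which is what makes a low-rank representation attractive when only a sparse set of images is available.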

So far, we have used the HexPlane representation (as described in the HexPlane paper) in our deformation field:

It leverages an explicit representation with six feature planes, each spanning a pair of the four spacetime axes (x, y, z, t). This architecture computes a feature vector for a spacetime point by projecting the point onto each plane and aggregating the resulting vectors. The features are then fed into a tiny MLP that maps the point to deformed coordinates (x’,y’,z’), which are passed to the NeRF template to predict color and density.
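The lookup described above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not our implementation: the plane resolutions, feature width, MLP sizes, and random weights are hypothetical, and the fusion rule (multiplying each spatial plane with its complementary space-time plane, then concatenating) is one possible aggregation chosen for this example. The NeRF template that consumes (x', y', z') is omitted.

```python
import numpy as np

# Axis indices into a point (x, y, z, t). Each spatial plane is paired with the
# complementary space-time plane: (XY, ZT), (XZ, YT), (YZ, XT).
PAIRS = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((1, 2), (0, 3))]

def bilinear(plane, u, v):
    """Bilinearly sample plane (H, W, F) at coords u, v in [0, 1]; returns (N, F)."""
    H, W, _ = plane.shape
    a = np.clip(u, 0.0, 1.0) * (H - 1)
    b = np.clip(v, 0.0, 1.0) * (W - 1)
    a0 = np.minimum(np.floor(a).astype(int), H - 2)
    b0 = np.minimum(np.floor(b).astype(int), W - 2)
    wa = (a - a0)[:, None]
    wb = (b - b0)[:, None]
    return ((1 - wa) * (1 - wb) * plane[a0, b0]
            + (1 - wa) * wb * plane[a0, b0 + 1]
            + wa * (1 - wb) * plane[a0 + 1, b0]
            + wa * wb * plane[a0 + 1, b0 + 1])

def hexplane_features(planes, pts):
    """pts: (N, 4) array of (x, y, z, t) in [0, 1]; returns fused features (N, 3F)."""
    fused = []
    for spatial, temporal in PAIRS:
        fs = bilinear(planes[spatial], pts[:, spatial[0]], pts[:, spatial[1]])
        ft = bilinear(planes[temporal], pts[:, temporal[0]], pts[:, temporal[1]])
        fused.append(fs * ft)  # assumed fusion: elementwise product per pair
    return np.concatenate(fused, axis=1)

def tiny_mlp(feat, W1, b1, W2, b2):
    """Two-layer ReLU MLP mapping features to deformed coordinates (x', y', z')."""
    hidden = np.maximum(feat @ W1 + b1, 0.0)
    return hidden @ W2 + b2

# Hypothetical sizes: spatial resolution R, temporal resolution T, feature width F.
rng = np.random.default_rng(0)
R, T, F = 16, 8, 4
planes = {}
for spatial, temporal in PAIRS:
    planes[spatial] = rng.standard_normal((R, R, F))
    planes[temporal] = rng.standard_normal((R, T, F))

# Random (untrained) MLP weights, for shape illustration only.
W1, b1 = rng.standard_normal((3 * F, 32)) * 0.1, np.zeros(32)
W2, b2 = rng.standard_normal((32, 3)) * 0.1, np.zeros(3)

pts = rng.uniform(size=(5, 4))            # five spacetime samples
feat = hexplane_features(planes, pts)     # shape (5, 3 * F)
warped = tiny_mlp(feat, W1, b1, W2, b2)   # shape (5, 3): (x', y', z') per sample
print(feat.shape, warped.shape)
```

Because every plane is a small 2D grid, a query costs six bilinear lookups plus one small MLP evaluation, instead of a pass through a large coordinate network.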

Experiment

We conducted experiments using a monocular camera setup to capture dynamic scenes. The data was processed using our proposed method, in which tensor decomposition played a crucial role in handling the deformation of the objects. We compared our results with two existing works, HexPlane (explicit representation) and Nerfies (implicit representation), to evaluate the efficiency and accuracy of our method. Our approach, which combines explicit and implicit representations, increases speed without losing quality.

Conclusion

The experimental results demonstrate that our approach significantly improves the efficiency and robustness of 3D reconstruction in non-rigid scenes. By leveraging low-rank factorization and neural radiance fields, we successfully address the challenges posed by deformable objects. This method shows promise for computer vision applications that require accurate 3D reconstruction of dynamic scenes. Future work may explore further optimizations and applications of this technique in more complex scenarios.

This work was done by Asrar Alruwayqi, an MSCV student, under the advice of Prof. Shubham.

BIO

I am currently pursuing a Master’s degree in Computer Vision at a well-known Robotics Institute. Previously, I completed a Bachelor’s degree in Computer Science. Afterward, I worked as a Research Engineer at the National AI Center in Saudi Arabia, where I gained valuable experience and mentorship in the field. I have a deep interest in computer vision, with a special emphasis on 3D reconstruction and computational geometry.