Simple Composition of Two Scenes
The first goal of our project is to compose an object from one scene into another. To achieve this, we load a static model trained on one scene and a dynamic model trained on a different scene, then run Dynamic NeRF inference to produce the composed image shown in the figure below.
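At its core, the composition combines the colors rendered by the two models with a per-pixel blending factor predicted by the dynamic model. The sketch below illustrates that idea with hypothetical names (`compose`, `rgb_static`, `rgb_dynamic`, `blend` are ours, not from the codebase); the actual method blends per sample point during volume rendering, so this per-pixel version is only a simplification.

```python
import numpy as np

def compose(rgb_static, rgb_dynamic, blend):
    """Blend per-pixel colors from the two models.

    blend is the blending factor predicted by the dynamic model
    (1.0 = fully dynamic, 0.0 = fully static).
    """
    blend = blend[..., None]  # broadcast over the RGB channel
    return blend * rgb_dynamic + (1.0 - blend) * rgb_static

# Toy example: a 2x2 image where one pixel is fully dynamic.
rgb_s = np.zeros((2, 2, 3))              # static background: black
rgb_d = np.ones((2, 2, 3))               # dynamic object: white
b = np.array([[1.0, 0.0], [0.0, 0.0]])   # only pixel (0, 0) is dynamic
out = compose(rgb_s, rgb_d, b)
print(out[0, 0])  # -> [1. 1. 1.] (dynamic pixel takes the dynamic color)
```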
A transformation can also be applied to change the position of the dynamic object:
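One common way to realize such a repositioning, sketched below under our own assumptions (the function name and the row-vector convention are ours), is to apply the inverse rigid transform to the world-space query points before evaluating the dynamic model, so the object appears at the new pose in the composed scene:

```python
import numpy as np

def transform_query_points(points, R, t):
    """Map world-space sample points into the dynamic model's frame.

    To render the object at pose (R, t) in the composed scene, we apply
    the inverse transform R^T (x - t) to each query point x before
    feeding it to the dynamic NeRF. points is an (N, 3) array of row
    vectors; R is an orthonormal 3x3 rotation, so R^-1 = R^T.
    """
    return (points - t) @ R  # row-vector form of R^T (x - t)

# Translate the object by +1 along x (identity rotation): a point that
# should land at (1, 0, 0) is queried at the model's origin.
R = np.eye(3)
t = np.array([1.0, 0.0, 0.0])
p = np.array([[1.0, 0.0, 0.0]])
print(transform_query_points(p, R, t))  # -> [[0. 0. 0.]]
```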
Masked Dynamic Model
In the baseline, the dynamic model learns both the dynamic objects and their surroundings. We therefore apply a mask based on ground-truth segmentation when training the dynamic model, so that it learns only the dynamic objects.
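A minimal way to implement this masking, assuming a per-pixel photometric loss (the function name and toy shapes here are illustrative, not taken from the codebase), is to zero out the reconstruction error outside the segmentation mask so the dynamic model receives no gradient from the static background:

```python
import numpy as np

def masked_mse(pred, target, mask):
    """Photometric loss restricted to dynamic-object pixels.

    mask is 1 inside the ground-truth segmentation of the dynamic
    object and 0 elsewhere; background pixels contribute no error.
    """
    mask = mask[..., None]  # broadcast over the RGB channel
    num = ((pred - target) ** 2 * mask).sum()
    den = mask.sum() * pred.shape[-1] + 1e-8  # normalize by masked values
    return num / den

# Toy example: only pixel (0, 0) is inside the object mask.
pred = np.ones((2, 2, 3))
target = np.zeros((2, 2, 3))
mask = np.array([[1, 0], [0, 0]])
loss = masked_mse(pred, target, mask)  # error counted on the masked pixel only
print(loss)  # -> ~1.0
```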
As the composed umbrella scene comparison shows, our masked version renders a more realistic color for the foreground object (the umbrella) than the baseline does. We attribute this to better blending-factor prediction, which reduces the blue tint bleeding in from the static model (the people in the background).
Novel View Mask Loss
Blending-factor prediction often fails at the edges of a frame, since that part of the novel view is not present in any training frame at the same timestep. This causes the dynamic model to render duplicated foreground objects and other artifacts.
Our solution is a “novel view mask loss” for blending-factor prediction at novel views. Given the camera parameters, the object mask, and the depth map at one viewpoint, we can render the object mask at any other given viewpoint. This solution is summarized in the figure below.
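The mask rendering step is essentially depth-based reprojection. The sketch below shows one way it could work, under our own assumptions: OpenCV-style pinhole intrinsics `K`, 4x4 camera-to-world poses, and a simple forward splat with no occlusion handling (all names are hypothetical, not from the codebase).

```python
import numpy as np

def warp_mask(mask, depth, K, pose_src, pose_dst, out_shape):
    """Warp a binary object mask from a source view to a novel view.

    Masked pixels are unprojected with the source depth map, mapped
    through the relative camera pose, and re-projected into the novel
    view. pose_* are 4x4 camera-to-world matrices; K is 3x3 intrinsics.
    """
    ys, xs = np.nonzero(mask)                        # masked pixels only
    pix = np.stack([xs, ys, np.ones_like(xs)], 0)    # homogeneous pixels
    cam = np.linalg.inv(K) @ pix * depth[ys, xs]     # source camera coords
    world = pose_src @ np.vstack([cam, np.ones((1, cam.shape[1]))])
    cam2 = np.linalg.inv(pose_dst) @ world           # novel camera coords
    proj = K @ cam2[:3]
    uv = (proj[:2] / proj[2]).round().astype(int)    # novel-view pixels
    out = np.zeros(out_shape, dtype=mask.dtype)
    valid = (uv[0] >= 0) & (uv[0] < out_shape[1]) \
          & (uv[1] >= 0) & (uv[1] < out_shape[0])
    out[uv[1, valid], uv[0, valid]] = 1              # forward splat
    return out

# Sanity check: identity intrinsics and poses should reproduce the mask.
src = np.zeros((4, 4), dtype=int)
src[1, 2] = 1
warped = warp_mask(src, np.ones((4, 4)), np.eye(3), np.eye(4), np.eye(4), (4, 4))
```

The warped mask can then supervise the blending factor at the novel view, penalizing foreground predictions that fall outside the reprojected object region.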
Multiple Dynamic Objects
We also explore training Dynamic NeRF on custom data containing multiple dynamic objects that move incoherently, such as the following traffic scene. After training and rendering, we observe that many parts of the background move, even though we explicitly pass object masks marking the cars as the dynamic objects.