Preliminary Experiments

Random Sampling

This section reports preliminary results from random sampling on both the 3D-FRONT dataset and our own dataset. We evaluate the 𝛿₁ and AbsRel metrics for Depth Anything v2 across different poses; the graph displays the mean and standard deviation at each pose under randomly sampled rotations. Notably, the 𝛿₁ score drops by approximately 25% relative to its average value, and AbsRel degrades by a similar margin of about 29%, indicating a significant degradation in depth estimation performance under rotational variation.
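For reference, the two metrics follow their standard definitions: 𝛿₁ is the fraction of valid pixels whose ratio max(pred/gt, gt/pred) falls below 1.25, and AbsRel is the mean of |pred - gt| / gt. The sketch below is a minimal, dependency-free illustration rather than our evaluation code. The function names, the validity mask (both depths strictly positive), and the flattened-sequence input are assumptions made for the example, and any scale-and-shift alignment that the predictions require is assumed to have been applied beforehand.

```python
import statistics


def _valid_pairs(pred, gt):
    """Keep only pixels where both depths are strictly positive (assumed mask)."""
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid pixels to evaluate")
    return pairs


def delta1(pred, gt, threshold=1.25):
    """Fraction of valid pixels with max(pred/gt, gt/pred) < threshold (higher is better)."""
    pairs = _valid_pairs(pred, gt)
    hits = sum(1 for p, g in pairs if max(p / g, g / p) < threshold)
    return hits / len(pairs)


def abs_rel(pred, gt):
    """Mean absolute relative error |pred - gt| / gt over valid pixels (lower is better)."""
    pairs = _valid_pairs(pred, gt)
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def summarize(scores):
    """Mean and sample standard deviation of one metric over sampled rotations."""
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical usage: one pose, three randomly sampled rotations.
# Each entry is a (flattened prediction, flattened ground truth) pair.
samples = [
    ([1.0, 2.0, 4.0], [1.0, 2.0, 2.0]),
    ([1.1, 2.1, 2.2], [1.0, 2.0, 2.0]),
    ([0.5, 2.0, 2.0], [1.0, 2.0, 2.0]),
]
d1_mean, d1_std = summarize([delta1(p, g) for p, g in samples])
ar_mean, ar_std = summarize([abs_rel(p, g) for p, g in samples])
```

Because 𝛿₁ is an accuracy (higher is better) and AbsRel is an error (lower is better), degradation shows up as a decrease in the former and an increase in the latter.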

A Qualitative Look

Across rotations, depth is estimated poorly in both foreground and background regions, and reflective and transparent surfaces in particular are abruptly mispredicted. This suggests that view-dependent effects such as reflections and lighting play a significant role in these failure modes.

Stochasticity of Trajectories

One open issue is trajectory stability and the resolution of degenerate solutions. Specifically, we have found that trajectories initialized from the same starting point do not converge consistently, even when PyTorch3D is seeded. We suspect that this is the result of two issues.

Camera Clipping: We observe that degenerate solutions, such as camera clipping through geometry, often lead to large spikes in error and hinder convergence. These artifacts introduce instability and result in suboptimal optimization trajectories, particularly when the camera gets too close to or intersects with scene surfaces.
Loss Surface Instability: We also suspect that the instability of the optimization process is exacerbated by the fact that the ground truth signal changes at every frame, due to the adversarial nature of the supervision. This leads to a non-stationary loss surface, making it harder for the optimizer to converge consistently. To address this, and inspired by prior work in PyTorch3D, we plan to explore soft rasterization to produce smoother gradients and reduce the volatility of the loss surface. This may help guide the optimization more reliably, especially in challenging regions of the parameter space.
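To illustrate why soft rasterization may yield smoother gradients, the sketch below contrasts a hard inside/outside pixel test with the sigmoid-based soft coverage used in the Soft Rasterizer formulation, where a pixel's coverage by a face is sigmoid(sign · d² / sigma) and d is the pixel's distance to the face boundary. This is a toy per-pixel example in plain Python, not PyTorch3D's implementation. The function names and the default sigma are assumptions chosen for illustration.

```python
import math


def hard_coverage(signed_dist):
    """Hard rasterization: a step function, so its gradient is zero almost everywhere."""
    return 1.0 if signed_dist > 0 else 0.0


def soft_coverage(signed_dist, sigma=1e-2):
    """Soft coverage sigmoid(sign * d^2 / sigma); signed_dist is positive inside the face.

    A smaller sigma approaches the hard step; a larger sigma spreads the
    transition (and therefore the gradient signal) over more pixels.
    """
    x = math.copysign(signed_dist ** 2, signed_dist) / sigma
    # Numerically stable sigmoid: never exponentiate a large positive number.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def soft_silhouette(coverages):
    """Aggregate per-face coverages at one pixel as 1 - prod(1 - D_j)."""
    miss = 1.0
    for d in coverages:
        miss *= 1.0 - d
    return 1.0 - miss


# A pixel just outside a face still receives a nonzero, smoothly varying value,
# so small camera motions change the loss gradually instead of abruptly.
near_edge = [soft_coverage(d) for d in (-0.2, -0.1, 0.0, 0.1, 0.2)]
```

The choice of sigma is a trade-off: a wider transition gives a smoother loss surface but a blurrier rendering, so it would likely need tuning alongside the optimizer settings.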