Quantitative Results
Ablation on approach. We perform an ablation study to evaluate the impact of both the optimization strategy and camera parameterization on performance. We compare random sampling, CMA-ES, and our gradient-ascent method across all tested rotation representations.
Random sampling provides an unbiased baseline for probing sensitivity to camera perturbations, while CMA-ES improves search efficiency through adaptive covariance estimation. Our gradient-based approach consistently outperforms both baselines, converging faster and identifying more severe failure cases across all parameterizations.
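As a minimal sketch of the search strategies being compared, the contrast between random sampling and gradient ascent can be illustrated on a toy 2D objective standing in for the depth-error loss over camera parameters. The `error` function, search bounds, and finite-difference gradient below are illustrative assumptions, not our actual pipeline:

```python
import numpy as np

def error(p):
    # Toy stand-in for the depth-error objective over camera parameters:
    # a smooth bump peaked at p = (1, -0.5).
    return np.exp(-np.sum((p - np.array([1.0, -0.5])) ** 2))

def random_search(n=200, seed=0):
    # Unbiased baseline: sample camera perturbations uniformly at random.
    rng = np.random.default_rng(seed)
    cands = rng.uniform(-2, 2, size=(n, 2))
    return max(error(c) for c in cands)

def gradient_ascent(p0, lr=0.5, steps=50, eps=1e-4):
    # Ascend the error surface: we *maximize* the depth error to find
    # adversarial viewpoints (finite differences used here for clarity).
    p = np.asarray(p0, dtype=float)
    for _ in range(steps):
        g = np.zeros_like(p)
        for i in range(len(p)):
            d = np.zeros_like(p)
            d[i] = eps
            g[i] = (error(p + d) - error(p - d)) / (2 * eps)
        p += lr * g
    return error(p)
```

In the real setting the gradient comes from backpropagating the depth error through a differentiable renderer rather than finite differences, which is what allows the method to converge faster than sampling-based baselines.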
These results highlight the importance of jointly selecting an effective optimization method and a stable camera representation when evaluating depth model robustness.

Ablation on model. The table below reports the average performance of all evaluated models across scenes, summarizing their robustness under adversarial camera perturbations. We report both δ1 and AbsRel, where higher AbsRel and lower δ1 indicate greater discrepancy between predictions from adversarial viewpoints and ground-truth depth. Overall, more recent models, particularly DepthAnything V2, exhibit stronger robustness, achieving higher δ1 and lower AbsRel on average than older architectures such as MiDaS. Models benefiting from large-scale synthetic pretraining or teacher-based supervision consistently outperform earlier methods, highlighting the impact of modern training strategies on depth stability under viewpoint shifts.
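For reference, both metrics follow the standard depth-evaluation definitions; a minimal NumPy sketch over valid depth pixels (with the usual δ1 threshold of 1.25):

```python
import numpy as np

def abs_rel(pred, gt):
    # Mean absolute relative error: higher = larger deviation from GT depth.
    return float(np.mean(np.abs(pred - gt) / gt))

def delta1(pred, gt):
    # Fraction of pixels with max(pred/gt, gt/pred) < 1.25: higher = better.
    ratio = np.maximum(pred / gt, gt / pred)
    return float(np.mean(ratio < 1.25))
```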

Qualitative Results
Below we show representative failure modes discovered by our method.

Camera Trajectories. Below we also show examples illustrating the smoothness of optimization trajectories under the R6 parameterization compared to the other rotation representations.
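For context, R6 here refers to the continuous 6D rotation representation of Zhou et al. (2019), which recovers a rotation matrix from two 3D vectors by Gram-Schmidt orthogonalization; its continuity (no gimbal lock or quaternion double-cover discontinuities) is what yields smoother gradient trajectories. A minimal NumPy sketch of the mapping (mirroring what PyTorch3D's `rotation_6d_to_matrix` computes):

```python
import numpy as np

def r6_to_matrix(d6):
    # Map a 6D vector (two 3D vectors) to a rotation matrix via
    # Gram-Schmidt, following Zhou et al. (2019).
    a1, a2 = d6[:3], d6[3:]
    b1 = a1 / np.linalg.norm(a1)           # normalize first vector
    b2 = a2 - (b1 @ a2) * b1               # remove component along b1
    b2 = b2 / np.linalg.norm(b2)
    b3 = np.cross(b1, b2)                  # complete right-handed frame
    return np.stack([b1, b2, b3])
```

Because every 6D input (with non-degenerate columns) maps to a valid rotation, gradient updates in this space never leave the manifold, unlike Euler angles or unnormalized quaternions.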

Discussion
Stochasticity of Trajectories
One open issue is resolving degenerate solutions and stabilizing trajectories. Specifically, we have found that trajectories initialized from the same starting point do not reproduce consistently, even when seeding PyTorch3D. We suspect this is the result of two issues.
- Camera Clipping: We observe that degenerate solutions, such as camera clipping through geometry, often lead to large spikes in error and hinder convergence. These artifacts introduce instability and result in suboptimal optimization trajectories, particularly when the camera gets too close to or intersects with scene surfaces.
- Loss Surface Instability: Additionally, we suspect that the instability of the optimization process is exacerbated by the fact that the ground truth signal changes at every frame, due to the adversarial nature of the supervision. This leads to a non-stationary loss surface, making it harder for the optimizer to converge consistently. To address this, and inspired by prior work in PyTorch3D, we plan to explore the use of Soft Rasterization to produce smoother gradients and reduce the volatility of the loss surface. This may help guide the optimization more reliably, especially in challenging regions of the parameter space.
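One possible mitigation for the clipping issue above (a hypothetical sketch, not something we have implemented) is to add a soft penalty that discourages the camera from approaching or intersecting geometry, e.g. a hinge on the distance to the nearest surface sample:

```python
import numpy as np

def clipping_penalty(cam_pos, surface_pts, d_min=0.2):
    # Hypothetical regularizer: zero when the camera stays at least d_min
    # from all surface samples, growing quadratically as it gets closer.
    # surface_pts: (N, 3) array of points sampled from scene geometry.
    d = np.min(np.linalg.norm(surface_pts - cam_pos, axis=1))
    return max(0.0, d_min - d) ** 2
```

Added to the adversarial objective with a small weight, such a term would trade a little attack strength for trajectories that stay outside the geometry, which may also reduce the error spikes noted above.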
