Conclusion and Future Work

In this work, we study the use of 3D Gaussian Splatting (3DGS) [10] for depth perception of transparent objects.

We propose Clear-Splatting, a method that leverages a strong scene prior to improve 3DGS depth perception of transparent objects. Clear-Splatting first learns background Splats of the scene without the transparent objects present; residual Splats are then trained to complement the background Splats. Our results suggest that Clear-Splatting achieves competitive depth reconstruction.
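For readers who want the shape of this two-stage procedure, a minimal sketch follows. It is not the released implementation: the rasterizer is a toy stand-in, and the splat parameterization, image size, and optimizer settings are placeholder assumptions.

```python
# Toy sketch of the two-stage residual-Splat training scheme described above.
# The rasterizer, splat parameters, and image sizes are placeholders only.
import torch
import torch.nn.functional as F


def render(means, opacities, colors):
    # Stand-in for a differentiable Gaussian rasterizer: blend splat colors by
    # opacity (a real pipeline would use the tile-based 3DGS renderer).
    weights = torch.softmax(opacities.squeeze(-1) - means.pow(2).sum(-1), dim=0)
    color = (weights[:, None] * colors).sum(dim=0)           # (3,)
    return color.reshape(3, 1, 1).expand(3, 64, 64)


def init_splats(n):
    return {
        "means": torch.randn(n, 3, requires_grad=True),
        "opacities": torch.zeros(n, 1, requires_grad=True),
        "colors": torch.rand(n, 3, requires_grad=True),
    }


def fit(trainable, all_splats, views, steps=200, lr=1e-2):
    # Optimize only `trainable` splats, but render the union of `all_splats`.
    opt = torch.optim.Adam([p for s in trainable for p in s.values()], lr=lr)
    for _ in range(steps):
        for image in views:
            params = {k: torch.cat([s[k] for s in all_splats]) for k in all_splats[0]}
            loss = F.l1_loss(render(params["means"], params["opacities"], params["colors"]), image)
            opt.zero_grad()
            loss.backward()
            opt.step()


# Stage 1: background Splats fit on views of the scene without transparent objects.
background = init_splats(4096)
fit([background], [background], views=[torch.rand(3, 64, 64)])

# Stage 2: background frozen; residual Splats fit on views with the objects present.
for p in background.values():
    p.requires_grad_(False)
residual = init_splats(1024)
fit([residual], [background, residual], views=[torch.rand(3, 64, 64)])
```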

This work could be strengthened by comparing against additional multi-view synthesis methods that are not specific to transparent objects. Future work also includes combining Clear-Splatting with recent advances in depth-map completion and exploring performance across a wider range of transparent objects and scene conditions.

We also propose ClearSplatting-2.0, a method for robustly using imperfect ‘world’ models in the presence of transparency. In particular, we integrate the Depth Anything V2 model [42] to obtain pseudo ground-truth depth maps for depth supervision in both stages of training the 3DGS model.
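A minimal sketch of how such depth supervision can be wired in follows, assuming the rasterizer also renders a per-view depth map; the loss weighting, the 0-1 normalization helper, and the tensor shapes are illustrative assumptions, not the exact losses used in ClearSplatting-2.0.

```python
# Illustrative pseudo-depth supervision term (not the exact ClearSplatting-2.0
# losses). `rendered_depth` comes from the 3DGS rasterizer and `pseudo_depth`
# from Depth Anything V2 for the same training view; both are (H, W) tensors.
import torch
import torch.nn.functional as F


def minmax_normalize(depth: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Map depth to [0, 1], discarding the unknown scale and shift of the
    # monocular estimate.
    return (depth - depth.min()) / (depth.max() - depth.min() + eps)


def training_loss(rendered_rgb, gt_rgb, rendered_depth, pseudo_depth, lambda_depth=0.1):
    # Photometric term plus an L1 depth term against the normalized pseudo GT.
    photometric = F.l1_loss(rendered_rgb, gt_rgb)
    depth_term = F.l1_loss(minmax_normalize(rendered_depth),
                           minmax_normalize(pseudo_depth))
    return photometric + lambda_depth * depth_term


# Toy usage with random tensors standing in for one training view.
rgb_hat, rgb = torch.rand(3, 64, 64), torch.rand(3, 64, 64)
d_hat, d_pseudo = torch.rand(64, 64), torch.rand(64, 64)
print(training_loss(rgb_hat, rgb, d_hat, d_pseudo))
```

The same term would be applied in both training stages, once against the background Splats and once against the combined background-plus-residual Splats.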

Future work for ClearSplatting-2.0 includes extending the evaluation to real-world data and learning a network to better handle the scale-shift ambiguity in monocular depth estimates, rather than the 0-1 normalization adopted in the current method.
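For context on the scale-shift issue, the sketch below shows a standard per-image least-squares scale-and-shift alignment; it illustrates the kind of correction a learned module might provide and is not part of the present method.

```python
# Illustrative least-squares scale-and-shift alignment between a monocular
# depth estimate and a reference depth map; shown only to contrast with the
# 0-1 normalization currently used, not as part of the method.
import torch


def align_scale_shift(pred: torch.Tensor, ref: torch.Tensor):
    # Solve min_{s,t} ||s * pred + t - ref||^2 in closed form.
    p = pred.flatten()
    A = torch.stack([p, torch.ones_like(p)], dim=1)              # (N, 2)
    sol = torch.linalg.lstsq(A, ref.flatten().unsqueeze(1)).solution
    s, t = sol[0, 0], sol[1, 0]
    return s * pred + t, (s, t)


pred = torch.rand(64, 64)
ref = 2.5 * pred + 0.3          # reference differing only by scale and shift
aligned, (s, t) = align_scale_shift(pred, ref)
print(round(float(s), 3), round(float(t), 3))   # approximately 2.5 and 0.3
```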

References

[1] J. Ichnowski*, Y. Avigal*, J. Kerr, and K. Goldberg, “Dex-NeRF: Using a neural radiance field to grasp transparent objects,” in Conference on Robot Learning (CoRL), 2020.

[2]  X. Chen, H. Zhang, Z. Yu, A. Opipari, and O. C. Jenkins, “Clearpose: Large-scale transparent object dataset and benchmark,” in European Conference on Computer Vision, 2022.

[3] C. Phillips, M. Lecce, and K. Daniilidis, “Seeing glassware: from edge detection to pose estimation and shape recovery,” June 2016.

[4] C. Xu, J. Chen, M. Yao, J. Zhou, L. Zhang, and Y. Liu, “6dof pose estimation of transparent object from a single rgb-d image,” Sensors, vol. 20, no. 23, 2020. [Online]. Available: https://www.mdpi.com/1424-8220/20/23/6790

[5]  L. Yang, B. Kang, Z. Huang, X. Xu, J. Feng, and H. Zhao, “Depth anything: Unleashing the power of large-scale unlabeled data,” arXiv preprint arXiv:2401.10891, 2024.

[6] J. L. Schönberger and J.-M. Frahm, “Structure-from-motion revisited,” in Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

[7] B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, “Nerf: Representing scenes as neural radiance fields for view synthesis,” in ECCV, 2020.

[8] J. Kerr, L. Fu, H. Huang, Y. Avigal, M. Tancik, J. Ichnowski, A. Kanazawa, and K. Goldberg, “Evo-nerf: Evolving nerf for sequential robot grasping of transparent objects,” in Proceedings of The 6th Conference on Robot Learning, ser. Proceedings of Machine Learning Research, K. Liu, D. Kulic, and J. Ichnowski, Eds., vol. 205. PMLR, 14–18 Dec 2023, pp. 353–367. [Online]. Available: https://proceedings.mlr.press/v205/kerr23a.html

[9] B. P. Duisterhof, Y. Mao, S. H. Teng, and J. Ichnowski, “Residual-nerf: Learning residual nerfs for transparent object manipulation,” in ICRA, 2024.

[10] B. Kerbl, G. Kopanas, T. Leimkühler, and G. Drettakis, “3d gaussian splatting for real-time radiance field rendering,” ACM Transactions on Graphics, vol. 42, no. 4, July 2023. [Online]. Available: https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/

[11] T. Müller, A. Evans, C. Schied, and A. Keller, “Instant neural graphics primitives with a multiresolution hash encoding,” ACM Trans. Graph., vol. 41, no. 4, pp. 102:1–102:15, Jul. 2022. [Online]. Available: https://doi.org/10.1145/3528223.3530127

[12] C. Reiser, S. Peng, Y. Liao, and A. Geiger, “Kilonerf: Speeding up neural radiance fields with thousands of tiny mlps,” CoRR, vol. abs/2103.13744, 2021. [Online]. Available: https://arxiv.org/abs/2103.13744

[13]  L. Liu, J. Gu, K. Z. Lin, T.-S. Chua, and C. Theobalt, “Neural sparse voxel fields,” NeurIPS, 2020.

[14] A. Yu, R. Li, M. Tancik, H. Li, R. Ng, and A. Kanazawa, “PlenOctrees for real-time rendering of neural radiance fields,” in ICCV, 2021.

[15]  C. Sun, M. Sun, and H. Chen, “Direct voxel grid optimization: Super- fast convergence for radiance fields reconstruction,” in CVPR, 2022.

[16] S. J. Garbin, M. Kowalski, M. Johnson, J. Shotton, and J. Valentin, “Fastnerf: High-fidelity neural rendering at 200fps,” in 2021 IEEE/CVF International Conference on Computer Vision (ICCV). Los Alamitos, CA, USA: IEEE Computer Society, Oct. 2021, pp. 14326–14335. [Online]. Available: https://doi.ieeecomputersociety.org/10.1109/ICCV48922.2021.01408

[17] S. Lombardi, T. Simon, G. Schwartz, M. Zollhoefer, Y. Sheikh, and J. Saragih, “Mixture of volumetric primitives for efficient neural rendering,” ACM Trans. Graph., vol. 40, no. 4, Jul. 2021. [Online]. Available: https://doi.org/10.1145/3450626.3459863

[18]  M. H. Mubarik, R. Kanungo, T. Zirr, and R. Kumar, “Hardware acceleration of neural graphics,” 2023.

[19] K. Deng, A. Liu, J. Zhu, and D. Ramanan, “Depth-supervised nerf: Fewer views and faster training for free,” in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Los Alamitos, CA, USA: IEEE Computer Society, Jun. 2022, pp. 12872–12881. [Online]. Available: https://doi.ieeecomputersociety.org/10.1109/CVPR52688.2022.01254

[20] B. Attal, E. Laidlaw, A. Gokaslan, C. Kim, C. Richardt, J. Tompkin, and M. O’Toole, “Törf: Time-of-flight radiance fields for dynamic scene view synthesis,” Advances in Neural Information Processing Systems, vol. 34, 2021.

[21]  Y. Wei, S. Liu, Y. Rao, W. Zhao, J. Lu, and J. Zhou, “Nerfingmvs: Guided optimization of neural radiance fields for indoor multi-view stereo,” in ICCV, 2021.

[22] T. Neff, P. Stadlbauer, M. Parger, A. Kurz, J. H. Mueller, C. R. A. Chaitanya, A. S. Kaplanyan, and M. Steinberger, “DONeRF: Towards Real-Time Rendering of Compact Neural Radiance Fields using Depth Oracle Networks,” Computer Graphics Forum, vol. 40, no. 4, 2021. [Online]. Available: https://doi.org/10.1111/cgf.14340

[23] E. Sucar, S. Liu, J. Ortiz, and A. Davison, “iMAP: Implicit mapping and positioning in real-time,” in Proceedings of the International Conference on Computer Vision (ICCV), 2021.

[24] J. Y. Zhang, G. Yang, S. Tulsiani, and D. Ramanan, “NeRS: Neural reflectance surfaces for sparse-view 3d reconstruction in the wild,” in Conference on Neural Information Processing Systems, 2021.

[25] J. Chibane, A. Bansal, V. Lazova, and G. Pons-Moll, “Stereo radiance fields (srf): Learning view synthesis from sparse views of novel scenes,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Jun. 2021.

[26] P. Truong, M.-J. Rakotosaona, F. Manhardt, and F. Tombari, “Sparf: Neural radiance fields from sparse and noisy poses,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.

[27] M. Niemeyer, J. T. Barron, B. Mildenhall, M. S. M. Sajjadi, A. Geiger, and N. Radwan, “Regnerf: Regularizing neural radiance fields for view synthesis from sparse inputs,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2022.

[28] C.-H. Lin, W.-C. Ma, A. Torralba, and S. Lucey, “Barf: Bundle-adjusting neural radiance fields,” in IEEE International Conference on Computer Vision (ICCV), 2021.

[29] L. Yen-Chen, P. Florence, J. T. Barron, A. Rodriguez, P. Isola, and T.- Y. Lin, “iNeRF: Inverting neural radiance fields for pose estimation,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021.

[30] Y. Chen, X. Chen, X. Wang, Q. Zhang, Y. Guo, Y. Shan, and F. Wang, “Local-to-global registration for bundle-adjusting neural radiance fields,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 8264–8273.

[31] Y. Jeong, S. Ahn, C. Choy, A. Anandkumar, M. Cho, and J. Park, “Self-calibrating neural radiance fields,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2021, pp. 5846–5854.

[32] D. Verbin, P. Hedman, B. Mildenhall, T. Zickler, J. T. Barron, and P. P. Srinivasan, “Ref-NeRF: Structured view-dependent appearance for neural radiance fields,” CVPR, 2022.

[33] E. Xie, W. Wang, W. Wang, P. Sun, H. Xu, D. Liang, and P. Luo, “Segmenting transparent objects in the wild with transformer,” Aug. 2021, pp. 1194–1200.

[34] Y. R. Wang, Y. Zhao, H. Xu, S. Eppel, A. Aspuru-Guzik, F. Shkurti, and A. Garg, “Mvtrans: Multi-view perception of transparent objects,” 2023.

[35] J. Kerr, L. Fu, H. Huang, Y. Avigal, M. Tancik, J. Ichnowski, A. Kanazawa, and K. Goldberg, “Evo-nerf: Evolving nerf for sequential robot grasping of transparent objects,” in 6th Annual Conference on Robot Learning, 2022.

[36] W. Yifan, F. Serena, S. Wu, C. Öztireli, and O. Sorkine-Hornung, “Differentiable surface splatting for point-based geometry processing,” ACM Transactions on Graphics, vol. 38, no. 6, p. 1–14, Nov. 2019. [Online]. Available: http://dx.doi.org/10.1145/3355089.3356513

[37] G. Wu, T. Yi, J. Fang, L. Xie, X. Zhang, W. Wei, W. Liu, Q. Tian, and X. Wang, “4d gaussian splatting for real-time dynamic scene rendering,” arXiv preprint arXiv:2310.08528, 2023.

[38] Z. Yang, X. Gao, W. Zhou, S. Jiao, Y. Zhang, and X. Jin, “Deformable 3d gaussians for high-fidelity monocular dynamic scene reconstruction,” arXiv preprint arXiv:2309.13101, 2023.

[39]  J. Luiten, G. Kopanas, B. Leibe, and D. Ramanan, “Dynamic 3d gaussians: Tracking by persistent dynamic view synthesis,” in 3DV, 2024.

[40]  J. Tang, “Torch-ngp: a pytorch implementation of instant-ngp,” 2022, https://github.com/ashawkey/torch-ngp.

[41]  B. O. Community, “Blender – a 3d modelling and rendering package,” 2018. [Online]. Available: http://www.blender.org

[42] L. Yang, B. Kang, Z. Huang, Z. Zhao, X. Xu, J. Feng, and H. Zhao, “Depth anything V2,” arXiv preprint arXiv:2406.09414, 2024.

[43] A. Agrawal et al., “Clear-Splatting: Learning residual Gaussian splats for transparent object manipulation,” in RoboNerF: 1st Workshop on Neural Fields in Robotics at ICRA, 2024.