Introduction

Problem Statement

We plan to develop a real-time solution for instance-level segmentation of 3D point clouds. We will be using a payload that captures point clouds with a VLP-16 LiDAR and RGB images through 4 fisheye cameras.

Figure: Example of expected 3D instance segmentation

We would have to develop a very efficient and lightweight solution, as it would be deployed on an NVIDIA Jetson Xavier NX. Another challenge is that we are dealing with highly sparse point clouds, so we would need to devise methods that can produce denser point clouds.

Figure: Payload on which the solution will be deployed
Figure: Details of the constraints on the payload
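
One common way to counter the sparsity of a spinning 16-beam LiDAR is to accumulate several consecutive scans into a common frame using odometry poses. The snippet below is a minimal NumPy sketch of that idea; the function names and the assumption that per-scan poses are available (e.g. from LiDAR odometry) are ours, not part of the payload's software.

```python
import numpy as np

def transform(points, pose):
    """Apply a 4x4 rigid-body pose to an (N, 3) array of points."""
    homogeneous = np.hstack([points, np.ones((points.shape[0], 1))])
    return (pose @ homogeneous.T).T[:, :3]

def accumulate_scans(scans, poses):
    """Merge several sparse scans into one denser cloud in the world frame.

    scans: list of (N_i, 3) arrays, each in the sensor frame.
    poses: list of 4x4 sensor-to-world transforms (e.g. from LiDAR odometry).
    """
    return np.vstack([transform(scan, pose) for scan, pose in zip(scans, poses)])
```

The density gained this way is bounded by the accuracy of the odometry: pose drift smears the accumulated points.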

Overview of the Larger Goal

The overall larger problem (Multilayer Mapping) that we want to solve at the AirLab is autonomous navigation in unseen and diverse environments. This is a challenging task: to achieve it, we need a rich understanding of the 3D world around us. Thinking purely in terms of geometry is insufficient for solving navigation problems; we also need to know the semantic properties of the objects in the environment. With such high-level semantic information, the robot can autonomously learn through experience by interacting with the real world. For example, it could learn the difference between movable and non-movable objects and understand the various navigational affordances in its surroundings. Such online learning helps the robot generalize better to unseen environments.

Figure: Hierarchical representation of indoor scenes

Apart from SLAM and accurate 3D reconstruction of our world, we also need a rich multilayer map of the 3D world to aid learning. In the bigger picture, we wish to accomplish this mapping with multimodal sensors, which would make the solution environment agnostic. Given an indoor scene such as a floor of a building, we plan to model the 3D world at different levels of hierarchy, simplifying map creation through multiple layers of abstraction. This involves grouping the parts of the 3D world hierarchically, from the basic class level up to room- and building-level abstractions. We believe this will aid in better autonomy and navigation capabilities.
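
To make the layering concrete, one can think of the map as a tree whose nodes are layers of increasing abstraction. The sketch below is only an illustrative data structure under our own assumptions; the level names and example labels ("office-building", "lab", "chair") are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class MapLayer:
    """One node in a hypothetical multilayer map hierarchy."""
    level: str     # abstraction level, e.g. "class", "room", "floor", "building"
    name: str      # human-readable label for this node
    children: list = field(default_factory=list)

# Hypothetical example: object classes grouped into a room, rooms into a
# floor, and floors into a building.
building = MapLayer("building", "office-building")
floor_1 = MapLayer("floor", "floor-1")
room = MapLayer("room", "lab")
room.children += [MapLayer("class", "chair"), MapLayer("class", "table")]
floor_1.children.append(room)
building.children.append(floor_1)
```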

In this capstone project we focus on “semantic” mapping, which involves assigning a semantic label to each point in a 3D point cloud; points with the same label share the same properties. Given an input point cloud, we must be able to return a dense semantic labelling, i.e. a class for every point.
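
Concretely, a dense labelling can be stored as one class id per point alongside the cloud. The snippet below sketches this representation and groups points by class; the class map and function are hypothetical placeholders, not the interface of any particular segmentation network.

```python
import numpy as np

# Hypothetical class ids for an indoor scene.
CLASSES = {0: "floor", 1: "wall", 2: "chair", 3: "table"}

def points_by_class(points, labels):
    """Group an (N, 3) point cloud by its per-point semantic labels.

    points: (N, 3) float array of xyz coordinates.
    labels: (N,) int array with one class id per point.
    Returns a dict mapping class name -> (M, 3) subset of the cloud.
    """
    return {CLASSES[int(c)]: points[labels == c] for c in np.unique(labels)}
```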

Figure: Multilayer mapping of the 3D indoor scene
