Hierarchical Lidar Panoptic Segmentation
LPS methods in the literature employ point-cloud backbones to classify points and learn to group points into object instances. These methods require instance-level supervision for all thing classes. In LiPSOW, the other class (for which no instance supervision is available) contains both stuff and things, and methods must be able to cope with this. To develop a strong baseline for LiPSOW, we draw inspiration from work in LPS, perceptual grouping, and open-set recognition.
Our method, HLPS, employs a point-based encoder-decoder network to classify points into one of K+1 classes, as in open-set recognition. In other words, the network is trained to distinguish the K known classes from other. In the second stage, we run a non-learned clustering algorithm on both thing and other points, and learn a scoring function to obtain an instance segmentation. This pipeline is illustrated in Fig. 1. Each component of our method is explained in further detail below.
Semantic Segmentation
We use the well-established Kernel-Point Convolution (KPConv) [1] backbone to operate directly on the input point cloud. We attach a semantic classifier on top of the decoder features to output a semantic map over K+1 classes. The network is trained with a cross-entropy loss.
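For concreteness, the sketch below shows what such a K+1-way point classifier could look like in PyTorch; the feature dimension, class count, and module names are illustrative assumptions rather than our actual implementation.

```python
import torch
import torch.nn as nn

class SemanticHead(nn.Module):
    """Per-point classifier over K known classes plus a catch-all 'other' class."""
    def __init__(self, feat_dim: int = 256, num_known: int = 19):
        super().__init__()
        # K + 1 logits: the extra logit corresponds to the 'other' class
        self.classifier = nn.Linear(feat_dim, num_known + 1)

    def forward(self, point_feats: torch.Tensor) -> torch.Tensor:
        # point_feats: (N, feat_dim) per-point decoder features
        return self.classifier(point_feats)

# Training step sketch: standard cross-entropy over K+1 per-point labels.
head = SemanticHead()
feats = torch.randn(1000, 256)          # stand-in for KPConv decoder output
labels = torch.randint(0, 20, (1000,))  # per-point labels; class index K = 'other'
loss = nn.functional.cross_entropy(head(feats), labels)
loss.backward()
```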
Object segmentation via point clustering
We first group points based on their spatial proximity using hierarchical density-based clustering (HDBSCAN), which results in a hierarchy of segments (Fig. 2, middle). This segmentation tree admits combinatorially many per-point instance segmentations. To obtain a single instance segmentation, we therefore need to make a cut through the tree (Fig. 2, right).
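As a rough sketch of this step, the snippet below runs HDBSCAN from the `hdbscan` Python package on a set of 3D points and exposes the resulting cluster hierarchy; the parameter values and array shapes are illustrative assumptions.

```python
import numpy as np
import hdbscan

# points: (N, 3) xyz coordinates of points predicted as thing or other
points = np.random.rand(500, 3).astype(np.float64)

# Hierarchical density-based clustering; min_cluster_size is illustrative.
clusterer = hdbscan.HDBSCAN(min_cluster_size=10, metric='euclidean')
flat_labels = clusterer.fit_predict(points)   # one possible (flat) cut; -1 marks noise

# The full cluster hierarchy, from which alternative cuts can be extracted
tree = clusterer.condensed_tree_.to_pandas()  # parent/child segments with stabilities
```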
To generate such a cut, we learn a function that estimates how likely a subset of points is to represent an object. We use a PointNet classification architecture trained with a mean-squared-error loss to regress the IoU between a segment and its matched ground-truth instance.
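A minimal sketch of such an objectness regressor is given below, using a simplified PointNet-style network in PyTorch; the layer sizes and the training snippet are illustrative assumptions, not the exact network we use.

```python
import torch
import torch.nn as nn

class ObjectnessNet(nn.Module):
    """Simplified PointNet-style regressor: segment points -> objectness in [0, 1]."""
    def __init__(self):
        super().__init__()
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 128), nn.ReLU())
        self.head = nn.Sequential(
            nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, pts: torch.Tensor) -> torch.Tensor:
        # pts: (N, 3) points of one candidate segment, centered on the segment mean
        feats = self.point_mlp(pts)            # (N, 128) per-point features
        global_feat = feats.max(dim=0).values  # symmetric max-pool over the segment
        return self.head(global_feat)          # predicted IoU with the matched instance

net = ObjectnessNet()
segment = torch.randn(200, 3)
target_iou = torch.tensor([0.8])  # IoU of this segment with its matched GT instance
loss = nn.functional.mse_loss(net(segment), target_iou)
loss.backward()
```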
Given this scoring function, we need to decide where to cut the tree so that the overall segmentation score is as high as possible. In [2], it is shown that when the global segmentation score is defined as the worst (minimum) objectness among the selected segments, the optimal cut can be computed efficiently with dynamic programming over the tree: at each node, we either keep the node as a single segment or recurse into its children, whichever yields the higher minimum objectness.
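The following sketch illustrates this worst-case dynamic program on a toy tree representation; the `Node` structure and field names are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Node:
    """One segment in the clustering hierarchy."""
    objectness: float                 # score predicted for this segment
    children: List["Node"] = field(default_factory=list)

def best_cut(node: Node) -> Tuple[float, List[Node]]:
    """Return the cut of this subtree that maximizes the worst (minimum) objectness."""
    if not node.children:
        return node.objectness, [node]
    child_results = [best_cut(c) for c in node.children]
    # The score of keeping the children's cuts is limited by the weakest subtree
    child_score = min(score for score, _ in child_results)
    if node.objectness >= child_score:
        return node.objectness, [node]                       # cut here: keep the parent whole
    segments = [seg for _, cut in child_results for seg in cut]
    return child_score, segments                              # descend into the children
```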
References
- [1] Aygun, Mehmet, et al. "4D Panoptic LiDAR Segmentation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
- [2] Hu, Peiyun, David Held, and Deva Ramanan. "Learning to Optimally Segment Point Clouds." IEEE Robotics and Automation Letters 5.2 (2020): 875-882.