A high-definition (HD) map is an important component in autonomous driving because it provides rich information for path planning. However, keeping this information up to date can be costly and time-consuming, since many kinds of information need to be updated, such as traffic signs, construction zones, and closed roads. This project therefore aims to solve this problem with a cheap, crowdsourced method that updates the information automatically.
Our system pipeline has two parts: building the 3D model and the database, and updating the database.
Build 3D Model and Database
Given Argoverse images and their associated GPS data, we build a 3D point cloud model with a Structure from Motion (SfM) system. A traffic sign detection network detects the bounding boxes of traffic signs in the images. Because the Argoverse HD map provides accurate camera poses, we can combine the 3D model, the camera poses, and the traffic sign bounding boxes to localize the traffic signs in the HD map.
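The write-up does not say exactly how the 3D model, camera poses, and bounding boxes are combined. One plausible way is sketched below, assuming a pinhole camera with intrinsics `K` and a world-to-camera pose `(R, t)`: project the sparse SfM points into the image and take the median of the points that land inside the detection box. The function name `localize_sign` and this median rule are hypothetical illustrations, not the project's confirmed method.

```python
import numpy as np

def localize_sign(points_3d, K, R, t, bbox):
    """Hypothetical helper: estimate a sign's 3D position from one detection.

    points_3d: (N, 3) sparse SfM points in world coordinates.
    K: (3, 3) camera intrinsics; R, t: world -> camera rotation and translation.
    bbox: (x0, y0, x1, y1) detection box in pixels.
    Returns the median of the points projecting inside the box, or None.
    """
    cam = points_3d @ R.T + t                  # world frame -> camera frame
    in_front = cam[:, 2] > 0                   # ignore points behind the camera
    proj = cam @ K.T
    with np.errstate(divide="ignore", invalid="ignore"):
        uv = proj[:, :2] / proj[:, 2:3]        # perspective division to pixels
    x0, y0, x1, y1 = bbox
    inside = (in_front
              & (uv[:, 0] >= x0) & (uv[:, 0] <= x1)
              & (uv[:, 1] >= y0) & (uv[:, 1] <= y1))
    if not inside.any():
        return None
    # The median is robust to a few background points that fall inside the box.
    return np.median(points_3d[inside], axis=0)
```

In practice, points on buildings behind the sign can also fall inside the box, so a real system would likely filter by depth or fuse the estimates from several views.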
Update Database
Given new images, which may come from mobile devices or public transportation vehicles, we again use the traffic sign detection network to detect the traffic signs in the images. To localize the new cameras, we solve a Perspective-n-Point (PnP) problem against the pre-built 3D model. We then compare the locations of the newly detected traffic signs with those stored in the database and update the traffic sign database accordingly.
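To make the PnP step concrete, here is a minimal linear sketch using the Direct Linear Transform (DLT). It assumes that 2D-3D correspondences between the new image and the pre-built model are already available (for example from feature matching) and that the intrinsics `K` are known. The function names are hypothetical, and the project may well use a different PnP solver.

```python
import numpy as np

def estimate_projection_dlt(points_3d, points_2d):
    """Linear (DLT) estimate of the 3x4 projection matrix P, up to scale.

    Needs at least 6 non-coplanar 2D-3D correspondences.
    """
    if len(points_3d) < 6:
        raise ValueError("DLT needs at least 6 correspondences")
    rows = []
    for (x, y, z), (u, v) in zip(points_3d, points_2d):
        X = [x, y, z, 1.0]
        # u = (p1 . X) / (p3 . X)  ->  p1 . X - u * (p3 . X) = 0, same for v with p2
        rows.append(X + [0.0] * 4 + [-u * c for c in X])
        rows.append([0.0] * 4 + X + [-v * c for c in X])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    # The right singular vector of the smallest singular value solves A p = 0.
    return vt[-1].reshape(3, 4)

def estimate_pose_dlt(points_3d, points_2d, K):
    """Recover the world -> camera rotation R and translation t given intrinsics K."""
    M = np.linalg.inv(K) @ estimate_projection_dlt(points_3d, points_2d)  # = s * [R | t]
    # det(s * R) = s^3; taking the signed cube root also resolves the sign ambiguity of P.
    s = np.cbrt(np.linalg.det(M[:, :3]))
    U, _, Vt = np.linalg.svd(M[:, :3] / s)
    R = U @ Vt                                 # nearest proper rotation matrix
    t = M[:, 3] / s
    return R, t
```

With real, noisy matches one would normally normalize the coordinates (Hartley normalization), wrap the solver in RANSAC, and refine the pose non-linearly; OpenCV's `cv2.solvePnPRansac` packages these steps.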
Traffic Sign Detection
We currently focus on traffic sign detection; the following figure shows the trained model tested on an Argoverse image. We plan to extend the detection module to a more general detector next semester.
Because the Argoverse HD map has no traffic sign labels, we use the Mapillary Traffic Sign Dataset to train the traffic sign detection network. However, the Mapillary dataset does not contain enough instances of the traffic sign classes we are interested in, so we build a synthetic traffic sign dataset by pasting random traffic signs at arbitrary positions on the images. Some example images from the synthetic dataset are shown below.
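The pasting step can be sketched as a simple copy-paste augmentation. This version assumes RGBA sign crops (the alpha channel masks the sign shape), nearest-neighbour resizing, and a hypothetical `scale_range` of 1% to 4% of the image width, chosen around the roughly 2% typical sign width mentioned in the next section. The exact procedure used for the synthetic dataset is not described, so treat this as an illustration only.

```python
import numpy as np

def paste_sign(background, sign_rgba, rng, scale_range=(0.01, 0.04)):
    """Hypothetical helper: alpha-blend an RGBA sign crop onto an RGB image.

    Returns the augmented uint8 image and the new box (x0, y0, x1, y1).
    """
    h, w = background.shape[:2]
    sh, sw = sign_rgba.shape[:2]
    # Target width as a fraction of image width; keep the sign's aspect ratio.
    tw = max(2, int(round(w * rng.uniform(*scale_range))))
    th = max(2, int(round(sh * tw / sw)))
    # Nearest-neighbour resize by sampling source rows and columns.
    rows = (np.arange(th) * sh / th).astype(int)
    cols = (np.arange(tw) * sw / tw).astype(int)
    sign = sign_rgba[rows][:, cols].astype(np.float32)
    # Random top-left corner such that the sign fits fully inside the image.
    x0 = int(rng.integers(0, w - tw + 1))
    y0 = int(rng.integers(0, h - th + 1))
    alpha = sign[..., 3:4] / 255.0
    out = background.astype(np.float32)
    region = out[y0:y0 + th, x0:x0 + tw]
    out[y0:y0 + th, x0:x0 + tw] = alpha * sign[..., :3] + (1.0 - alpha) * region
    return out.astype(np.uint8), (x0, y0, x0 + tw, y0 + th)
```

The returned box becomes the ground-truth label for the pasted sign, which is what makes the synthetic images usable for training the detector.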
Focal Loss + Small Object Detection
One issue in traffic sign detection is that the objects of interest are small: most traffic signs span only about 2% of the image width. Another issue is that foreground objects (traffic signs) are sparse in the image, so during training there can be far more background samples than foreground samples. To address these two issues, we adjust the anchor sizes based on statistics of the traffic sign sizes, and we adopt focal loss to balance the foreground/background ratio of the training samples. The final model improves mAP by nearly 15% over our baseline model.
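Both ideas can be sketched in a few lines of NumPy. The focal loss below follows the standard binary formulation of Lin et al. (RetinaNet), FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), with RetinaNet's default `alpha=0.25` and `gamma=2`, normalized by the number of positive anchors. The anchor helper is a hypothetical example of "size statistics": it picks anchor sizes at evenly spaced quantiles of sqrt(w * h) over the training boxes. The project's actual hyperparameters and statistics are not stated.

```python
import numpy as np

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss summed over anchors, normalized by the number of positives.

    logits: (N,) raw classifier scores; targets: (N,) with 1 = sign, 0 = background.
    """
    p = 1.0 / (1.0 + np.exp(-logits))                  # sigmoid probability of "sign"
    p_t = np.where(targets == 1, p, 1.0 - p)            # probability of the true class
    alpha_t = np.where(targets == 1, alpha, 1.0 - alpha)
    # (1 - p_t)^gamma shrinks the loss of easy, well-classified (mostly background) anchors.
    loss = -alpha_t * (1.0 - p_t) ** gamma * np.log(np.clip(p_t, 1e-12, 1.0))
    return float(loss.sum() / max(1, int((targets == 1).sum())))

def anchor_sizes_from_boxes(widths, heights, n_anchors=5):
    """Hypothetical helper: anchor sizes at evenly spaced quantiles of box size."""
    sizes = np.sqrt(np.asarray(widths, float) * np.asarray(heights, float))
    qs = np.linspace(0.0, 1.0, n_anchors + 2)[1:-1]    # e.g. 5 anchors -> 1/6 ... 5/6
    return np.quantile(sizes, qs)
```

With `gamma=0` and `alpha=0.5`, the focal loss reduces to half of the ordinary binary cross-entropy, which is a handy sanity check; increasing `gamma` lowers the contribution of anchors that are already classified confidently.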
Below we show the per-class mAP of the different models.
Traffic Sign Localization
We use COLMAP to build 3D models of the street. In the visualization below, the points are the sparse COLMAP reconstruction of the street, and the red rectangles are the estimated camera poses.
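A sparse COLMAP reconstruction is usually produced by three command-line steps: feature extraction, matching, and incremental mapping. The sketch below wraps them in Python with `subprocess`. It assumes the `colmap` binary is on the `PATH`, and the workspace layout (`database.db`, `sparse/`) and helper names are hypothetical. For ordered street frames, COLMAP's `sequential_matcher` is a cheaper alternative to `exhaustive_matcher`, and geo-registering the model to GPS (for example with COLMAP's `model_aligner`) would be a separate step not shown here.

```python
import subprocess
from pathlib import Path

def colmap_sparse_commands(image_dir, workspace):
    """Hypothetical helper: build the COLMAP CLI calls for a sparse reconstruction."""
    workspace = Path(workspace)
    db = str(workspace / "database.db")
    sparse = str(workspace / "sparse")
    return [
        ["colmap", "feature_extractor", "--database_path", db, "--image_path", str(image_dir)],
        # Matches features between every pair of images.
        ["colmap", "exhaustive_matcher", "--database_path", db],
        # Incremental SfM; writes one or more models under the output directory.
        ["colmap", "mapper", "--database_path", db,
         "--image_path", str(image_dir), "--output_path", sparse],
    ]

def run_sparse_reconstruction(image_dir, workspace, dry_run=False):
    """Run (or, with dry_run=True, only return) the COLMAP commands in order."""
    cmds = colmap_sparse_commands(image_dir, workspace)
    if not dry_run:
        (Path(workspace) / "sparse").mkdir(parents=True, exist_ok=True)
        for cmd in cmds:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return cmds
```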
Structure from Motion
Below is a visualization of the traffic sign localization results. On the left is a video showing the detection bounding boxes, and on the right is a video showing the bounding boxes projected onto the HD map. A new traffic sign is added to our database only if there is no existing traffic sign of the same category close to it.
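The insertion rule can be sketched as a category-aware proximity check. This sketch assumes signs are stored with 2D map-frame coordinates in meters and uses a hypothetical `radius_m=5.0` to stand in for "close"; the project's actual threshold and storage (e.g. a GIS database with spatial indexing) are not specified.

```python
import math
from dataclasses import dataclass

@dataclass
class TrafficSign:
    category: str
    x: float  # assumed map-frame coordinate in meters
    y: float

def add_if_new(database, candidate, radius_m=5.0):
    """Hypothetical helper: append candidate unless a same-category sign is nearby.

    Returns True if the sign was added, False if it matched an existing one.
    """
    for sign in database:
        same_category = sign.category == candidate.category
        close = math.hypot(sign.x - candidate.x, sign.y - candidate.y) <= radius_m
        if same_category and close:
            return False  # treat it as a re-detection of an existing sign
    database.append(candidate)
    return True
```

The linear scan is fine for a small database; a larger map would call for a spatial index so that only nearby signs are compared.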
Updated traffic sign detection video
Updated traffic sign projection video
Below is a visualization of our GIS database. Each red line is a COLMAP model, and the blue lines show the coverage of the whole HD map. We have built traffic sign databases for the regions covered by the red lines, and we will expand our coverage to the whole HD map in the future.