Solution

Our Datasets

KAIST Multi-Spectral Dataset

The KAIST dataset consists of multiple sequences of paired RGB and thermal images. Originally built as a pedestrian detection benchmark, it is one of the very few publicly available datasets that provide RGB and thermal imagery together. Moreover, it includes images captured throughout the day and under different weather conditions, making it suitable for our experimentation.

KAIST Dataset. Each frame contains one RGB and one Thermal Image
NREC Dataset. Each frame contains an RGB image, two near-infrared images from a vertical stereo pair, and a thermal image. The streams are all synchronized and rectified.

NREC Collected Data
The NREC dataset contains not just low-light but also off-road environments, for which there aren't any publicly available datasets yet. As such, NREC has collected its own data at various locations that are closer to our actual domain; an example of one such location during daytime is shown. We seek to train our models on daytime data and evaluate them in both day and night conditions. Note, however, that we do not have any ground-truth depth, so we rely on creating pseudo-ground truth from the vertical stereo pair.

Pseudo Ground Truth

Pseudo-Ground Truth. We estimate depth using classical geometric approaches such as Semi-Global Block Matching (SGBM), which take two input views and output a disparity map. The main idea is to search along the horizontal epipolar lines of the rectified images for the patches that minimize a matching cost. We use the OpenCV implementation of SGBM, with parameters tuned for our use case, to generate the pseudo-ground truth; a sketch of this pipeline follows.
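
As a concrete sketch, the pseudo-ground-truth generation could look like the following. The parameter values and the 90-degree rotation used to handle the vertical stereo pair are our assumptions for illustration; the post only states that OpenCV's SGBM was tuned for this use case.

```python
import cv2
import numpy as np

# Illustrative SGBM parameters -- the values actually tuned for the
# NREC data are not given here, so these are reasonable defaults.
sgbm = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,   # must be divisible by 16
    blockSize=5,
    P1=8 * 5 ** 2,        # penalty for small disparity changes
    P2=32 * 5 ** 2,       # penalty for large disparity changes
    uniquenessRatio=10,
    speckleWindowSize=100,
    speckleRange=2,
)

def pseudo_ground_truth(top, bottom):
    """Compute a disparity map from the rectified vertical NIR pair.

    OpenCV's SGBM searches along horizontal epipolar lines, so we first
    rotate both images by 90 degrees (an assumption about how the
    vertical rig is handled), then rotate the result back.
    """
    left = cv2.rotate(top, cv2.ROTATE_90_CLOCKWISE)
    right = cv2.rotate(bottom, cv2.ROTATE_90_CLOCKWISE)

    # compute() returns fixed-point disparities scaled by 16
    disp = sgbm.compute(left, right).astype(np.float32) / 16.0

    # Rotate back so the disparity map aligns with the original frames
    return cv2.rotate(disp, cv2.ROTATE_90_COUNTERCLOCKWISE)
```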

Results Overview

We use the Monodepth2 architecture to train our models. In addition to the re-projection loss, we also investigate adding extra supervision from the pseudo-ground truth for both the RGB and thermal models; a sketch of this combined objective is given below. All of our results follow.
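
A minimal sketch of the combined objective, assuming an L1 supervised term weighted by a hypothetical factor lambda_sup on top of the photometric loss that Monodepth2 already computes; the post does not specify the exact form of the extra supervision.

```python
import torch
import torch.nn.functional as F

def total_loss(pred_disp, pseudo_gt_disp, reprojection_loss, lambda_sup=0.1):
    """Combine Monodepth2's self-supervised re-projection loss with an
    extra supervised term against the stereo pseudo-ground truth.
    `lambda_sup` is a hypothetical weighting factor, not a value from
    the post."""
    # Only supervise pixels where SGBM produced a valid disparity
    valid = pseudo_gt_disp > 0
    supervised = F.l1_loss(pred_disp[valid], pseudo_gt_disp[valid])
    return reprojection_loss + lambda_sup * supervised
```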

RGB-Based Depth Estimation

RGB-based Depth Estimation. We train Monodepth2 on the RGB images in the NREC dataset and showcase the predicted disparity on the right. Our model clearly identifies all objects in the scene and predicts reasonable depths. We notice that even for the tree branches, the model groups branches belonging to the same tree together and assigns them similar depths.

Thermal-Based Depth Estimation

Thermal-based Depth Estimation. We train Monodepth2 on the thermal images in the NREC dataset and showcase the results above. We first preprocess each thermal image by applying min-max scaling and normalization (sketched below), and show the preprocessed image on the left. As we can see, our thermal model clearly detects the person and estimates the layout of the terrain.
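
A minimal sketch of the thermal preprocessing, assuming min-max scaling to [0, 1] followed by zero-mean, unit-variance normalization; the exact constants are not stated in the post.

```python
import numpy as np

def preprocess_thermal(raw, eps=1e-6):
    """Min-max scale a raw (e.g. 16-bit) thermal frame to [0, 1], then
    normalize to zero mean and unit variance before feeding it to the
    depth network. `eps` guards against constant frames."""
    raw = raw.astype(np.float32)
    scaled = (raw - raw.min()) / (raw.max() - raw.min() + eps)
    return (scaled - scaled.mean()) / (scaled.std() + eps)
```
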
Point Cloud Output for MonoThermal. We visualize the disparity map as a point cloud by de-projecting the pixels in the RGB image based on the camera intrinsics, and showcase the full 3-D point cloud. Our model clearly detects the person and their depth relative to the rest of the scene. We do, however, notice some issues with the trees and the sky, which are likely artifacts of the pseudo-ground truth. A sketch of the de-projection follows.
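
The de-projection is the standard pinhole back-projection; a minimal sketch, assuming a 3x3 intrinsics matrix K and a known stereo baseline (the variable names here are illustrative):

```python
import numpy as np

def deproject(disparity, K, baseline):
    """De-project a disparity map into an (N, 3) point cloud using the
    camera intrinsics K (3x3) and the stereo baseline in meters."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]

    h, w = disparity.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))

    valid = disparity > 0
    z = fx * baseline / disparity[valid]  # depth from disparity
    x = (u[valid] - cx) * z / fx          # back-project with intrinsics
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=-1)
```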

Quantitative Analysis

Quantitative Results. We calculate accuracy as the fraction of pixels whose relative error between the predicted and ground-truth disparity falls within several (delta) thresholds, as sketched below.
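
A minimal sketch of this delta metric; the post does not list the exact thresholds, so we assume the conventional 1.25, 1.25^2, 1.25^3 values here.

```python
import numpy as np

def delta_accuracy(pred, gt, thresholds=(1.25, 1.25 ** 2, 1.25 ** 3)):
    """Fraction of valid pixels whose prediction/ground-truth ratio falls
    under each threshold (the standard delta metric; the threshold values
    here are the conventional ones, not confirmed by the post)."""
    valid = gt > 0
    ratio = np.maximum(pred[valid] / gt[valid], gt[valid] / pred[valid])
    return [float((ratio < t).mean()) for t in thresholds]
```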

We train Monodepth2 on both RGB and thermal inputs, with self-supervision alone and with the added stereo pseudo-ground truth. In the self-supervised case, the thermal and RGB models differ substantially at the strictest delta threshold. However, once we add the pseudo-ground truth, we close the gap between RGB and thermal almost entirely and achieve strong performance. Since thermal images undergo far less domain shift from day to night, the thermal model should perform similarly in both conditions, meaning its nighttime performance is nearly as good as the RGB model's daytime performance!