Multimodal Pedestrian Detection on KAIST
Quantitative comparison on KAIST measured by LAMR↓ (in %) on the two KAIST test-sets (old and new). Following the literature, we evaluate under the "reasonable" setting, i.e., ignoring small or occluded persons. Our Bayesian Fusion approach (with bounding box fusion) is compared against prior methods in Table 1. We take reported numbers from  for most of the compared methods. Clearly, our Bayesian Fusion approach outperforms the prior methods by a large margin. Bolded numbers mark the best results.
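LAMR summarizes the miss-rate vs. false-positives-per-image (MR-FPPI) curve in a single number. The sketch below, which assumes the standard 9-point log-spaced protocol of Dollár et al. rather than the official KAIST devkit, shows how the metric can be computed from an MR-FPPI curve.

```python
import numpy as np

def log_average_miss_rate(fppi, miss_rate):
    """Log-average miss rate (LAMR), lower is better.

    Minimal sketch (not the official KAIST devkit): the miss rate is sampled
    at 9 FPPI values evenly spaced in log-space over [1e-2, 1e0] and averaged
    in the log domain. `fppi` and `miss_rate` are 1-D arrays tracing the
    MR-FPPI curve, with FPPI in ascending order.
    """
    ref_points = np.logspace(-2.0, 0.0, num=9)        # 9 reference FPPI values
    sampled = []
    for ref in ref_points:
        # take the miss rate at the largest FPPI not exceeding the reference
        idx = np.where(fppi <= ref)[0]
        mr = miss_rate[idx[-1]] if len(idx) else 1.0  # no detections yet -> MR = 1
        sampled.append(max(mr, 1e-10))                # avoid log(0)
    return np.exp(np.mean(np.log(sampled))) * 100.0   # report as a percentage
```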
Ablation study on the KAIST new test-set under the "reasonable" setting, measured by LAMR↓ (in %). Please see the text for a detailed discussion; overall, we find our proposed BayesFusion approach to outperform all other variants, including end-to-end learned approaches such as EarlyFusion and MidFusion. Fig. 5 shows the corresponding MR-FPPI curves.
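To make the comparison with score-averaging variants concrete, the following is a minimal sketch of Bayesian score fusion for two overlapping detections, assuming conditionally independent detectors and a uniform class prior; it illustrates the general idea rather than reproducing our exact implementation.

```python
def bayes_fuse_scores(score_rgb, score_thermal):
    """Fuse two detection confidences by multiplying per-detector posteriors.

    Assumes the RGB and thermal detectors are conditionally independent given
    the object and that the class prior is uniform, so the fused posterior is
    proportional to the product of the individual posteriors.
    """
    pos = score_rgb * score_thermal
    neg = (1.0 - score_rgb) * (1.0 - score_thermal)
    return pos / (pos + neg)

# Example: two moderately confident detections reinforce each other.
print(bayes_fuse_scores(0.7, 0.8))  # ~0.903, higher than either input score
```

Note that, unlike score averaging, agreeing detectors push the fused confidence above either individual score, while a confident detection is only mildly penalized by a weak one.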
Multimodal Object Detection on FLIR
Quantitative comparison on FLIR measured by AP↑ (in %) with IoU>0.5. Following the literature, we evaluate on the three categories annotated by FLIR. Perhaps surprisingly, end-to-end training on thermal images alone already outperforms all the prior methods, presumably because of better augmentations and a better pre-trained model (Faster R-CNN). Moreover, our fusion methods perform even better, and our Bayesian Fusion method performs the best. These results are consistent with those in Table 3.
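For reference, a detection counts as a true positive in this AP computation when its intersection-over-union with a ground-truth box exceeds 0.5; a minimal IoU helper (hypothetical, assuming [x1, y1, x2, y2] box coordinates) is sketched below.

```python
def iou_xyxy(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes in [x1, y1, x2, y2] form."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```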
Breakdown analysis on FLIR day/night scenes (AP↑ in % with IoU>0.5). As FLIR does not provide day/night tags, we manually annotate the images for this analysis. Clearly, incorporating RGB via our learning-based fusion methods notably improves performance on both day and night scenes. We also explore late fusion of the detection outputs from our three models: Thermal, Early, and Mid. We find that AvgScore, NMS, and BayesFusion all lead to better performance than the learning-based MidFusion model. In particular, BayesFusion performs the best, and using bounding box fusion (bbox) improves it further.
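For concreteness, the sketch below gives one plausible reading of these late-fusion variants for a group of detections matched across models; the helper name `fuse_overlapping` and its confidence-weighted box averaging are hypothetical illustrations, not our released implementation. AvgScore averages detector scores, NMS keeps the single highest-scoring detection, BayesFusion multiplies per-detector posteriors (uniform prior assumed), and bbox fusion replaces winner-take-all box selection with a confidence-weighted average of the matched boxes.

```python
import numpy as np

def fuse_overlapping(dets, mode="bayes", use_bbox_fusion=True):
    """Illustrative late fusion of detections matched across detectors.

    dets: list of (score, box) pairs, box = [x1, y1, x2, y2].
    mode: 'avg' averages scores, 'nms' keeps the single highest score,
          'bayes' multiplies per-detector posteriors (uniform prior assumed).
    With use_bbox_fusion, boxes are averaged weighted by detector confidence;
    otherwise the box of the most confident detection is kept.
    """
    scores = np.array([s for s, _ in dets], dtype=float)
    boxes = np.stack([np.asarray(b, dtype=float) for _, b in dets])

    if mode == "avg":          # AvgScore
        fused_score = scores.mean()
    elif mode == "nms":        # NMS-style: keep the strongest detection's score
        fused_score = scores.max()
    else:                      # BayesFusion
        pos, neg = np.prod(scores), np.prod(1.0 - scores)
        fused_score = pos / (pos + neg)

    if use_bbox_fusion:        # bbox: confidence-weighted box average
        fused_box = (scores[:, None] * boxes).sum(0) / scores.sum()
    else:                      # keep the box of the most confident detection
        fused_box = boxes[scores.argmax()]
    return fused_score, fused_box
```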