Multimodal Pedestrian Detection on KAIST
Quantitative comparison on KAIST measured by LAMR↓ (in %) on the two KAIST test-sets (old and new). Following the literature, we evaluate under the "reasonable" setting, i.e., ignoring small or occluded persons. Our Bayesian Fusion approach (with bounding box fusion) is compared against prior methods in Table 1. We take reported numbers from  for most of the compared methods. Clearly, our Bayesian Fusion approach outperforms the prior methods by a large margin. Bolded numbers mark the best results.
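LAMR summarizes the miss-rate vs. false-positives-per-image (MR-FPPI) curve in a single number. The sketch below, which assumes the standard 9-point log-spaced protocol of Dollár et al. rather than the official KAIST devkit, shows how the metric can be computed from an MR-FPPI curve.

```python
import numpy as np

def log_average_miss_rate(fppi, miss_rate):
    """Log-average miss rate (LAMR), lower is better.

    Minimal sketch (not the official KAIST devkit): the miss rate is sampled
    at 9 FPPI values evenly spaced in log-space over [1e-2, 1e0] and averaged
    in the log domain. `fppi` and `miss_rate` are 1-D arrays tracing the
    MR-FPPI curve, with FPPI in ascending order.
    """
    ref_points = np.logspace(-2.0, 0.0, num=9)        # 9 reference FPPI values
    sampled = []
    for ref in ref_points:
        # take the miss rate at the largest FPPI not exceeding the reference
        idx = np.where(fppi <= ref)[0]
        mr = miss_rate[idx[-1]] if len(idx) else 1.0  # no detections yet -> MR = 1
        sampled.append(max(mr, 1e-10))                # avoid log(0)
    return np.exp(np.mean(np.log(sampled))) * 100.0   # report as a percentage
```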
Ablation study on the KAIST new test-set under the "reasonable" setting, measured by LAMR↓ (in %). Please see the text for a detailed discussion; overall, we find our proposed BayesFusion approach to outperform all other variants, including end-to-end learned approaches such as EarlyFusion and MidFusion. Fig. 5 shows the corresponding MR-FPPI curves.
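To make the comparison with score-averaging variants concrete, the following is a minimal sketch of Bayesian score fusion for two overlapping detections, assuming conditionally independent detectors and a uniform class prior; it illustrates the general idea rather than reproducing our exact implementation.

```python
def bayes_fuse_scores(score_rgb, score_thermal):
    """Fuse two detection confidences by multiplying per-detector posteriors.

    Assumes the RGB and thermal detectors are conditionally independent given
    the object and that the class prior is uniform, so the fused posterior is
    proportional to the product of the individual posteriors.
    """
    pos = score_rgb * score_thermal
    neg = (1.0 - score_rgb) * (1.0 - score_thermal)
    return pos / (pos + neg)

# Example: two moderately confident detections reinforce each other.
print(bayes_fuse_scores(0.7, 0.8))  # ~0.903, higher than either input score
```

Note that, unlike score averaging, agreeing detectors push the fused confidence above either individual score, while a confident detection is only mildly penalized by a weak one.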
Multimodal Object Detection on FLIR
Quantitative comparison on FLIR measured by AP↑ (in %) with IoU>0.5. Following the literature, we evaluate on the three categories annotated by FLIR. Perhaps surprisingly, end-to-end training on thermal images alone already outperforms all the prior methods, presumably because of better augmentations and a better pre-trained model (Faster R-CNN). Moreover, our fusion methods perform even better, and our Bayesian Fusion method performs the best. These results are consistent with those in Table 3.
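For reference, a detection counts as a true positive in this AP computation when its intersection-over-union with a ground-truth box exceeds 0.5; a minimal IoU helper (hypothetical, assuming [x1, y1, x2, y2] box coordinates) is sketched below.

```python
def iou_xyxy(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes in [x1, y1, x2, y2] form."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```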
Breakdown analysis on FLIR day/night scenes (AP↑ in % with IoU>0.5). As FLIR does not provide day/night tags, we manually annotate the images for this analysis. Clearly, incorporating RGB via our learning-based fusion methods notably improves performance on both day and night scenes. We also explore late fusion of the detection outputs from our three models: Thermal, Early, and Mid. We find that AvgScore, NMS, and BayesFusion all lead to better performance than the learning-based MidFusion model. In particular, BayesFusion performs the best, and using bounding box fusion (bbox) improves it further.
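For concreteness, the sketch below gives one plausible reading of these late-fusion variants for a group of detections matched across models; the helper name `fuse_overlapping` and its confidence-weighted box averaging are hypothetical illustrations, not our released implementation. AvgScore averages detector scores, NMS keeps the single highest-scoring detection, BayesFusion multiplies per-detector posteriors (uniform prior assumed), and bbox fusion replaces winner-take-all box selection with a confidence-weighted average of the matched boxes.

```python
import numpy as np

def fuse_overlapping(dets, mode="bayes", use_bbox_fusion=True):
    """Illustrative late fusion of detections matched across detectors.

    dets: list of (score, box) pairs, box = [x1, y1, x2, y2].
    mode: 'avg' averages scores, 'nms' keeps the single highest score,
          'bayes' multiplies per-detector posteriors (uniform prior assumed).
    With use_bbox_fusion, boxes are averaged weighted by detector confidence;
    otherwise the box of the most confident detection is kept.
    """
    scores = np.array([s for s, _ in dets], dtype=float)
    boxes = np.stack([np.asarray(b, dtype=float) for _, b in dets])

    if mode == "avg":          # AvgScore
        fused_score = scores.mean()
    elif mode == "nms":        # NMS-style: keep the strongest detection's score
        fused_score = scores.max()
    else:                      # BayesFusion
        pos, neg = np.prod(scores), np.prod(1.0 - scores)
        fused_score = pos / (pos + neg)

    if use_bbox_fusion:        # bbox: confidence-weighted box average
        fused_box = (scores[:, None] * boxes).sum(0) / scores.sum()
    else:                      # keep the box of the most confident detection
        fused_box = boxes[scores.argmax()]
    return fused_score, fused_box
```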