Results – Unsupervised Online Human Detection and Tracking

Summary – We achieve state-of-the-art (SoTA) results on MOT20 and 2nd place on MOT17 on the official MOT Challenge Leaderboard.

Our failure analysis of SoTA algorithms such as OC-SORT established that occlusion-based false negatives and ID swaps are the largest sources of error for SORT-based trackers. We also found that naively adding appearance modeling actually reduces performance.

To counteract these errors, we introduce Grid-Based Appearance Modeling, which provides a finer-grained embedding than typical bounding-box-level descriptors. We further add Dynamic Appearance (DA), Adaptive Weighting (AW), and camera motion compensation (CMC) to OC-SORT. With all of these changes, we attain SoTA on MOT20 and 2nd place on MOT17 on the official MOT Challenge Leaderboard.
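CMC is not described in more detail here. As a minimal sketch, assuming the camera motion between two consecutive frames has already been estimated as a 2×3 affine matrix (the estimation step is not shown), a track's predicted box can be moved into the current frame by warping its four corners and taking their enclosing box. The function name `compensate_box` is hypothetical, and this simplified version only warps the box; a full tracker would typically also transform its motion-model state.

```python
import numpy as np

def compensate_box(box, affine):
    """Warp an [x1, y1, x2, y2] box from the previous frame into the current one.

    Assumption: `affine` is a 2x3 matrix mapping previous-frame pixel
    coordinates to current-frame coordinates, estimated elsewhere. All four
    corners are warped so rotation and scaling are handled; the axis-aligned
    box enclosing the warped corners is returned.
    """
    x1, y1, x2, y2 = box
    corners = np.array([
        [x1, y1, 1.0], [x2, y1, 1.0],
        [x1, y2, 1.0], [x2, y2, 1.0],
    ])  # homogeneous coordinates, shape (4, 3)
    warped = corners @ np.asarray(affine, dtype=float).T  # shape (4, 2)
    return np.concatenate([warped.min(axis=0), warped.max(axis=0)])

# Pure translation: camera pans so content moves 5 px right and 3 px up.
shifted = compensate_box([10, 20, 30, 60], [[1, 0, 5], [0, 1, -3]])
```

Warping all four corners, rather than only the top-left and bottom-right, keeps the result valid when the estimated motion includes rotation.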

Ablation Study – Grid-based Appearance Modeling and Hard Thresholds

Grid-Based Appearance Model – Results using different grid splits (without post-processing). We find that a 3×1 horizontal split works best.
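To make the grid-based idea concrete, here is a minimal sketch under several assumptions: "3×1 horizontal split" is read as three horizontal stripes stacked vertically (top, middle, and bottom of the person box), each cell is embedded independently, and two boxes are compared by the mean cosine similarity of corresponding cells. The `embed` function is a hypothetical stand-in (a normalized mean color) for whatever re-identification network is actually used, and `grid_embeddings` and `grid_similarity` are illustrative names.

```python
import numpy as np

def embed(patch):
    """Hypothetical stand-in for a ReID network: L2-normalised mean colour."""
    v = patch.reshape(-1, patch.shape[-1]).astype(float).mean(axis=0)
    return v / (np.linalg.norm(v) + 1e-12)

def grid_embeddings(crop, rows=3, cols=1):
    """Split an H x W x C crop into rows x cols cells and embed each cell."""
    h, w = crop.shape[:2]
    ys = np.linspace(0, h, rows + 1).astype(int)  # row boundaries
    xs = np.linspace(0, w, cols + 1).astype(int)  # column boundaries
    return np.stack([
        embed(crop[ys[r]:ys[r + 1], xs[c]:xs[c + 1]])
        for r in range(rows) for c in range(cols)
    ])  # shape (rows * cols, C), one unit vector per cell

def grid_similarity(emb_a, emb_b):
    """Mean cosine similarity between corresponding cells (rows are unit vectors)."""
    return float(np.mean(np.sum(emb_a * emb_b, axis=1)))

# Same person box, but the lower third changes (e.g. legs occluded by another person).
person = np.zeros((90, 30, 3))
person[:] = [200, 50, 50]
occluded = person.copy()
occluded[60:] = [50, 50, 200]
score = grid_similarity(grid_embeddings(person), grid_embeddings(occluded))
```

With this layout, a change confined to the lower body only lowers the similarity of the bottom cell, while the top and middle cells still match; a single whole-box embedding would blend that change into one vector.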

Hard Thresholds – Results (without post-processing) for different threshold values.

Ablation Study – DA, AW, and CMC with bounding-box embeddings
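DA and AW are not spelled out in this section. The sketch below assumes one plausible reading: DA as an exponential moving average of a track's embedding whose update weight depends on detection confidence, so that low-confidence detections leave the stored embedding untouched, and AW as boosting the appearance weight for track/detection pairs whose best match is clearly separated from the second-best. All function names, parameter names, and default values (`alpha_fixed`, `conf_thresh`, `base_weight`, `boost`, `eps`) are hypothetical.

```python
import numpy as np

def dynamic_alpha(det_conf, alpha_fixed=0.95, conf_thresh=0.5):
    """EMA weight kept on the OLD track embedding (hypothetical DA rule).

    Returns 1.0 (no update) at or below conf_thresh and decreases linearly
    to alpha_fixed as the detection confidence approaches 1.0.
    """
    if det_conf <= conf_thresh:
        return 1.0
    trust = (det_conf - conf_thresh) / (1.0 - conf_thresh)
    return alpha_fixed + (1.0 - alpha_fixed) * (1.0 - trust)

def update_track_embedding(track_emb, det_emb, det_conf, **kwargs):
    """Blend the new detection embedding into the track and re-normalise."""
    a = dynamic_alpha(det_conf, **kwargs)
    blended = a * track_emb + (1.0 - a) * det_emb
    return blended / (np.linalg.norm(blended) + 1e-12)

def _top2_gap(sim, axis, eps):
    """Best minus second-best similarity along `axis`, capped at eps."""
    if sim.shape[axis] < 2:
        # Only one candidate: treat it as maximally discriminative.
        return np.full(sim.shape[1 - axis], eps)
    s = np.sort(sim, axis=axis)
    gap = np.take(s, -1, axis=axis) - np.take(s, -2, axis=axis)
    return np.minimum(gap, eps)

def adaptive_weights(sim, base_weight=0.5, boost=1.0, eps=0.5):
    """Per-pair appearance weight (hypothetical AW rule).

    sim: (num_tracks, num_dets) cosine-similarity matrix. Pairs whose track
    row and detection column both have a clear winner get a larger weight.
    """
    track_gap = _top2_gap(sim, axis=1, eps=eps)  # one value per track
    det_gap = _top2_gap(sim, axis=0, eps=eps)    # one value per detection
    return base_weight + boost * 0.5 * (track_gap[:, None] + det_gap[None, :])

def association_score(iou, sim, **kwargs):
    """Higher is better: IoU plus adaptively weighted appearance similarity."""
    return iou + adaptive_weights(sim, **kwargs) * sim
```

The resulting score matrix would then be fed to a standard assignment step; the intent of both rules is to lean on appearance only when it is trustworthy, which is consistent with the observation that naively added appearance cues can hurt.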

Final Results – Deep OC-SORT

Validation – MOT17 and MOT20

Test Set – Official MOT Challenge Leaderboard

Additional benchmarking and submissions on DanceTrack, along with further analysis of our methods and their failure modes, are still needed. Our goal is to refine and publish these results after the end of our MSCV program.