Multi-object tracking (MOT) estimates the trajectories of multiple objects in video. In retail settings, it is often used to analyze customer behavior in order to study buying patterns, combat fraud, and estimate in-store traffic.
Our task is multi-person tracking across multiple cameras with different views, without post-processing, so that the algorithm can run in real time.
The standard approach to multi-object tracking is tracking-by-detection. A set of detections is extracted from each video frame, and these per-frame detections are then associated across frames by assigning the same ID to detections that contain the same target, thereby building trajectories.
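The association step can be illustrated with a minimal sketch. It assumes axis-aligned boxes in `(x1, y1, x2, y2)` form and uses greedy IoU matching between the previous frame's tracks and the current detections. The names `iou`, `associate`, and `iou_threshold` are hypothetical and chosen only for illustration; practical trackers usually add motion prediction and an optimal assignment step on top of this idea.

```python
from itertools import count


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, next_id, iou_threshold=0.3):
    """Assign track IDs to the detections of the current frame.

    tracks:     dict mapping track ID -> box from the previous frame
    detections: list of boxes found in the current frame
    next_id:    iterator that yields unused IDs for new targets
    Returns a dict mapping track ID -> box for the current frame.
    """
    # Score every (track, detection) pair and visit the best overlaps first.
    pairs = sorted(
        ((iou(box, det), tid, j)
         for tid, box in tracks.items()
         for j, det in enumerate(detections)),
        reverse=True,
    )
    updated, used_tracks, used_dets = {}, set(), set()
    for score, tid, j in pairs:
        if score < iou_threshold:
            break  # remaining pairs overlap too little to be the same target
        if tid in used_tracks or j in used_dets:
            continue
        updated[tid] = detections[j]  # same target keeps the same ID
        used_tracks.add(tid)
        used_dets.add(j)
    # Detections that matched no existing track start new trajectories.
    for j, det in enumerate(detections):
        if j not in used_dets:
            updated[next(next_id)] = det
    return updated


ids = count(1)
frame1 = associate({}, [(0, 0, 10, 10), (50, 50, 60, 60)], ids)
frame2 = associate(frame1, [(51, 50, 61, 60), (1, 0, 11, 10)], ids)
```

In this example, both targets move slightly between frames and keep their IDs even though the detections arrive in a different order.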
Phase 1: Spring 2023
We reviewed the relevant literature and decided to implement a tracking-by-detection method in the top-down view, which lets us aggregate multiple camera views efficiently. We trained two multi-view detectors (MVDet and MVDeTr) on the MMPTrack dataset and implemented SORT as our baseline tracking algorithm. Bounding boxes detected and tracked in the top-down view are reprojected onto the camera planes for the final output.
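The reprojection step can be sketched as follows. This is an illustration under stated assumptions, not our exact implementation: it assumes each camera has a known 3x4 projection matrix `P` (scaled so that the third homogeneous coordinate is the positive depth), that a tracked box is a rectangular footprint on the ground plane in world units, and that every person has a fixed height. The function names and the default `height` of 1.8 are hypothetical.

```python
def project_point(P, X, Y, Z):
    """Project a 3D world point to pixel coordinates with a 3x4 matrix P."""
    u, v, w = (row[0] * X + row[1] * Y + row[2] * Z + row[3] for row in P)
    if w <= 0:
        # Assumes P is scaled so that w is the depth in front of the camera.
        raise ValueError("point is not in front of the camera")
    return u / w, v / w


def footprint_to_image_box(P, footprint, height=1.8):
    """Turn a ground-plane footprint into an image-plane bounding box.

    footprint: (x1, y1, x2, y2) rectangle on the ground plane (Z = 0)
    height:    assumed person height in world units (an assumption)
    Returns (u_min, v_min, u_max, v_max) enclosing the projected cuboid.
    """
    x1, y1, x2, y2 = footprint
    # Project all eight corners of the cuboid standing on the footprint.
    points = [
        project_point(P, X, Y, Z)
        for X in (x1, x2)
        for Y in (y1, y2)
        for Z in (0.0, height)
    ]
    us, vs = zip(*points)
    return min(us), min(vs), max(us), max(vs)


# Toy camera at the origin looking along +Y from 5 units behind the scene,
# with the image v axis pointing down and a focal length of 100 pixels.
P = [
    [100, 0, 0, 0],
    [0, 0, -100, 0],
    [0, 1, 0, 5],
]
box = footprint_to_image_box(P, (-1, 0, 1, 5), height=2.0)
```

Running the same function once per camera, each with its own `P`, gives one image-plane box per view for every tracked ground-plane box.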
Phase 2: Fall 2023
We plan to refine the projected bounding boxes in the camera views and suppress false positives using off-the-shelf single-view detectors (e.g., YOLO). We also aim to reduce the latency of our pipeline to achieve real-time performance. Finally, we will explore learning-based tracking algorithms (e.g., DeepSORT, StrongSORT) to increase robustness to occlusion and minimize ID switches.