Motivation – Object Permanence

Monocular object detection and tracking have improved drastically in recent years, but both rely on a key assumption: that objects are visible to the camera. As a result, detection is typically defined in a single-frame setting and tracking in an offline setting. However, amodal (partially or fully invisible) object detection for embodied robotic agents fundamentally requires object permanence: the ability to reason temporally about occluded objects before they reappear. In this work, we propose TAO-Amodal, the largest amodal object detection and tracking dataset, consisting of 3,000+ videos, 800+ object categories, and 700k+ amodal bounding boxes. We treat amodal object detection as an online tracking problem and design a spatiotemporal (online) tracking network that reasons about occlusions in 2.1D to detect a large vocabulary of objects amodally, even when they are fully occluded.
Problem Formulation

In this project, we aim to design a supervised spatiotemporal network that detects and tracks objects amodally through complete occlusions in 2.1D. Given a set of previous image frames and the current frame, our proposed amodal detection framework predicts bounding boxes for:
1. fully visible objects (cup)
2. partially occluded objects (baby)
3. fully occluded objects (e.g., a toy occluded by the cup)
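To make the online formulation concrete, the sketch below shows one possible update loop in which tracks that receive no visible detection are treated as fully occluded and their amodal boxes are propagated with a constant-velocity model rather than dropped. All names, the greedy center-distance matching, and the constant-velocity assumption are illustrative choices for this sketch, not the proposed network's actual architecture.

```python
import math
from dataclasses import dataclass

@dataclass
class Track:
    """A tracked object; box is an amodal (x, y, w, h) in image coordinates."""
    track_id: int
    box: tuple
    velocity: tuple = (0.0, 0.0)   # estimated per-frame center motion
    occluded_frames: int = 0       # consecutive frames with no visible match

def center(box):
    x, y, w, h = box
    return (x + w / 2, y + h / 2)

def propagate(track):
    """Constant-velocity prediction of the amodal box for a track
    that received no visible detection this frame (an assumption of
    this sketch, standing in for learned temporal reasoning)."""
    x, y, w, h = track.box
    vx, vy = track.velocity
    return (x + vx, y + vy, w, h)

def step(tracks, detections, match_dist=20.0):
    """One online update: greedily match visible detections to tracks
    by center distance; unmatched tracks are kept as fully occluded
    and their amodal boxes are propagated instead of being dropped."""
    unmatched = list(detections)
    for t in tracks:
        best, best_d = None, match_dist
        for d in unmatched:
            dist = math.dist(center(t.box), center(d))
            if dist < best_d:
                best, best_d = d, dist
        if best is not None:
            cx0, cy0 = center(t.box)
            cx1, cy1 = center(best)
            t.velocity = (cx1 - cx0, cy1 - cy0)
            t.box = best
            t.occluded_frames = 0
            unmatched.remove(best)
        else:
            t.box = propagate(t)   # amodal box kept behind the occluder
            t.occluded_frames += 1
    return tracks
```

The key design point this sketch illustrates is that full occlusion changes the track's state rather than ending it: the object permanently exists, so its amodal box continues to be predicted until it reappears.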