Given an in-the-wild video, our task is to reconstruct world-coordinate 3D human motion, i.e. both the local body motion and the global trajectory.
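To make the split between local body motion and global trajectory concrete, here is a minimal sketch of how the two combine into world coordinates. It assumes, as a common parameterization and not necessarily the one used here, that local motion is given as root-relative joint positions in the body frame and that the global trajectory is a per-frame root rotation and translation. The function name `local_to_world` is hypothetical.

```python
import numpy as np

def local_to_world(joints_local, root_rot, root_trans):
    """Place root-relative body joints into world coordinates (hypothetical helper).

    joints_local: (T, J, 3) joints relative to the root, in the body frame (local motion)
    root_rot:     (T, 3, 3) per-frame global root orientation, world <- body
    root_trans:   (T, 3)    per-frame global root translation (the trajectory)
    returns:      (T, J, 3) joints in world coordinates
    """
    # X_world[t, k] = R[t] @ X_local[t, k] + T[t], batched over frames t and joints k
    return np.einsum("tij,tkj->tki", root_rot, joints_local) + root_trans[:, None, :]
```

For example, a joint at (1, 0, 0) in the body frame, with the root rotated 90 degrees about z and translated to (5, 0, 0), lands at (5, 1, 0) in the world. Errors in either component (local pose or trajectory) therefore show up directly in the world-coordinate result.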

Problem Definition: online estimation. One frame in, one result out; no future frames are used to estimate the current frame.
Related Work: 1) Prior camera-coordinate HMR methods are inefficient (16 frames in, 1 result out) and require future frames. 2) Prior SLAM methods for human-centric videos use later frames for optimization and are slow.
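The difference between the online setting and a "16 in, 1 out" window can be sketched as two streaming loops. This is an illustration under stated assumptions: `step` and `estimate` are hypothetical stand-ins for real estimators, and the windowed loop assumes the output is the window's center frame (the notes do not say which frame of the window is reported). Each loop yields `(arrival_index, target_index, result)` so the delay is visible.

```python
from collections import deque
from typing import Any, Callable, Iterable, Iterator, Tuple

def online_stream(
    frames: Iterable[Any],
    step: Callable[[Any, Any], Tuple[Any, Any]],
) -> Iterator[Tuple[int, int, Any]]:
    """Causal estimation: one frame in, one result out.

    `step(frame, state) -> (result, new_state)` is a hypothetical per-frame
    estimator; anything it remembers about past frames lives in `state`.
    The result for frame t is emitted as soon as frame t arrives.
    """
    state = None
    for t, frame in enumerate(frames):
        result, state = step(frame, state)
        yield t, t, result

def windowed_stream(
    frames: Iterable[Any],
    estimate: Callable[[list], Any],
    window: int = 16,
) -> Iterator[Tuple[int, int, Any]]:
    """Sliding-window estimation: `window` frames in, one result out.

    Assumes the result is for the window's center frame, so it can only be
    emitted after the following (window - 1 - window // 2) frames arrive.
    No padding is applied, so edge frames get no result here.
    """
    buf = deque(maxlen=window)
    center = window // 2
    for t, frame in enumerate(frames):
        buf.append(frame)
        if len(buf) == window:
            # buf now holds frames t - window + 1 .. t; report the center one
            target = t - window + 1 + center
            yield t, target, estimate(list(buf))
```

Usage on 20 dummy frames: `online_stream(range(20), lambda f, s: (f, s))` produces 20 results with zero delay, while `windowed_stream(range(20), lambda buf: buf[len(buf) // 2])` produces only 5, each emitted 7 frames after its target frame, and every result requires running the estimator on 16 frames.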

(Image Credit: MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors, Murai et al.)
