Related Work - World-Grounded Human Mesh Recovery from Video

1. Co-evolution of Pose and Mesh for 3D Human Body Estimation from Video.

Key Insight: Human mesh recovery should not rely on 3D keypoints (pose) or 3D shape (mesh) alone; it works best when the two are estimated jointly and inform each other.

Decoupling and co-evolution of pose estimation and mesh prediction:

Poses provide information about human motion; meshes provide information about body shape.

2. World-Grounded Human Motion Recovery via Gravity-View Coordinates.

Key Insight: Use a Gravity-View (GV) coordinate system to infer per-frame human motion, enabling robust world-grounded HMR from video.

Each frame's GV coordinate system is aligned with gravity and the camera view direction.
This removes inconsistencies between the coordinate systems of different frames.
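
A minimal sketch of how a gravity-and-view-aligned frame can be built for one video frame. The axis convention here (y along gravity, z along the view direction with its gravity component removed, x from the cross product) is an assumption for illustration and may differ from the paper's exact definition; the function names are hypothetical.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    if n < 1e-8:
        raise ValueError("cannot normalize a near-zero vector")
    return tuple(c / n for c in v)

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def gravity_view_frame(gravity, view):
    """Build a right-handed orthonormal frame from two directions
    given in camera coordinates.

    Assumed convention (illustrative, not taken from the paper):
      y = unit gravity direction
      z = view direction with its component along gravity removed
      x = y cross z
    Raises ValueError if the view direction is parallel to gravity.
    """
    y = normalize(gravity)
    d = sum(a * b for a, b in zip(view, y))
    z = normalize(tuple(v - d * g for v, g in zip(view, y)))
    x = cross(y, z)
    return x, y, z

# Camera looking along +z, pitched 30 degrees relative to gravity.
tilt = math.radians(30)
x, y, z = gravity_view_frame((0.0, math.cos(tilt), math.sin(tilt)), (0.0, 0.0, 1.0))
```

Under this construction the y-axis is the same physical direction in every frame, so the frames of two different video frames can differ only by a rotation about gravity.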

3. CameraHMR: Aligning People with Perspective.

Key Insight: Integrating the predicted camera field of view (FoV) into the reconstruction pipeline improves HMR from monocular images with severe perspective distortion.

HumanFoV: Predicts the FoV directly from the input image.
CamSMPLify: Incorporates the predicted FoV into a full perspective camera model, replacing the traditional weak-perspective assumption.
CameraHMR: Improves the original HMR2.0 architecture by integrating the camera intrinsics predicted by HumanFoV.
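
To make the difference between the two camera models concrete, here is a small sketch using the standard pinhole relation between FoV and focal length. It is not the CameraHMR implementation; the function names, the reference depth, and the example values are assumptions chosen for illustration.

```python
import math

def focal_from_fov(fov_deg, image_size_px):
    """Focal length in pixels from a field of view in degrees.
    image_size_px is the image extent along which the FoV is measured."""
    return 0.5 * image_size_px / math.tan(math.radians(fov_deg) / 2.0)

def project_perspective(point, focal, center):
    """Full perspective: each point is divided by its own depth."""
    x, y, z = point
    return (focal * x / z + center[0], focal * y / z + center[1])

def project_weak_perspective(point, focal, center, ref_depth):
    """Weak perspective: every point shares one reference depth,
    so depth variation within the body is ignored."""
    x, y, _ = point
    return (focal * x / ref_depth + center[0], focal * y / ref_depth + center[1])

# Hypothetical example: 1000 px image, 90 degree FoV, person close to the camera.
focal = focal_from_fov(90.0, 1000)
center = (500.0, 500.0)
near_hand = (0.5, 0.0, 2.0)   # 2.0 m from the camera
ref_depth = 2.5               # assumed depth of the body center

u_full, _ = project_perspective(near_hand, focal, center)
u_weak, _ = project_weak_perspective(near_hand, focal, center, ref_depth)
```

In this example the focal length is about 500 px, and the hand lands near column 625 under full perspective but near column 600 under weak perspective. The gap grows as the person gets closer to the camera or the FoV gets wider, which is the regime of severe perspective distortion.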

[References]

You, Yingxuan, et al. “Co-evolution of Pose and Mesh for 3D Human Body Estimation from Video.” Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023.
Shen, Zehong, et al. “World-Grounded Human Motion Recovery via Gravity-View Coordinates.” SIGGRAPH Asia 2024 Conference Papers. 2024.
Patel, Priyanka, and Michael J. Black. “CameraHMR: Aligning People with Perspective.” arXiv preprint arXiv:2411.08128 (2024).