Related Work

3D-LFM: Lifting Foundation Model

  • Utilizes Graph Transformer with Procrustean alignment to learn non-rigid deformations
  • Robust to the order and number of input keypoints – displays permutation equivariance
  • Shows OOD generalization on unseen categories
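The Procrustean alignment used here is the classical orthogonal Procrustes / Kabsch solution: recover the rigid rotation and translation that best maps one point set onto another via SVD. A minimal NumPy sketch (names and shapes are illustrative, not 3D-LFM's implementation):

```python
import numpy as np

def procrustes_align(X, Y):
    """Rigidly align point set X to Y (rotation R, translation t) via SVD.
    X, Y: (N, 3) arrays of corresponding 3D points."""
    mu_x, mu_y = X.mean(0), Y.mean(0)
    Xc, Yc = X - mu_x, Y - mu_y
    U, _, Vt = np.linalg.svd(Xc.T @ Yc)
    # Correct for reflection so R is a proper rotation (det = +1)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = mu_y - R @ mu_x
    return R, t

# Align a rotated + translated copy of a point cloud back onto the original
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
Y = X @ R_true.T + np.array([1.0, -2.0, 0.5])
R, t = procrustes_align(X, Y)
err = np.abs(X @ R.T + t - Y).max()  # should be near machine precision
```

Aligning predictions to ground truth this way makes the training loss invariant to rigid pose, so the network only has to learn the non-rigid deformation.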
Unsupervised Keypoints from Pretrained Diffusion Models

  • Detects semantically meaningful 2D keypoints in an unsupervised way
  • Uses emergent knowledge within pretrained Stable Diffusion model
  • A randomly initialized text embedding is optimized against the frozen diffusion model so that its cross-attention maps attend to the relevant keypoints in the image
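The core mechanism – optimizing an embedding so that its attention map peaks at a desired location while the backbone stays frozen – can be sketched with a toy cross-attention in PyTorch. Everything here (feature grid, target index, step counts) is illustrative and stands in for Stable Diffusion's cross-attention layers:

```python
import torch

torch.manual_seed(0)
H = W = 8; d = 16
feats = torch.randn(H * W, d)                   # frozen "image" features
query = torch.randn(1, d, requires_grad=True)   # learnable text-like embedding
target_idx = 27                                 # desired keypoint cell

opt = torch.optim.Adam([query], lr=0.1)
for _ in range(300):
    opt.zero_grad()
    # Cross-attention of the learnable query over the frozen feature grid
    attn = torch.softmax(query @ feats.T / d ** 0.5, dim=-1)  # (1, H*W)
    loss = -torch.log(attn[0, target_idx] + 1e-9)  # sharpen peak at target
    loss.backward()
    opt.step()

attn = torch.softmax(query @ feats.T / d ** 0.5, dim=-1).detach()
```

After optimization the attention map's argmax lands on the target cell; in the actual method the "target" emerges from unsupervised objectives rather than a fixed index.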

MotionBERT: A Unified Perspective on Learning Human Motion Representations

  • Motion encoder learns human motion patterns using pretraining on noisy, occluded inputs
  • Uses dual-stream spatio-temporal attention blocks
  • A finetuned MLP head adapts the representation to downstream tasks – pose estimation, mesh reconstruction, action recognition
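Spatio-temporal attention over a motion tensor means attending across joints within each frame and across frames for each joint. A simplified sequential variant (MotionBERT's DSTformer fuses two parallel streams; dimensions and layout here are illustrative):

```python
import torch
import torch.nn as nn

class SpatioTemporalBlock(nn.Module):
    """Toy block: spatial self-attention over joints, then temporal
    self-attention over frames, on a (batch, frames, joints, dim) tensor."""
    def __init__(self, dim=32, heads=4):
        super().__init__()
        self.spatial = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):                   # x: (B, T, J, D)
        B, T, J, D = x.shape
        s = x.reshape(B * T, J, D)          # attend across joints per frame
        s = self.norm1(s + self.spatial(s, s, s)[0]).reshape(B, T, J, D)
        t = s.permute(0, 2, 1, 3).reshape(B * J, T, D)  # across time per joint
        t = self.norm2(t + self.temporal(t, t, t)[0])
        return t.reshape(B, J, T, D).permute(0, 2, 1, 3)

x = torch.randn(2, 16, 17, 32)  # 2 clips, 16 frames, 17 joints, dim 32
y = SpatioTemporalBlock()(x)    # same shape out as in
```

Factorizing attention this way keeps the cost at O(J²) + O(T²) per block instead of O((TJ)²) for full joint-time attention.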

TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose Representation

  • A feature extractor + vision transformer backbone with separate MLP heads to predict camera parameters, pose, and shape from an input image
  • Instead of regressing SMPL pose parameters directly, it predicts pose token classes, which are then reconstructed into a continuous pose via the pretrained codebook
  • A VQ-VAE is pretrained to act as the tokenizer: its codebook encodes SMPL poses into discrete tokens, and its decoder reconstructs continuous poses from them
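The discretization step of such a pose tokenizer is the standard VQ-VAE nearest-codebook lookup: each continuous feature is snapped to its closest codebook entry, whose index is the discrete "pose token". A minimal sketch with illustrative sizes (not TokenHMR's actual codebook):

```python
import torch

torch.manual_seed(0)
codebook = torch.randn(512, 64)    # 512 learned code vectors of dim 64
pose_feats = torch.randn(24, 64)   # e.g. one encoder feature per body part

dists = torch.cdist(pose_feats, codebook)  # (24, 512) pairwise L2 distances
tokens = dists.argmin(dim=1)               # discrete pose token ids, (24,)
quantized = codebook[tokens]               # decoder input, (24, 64)
```

Predicting a class over this finite codebook (rather than regressing raw parameters) restricts outputs to the manifold of plausible poses the VQ-VAE was trained on.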