How do we infer a 3D face from 2D landmarks efficiently?

Given a 2D video stream of a face with detected 2D landmarks, we are developing a real-time system that accurately fits a 3D face model to those landmarks, enabling estimation of the 3D face shape, pose, and expression.

3D Fitting from sparse 2D landmarks only

This can help us bypass the costly photometric optimization that plagues current fitting methods.
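To make this concrete, below is a minimal sketch of the kind of landmark-only objective we have in mind, assuming a generic linear 3DMM restricted to its landmark vertices and a weak-perspective camera; the function and array names (landmark_residuals, shape_basis, and so on) are illustrative, not an existing API.

```python
import numpy as np
from scipy.optimize import least_squares

def landmark_residuals(params, mean_lmk3d, shape_basis, lmk2d, prior_weight=0.1):
    """Residuals between projected model landmarks and detected 2D landmarks.

    params:      [s, r1, r2, r3, tx, ty, beta_1..beta_n] -- scale, axis-angle
                 rotation, 2D translation, and 3DMM shape coefficients
    mean_lmk3d:  (K, 3) landmark positions of the mean face
    shape_basis: (3K, n) PCA shape basis restricted to the K landmark vertices
    lmk2d:       (K, 2) detected 2D landmarks in pixels
    """
    s, rot, t, betas = params[0], params[1:4], params[4:6], params[6:]
    # 3D landmarks under the linear shape model
    lmk3d = mean_lmk3d + (shape_basis @ betas).reshape(-1, 3)
    # Rodrigues formula: axis-angle vector -> rotation matrix
    theta = np.linalg.norm(rot) + 1e-12
    kx, ky, kz = rot / theta
    cross = np.array([[0.0, -kz, ky], [kz, 0.0, -kx], [-ky, kx, 0.0]])
    R = np.eye(3) + np.sin(theta) * cross + (1.0 - np.cos(theta)) * (cross @ cross)
    # Weak-perspective projection: rotate, keep x/y, scale, translate
    proj = s * (lmk3d @ R.T)[:, :2] + t
    # Tikhonov term keeps the shape coefficients near the 3DMM mean
    return np.concatenate([(proj - lmk2d).ravel(), prior_weight * betas])

# Per frame, fitting is a small nonlinear least-squares solve:
# x0 = np.concatenate([[1.0, 0.0, 0.0, 0.0, 0.0, 0.0], np.zeros(n_shape)])
# fit = least_squares(landmark_residuals, x0, args=(mean_lmk3d, shape_basis, lmk2d))
```

Because the unknowns number in the dozens rather than in the thousands of pixels touched by photometric terms, a solve of this size is plausibly within real-time budgets on commodity hardware.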

Leverage 3D morphable face models (3DMMs)

This will provide us with a strong 3D prior for estimating face shape efficiently.
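As a sketch of why the prior helps, a linear 3DMM represents any face as a mean mesh plus low-dimensional identity and expression offsets, loosely following FLAME-style conventions; the array names and shapes here are assumptions for illustration, not the actual FLAME API.

```python
import numpy as np

def morphable_face(mean_verts, shape_basis, expr_basis, betas, psis):
    """Evaluate a linear 3D morphable model.

    mean_verts:  (V, 3) template (mean) mesh vertices
    shape_basis: (3V, n_shape) identity PCA basis learned from face scans
    expr_basis:  (3V, n_expr) expression PCA basis
    betas, psis: identity / expression coefficients to be estimated
    """
    # Every face is the mean mesh plus a low-dimensional linear combination
    # of learned identity and expression deformations
    offsets = shape_basis @ betas + expr_basis @ psis
    return mean_verts + offsets.reshape(-1, 3)
```

Since the coefficient vectors typically have tens of entries while the mesh has thousands of vertices, the prior shrinks the search space dramatically and regularizes the otherwise ill-posed fit to sparse landmarks.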

Minimize bias in 3D face inference

The method should generalize across faces of different ethnicities, ages, and sexes.

Motivation

The ability to obtain a 3D face model from cheaply available sparse 2D landmarks can unlock downstream applications on low-compute devices such as smartphones.

On-device Virtual Avatars

  • Virtual avatars add a dash of fun to video calls
  • The avatars can be driven directly in real time by the user’s expression

Virtual Try-on

  • Trying on make-up or a pair of glasses virtually requires real-time estimation of face pose and shape
  • A lightweight face-fitting algorithm can keep the user’s face data secure on their own device

Ensuring fairness and accuracy in these applications requires addressing the biases inherent in 3D modeling frameworks and attribute classification systems. Tackling these challenges would make advanced facial modeling accessible and reliable across diverse user groups.

Existing detectors and 3DMMs are biased

  • Current 2D facial landmark detectors and 3DMM frameworks like FLAME struggle to accurately predict and project landmarks on diverse facial datasets, especially in real-world (“in-the-wild”) scenarios.
  • These challenges stem from limited generalization to variations in ethnic facial structures, poses, lighting, and occlusions.
  • This highlights the need for more inclusive datasets and robust algorithms to ensure accurate and unbiased facial analysis across diverse populations.