Introduction

Motivation:

Robust 2D-to-3D face reconstruction from a single image holds promise for applications in human-computer interaction, security, animation, and health. In particular, it can substantially improve pose- and illumination-invariant face recognition, as well as avatar generation for VR applications.

Faithfully recovering the 3D shape of a human face from unconstrained 2D images is challenging and typically relies on a neural network model that encodes a large amount of prior knowledge. A face can be represented in 3D either volumetrically or as a mesh; we focus on mesh-based representations because they are easier to manipulate. A mesh-based representation consists of two parts: a mesh and a texture.
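To make the mesh-plus-texture idea concrete, the following is a minimal sketch of such a representation. The class name `TexturedMesh`, its fields, and the nearest-neighbour texture lookup are illustrative assumptions, not part of the method described here; the quad and the 2x2 texture are toy data.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class TexturedMesh:
    """Hypothetical container for a mesh-based face representation."""

    vertices: np.ndarray  # (V, 3) float, 3D vertex positions
    faces: np.ndarray     # (F, 3) int, triangle indices into `vertices`
    uvs: np.ndarray       # (V, 2) float in [0, 1], per-vertex texture coordinates
    texture: np.ndarray   # (H, W, 3) float in [0, 1], texture (albedo) image

    def vertex_colors(self) -> np.ndarray:
        """Nearest-neighbour lookup of the texture at each vertex's UV.

        Assumes the common convention that v = 0 is the bottom image row.
        """
        h, w, _ = self.texture.shape
        cols = np.clip(np.rint(self.uvs[:, 0] * (w - 1)).astype(int), 0, w - 1)
        rows = np.clip(np.rint((1.0 - self.uvs[:, 1]) * (h - 1)).astype(int), 0, h - 1)
        return self.texture[rows, cols]

    def translated(self, offset) -> "TexturedMesh":
        """Move the geometry; connectivity and texture are left unchanged."""
        moved = self.vertices + np.asarray(offset, dtype=float)
        return TexturedMesh(moved, self.faces, self.uvs, self.texture)


# Toy data: a unit quad made of two triangles, with a 2x2 texture.
quad = TexturedMesh(
    vertices=np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=float),
    faces=np.array([[0, 1, 2], [0, 2, 3]]),
    uvs=np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float),
    texture=np.array(
        [[[1, 0, 0], [0, 1, 0]],   # top row: red, green
         [[0, 0, 1], [1, 1, 1]]],  # bottom row: blue, white
        dtype=float,
    ),
)
colors = quad.vertex_colors()
shifted = quad.translated([0.0, 0.0, 2.0])
```

Editing the geometry (here, a translation) touches only the vertex array, which illustrates one reason meshes are convenient to manipulate: shape and appearance are stored separately.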

Problem Statement:

Several current monocular 3D face reconstruction techniques can recover fine geometric details reasonably well. However, they have notable limitations: some cannot produce texture maps that capture sufficient high-frequency detail, and others generate faces that poorly model non-static details, such as wrinkles, that vary with expression.

To address these limitations, we propose a 3D face generative model that leverages StyleGAN2 to generate high-quality albedo and precise 3D shapes, resulting in photo-realistic rendered images. Our method uses alternate descent optimization, in a self-supervised manner, to bring StyleGAN2 samples to 3D using a differentiable renderer. In particular, we propose AlbedoGAN, a model that generates albedo from the same latent space. Our framework estimates a detailed 3D face mesh, pose, expression, and lighting by leveraging a face recognition backbone and a detailed shape estimator, and it preserves the identity of the rendered faces better than existing techniques. Because the framework inherits the benefits of 2D face generative models such as StyleGAN, it brings semantic face manipulation into 3D: we demonstrate direct control of expressions in 3D faces by manipulating latent codes, as well as text-based editing of 3D faces.
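As a rough illustration of the alternate descent idea mentioned above, the sketch below alternates gradient steps between two parameter groups while holding the other fixed. It is a toy under stated assumptions, not our pipeline: `render` is a hypothetical stand-in for a differentiable renderer (per-pixel albedo scaled by a single light intensity), no StyleGAN2 or face model is involved, and the names, learning rate, and data are all illustrative.

```python
import numpy as np


def render(albedo, light):
    # Hypothetical stand-in for a differentiable renderer:
    # each "pixel" is its albedo scaled by one scalar light intensity.
    return albedo * light


def loss(albedo, light, target):
    # Squared reconstruction error between the rendering and the target image.
    return float(np.sum((render(albedo, light) - target) ** 2))


def alternate_descent(target, albedo, light, lr=0.1, steps=500):
    albedo = np.asarray(albedo, dtype=float).copy()
    light = float(light)
    history = [loss(albedo, light, target)]
    for _ in range(steps):
        # Step 1: gradient step on the albedo with the light held fixed.
        residual = render(albedo, light) - target
        albedo -= lr * 2.0 * light * residual
        # Step 2: gradient step on the light with the albedo held fixed.
        residual = render(albedo, light) - target
        light -= lr * 2.0 * float(np.sum(albedo * residual))
        history.append(loss(albedo, light, target))
    return albedo, light, history


# Toy target "image" of three pixels; the supervision is the image itself.
target = np.array([0.3, 0.75, 1.2])
albedo, light, history = alternate_descent(target, albedo=[0.5, 0.5, 0.5], light=1.0)
```

Note that the toy problem has a scale ambiguity (albedo and light can trade off against each other), so only the rendered result, not the individual factors, is pinned down by the loss. In practice, priors or additional constraints are typically needed to resolve such ambiguities.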

Sample 3D meshes generated by our model