Overview
Given a low-resolution 3D head avatar driven by a morphable model, our pipeline operates in three main stages. We first reconstruct a static 3D head in the canonical space with multi-view 3D GAN inversion. We then refine the mesh geometry and rig 3D Gaussians onto the mesh surface to enable animation. Finally, we add anchor images with diverse camera poses and expressions for dynamics-aware 3D refinement, which makes the 3D head model robust across viewing angles and complex facial motions.

Key Components
To achieve this, we break down the process into three core technical modules:
1. Multi-View 3D Inversion (Canonical Reconstruction)
The first step is to hallucinate missing high-frequency details from the low-quality input.
The Process: We use a pre-trained 3D GAN to generate a static, high-resolution 3D Gaussian head. We optimize its latent code against upscaled multi-view renderings so that the reconstructed head has photorealistic textures and geometry.
Why it matters: This establishes a high-fidelity “base model” that far exceeds the quality of the original blurry input.
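To make the idea of optimizing a latent code against several views concrete, here is a minimal toy sketch, not the actual pipeline. It assumes a linear map `generator` as a stand-in for the pre-trained 3D GAN and index masks as a stand-in for rendering from different cameras; all names (`render`, `invert_latent`, `view_masks`) are hypothetical.

```python
import numpy as np

def render(generator, latent, view_mask):
    """Toy 'renderer': decode the latent, keep only the entries this view sees."""
    return (generator @ latent)[view_mask]

def invert_latent(generator, view_masks, targets, steps=1000):
    """Gradient descent on one latent code against all views jointly.

    Minimizes sum_v ||render(latent, v) - target_v||^2.
    """
    latent = np.zeros(generator.shape[1])
    # Safe step size: 1 / Lipschitz constant of the summed least-squares loss.
    stacked = np.vstack([generator[mask] for mask in view_masks])
    lr = 1.0 / (2.0 * np.linalg.norm(stacked, 2) ** 2)
    for _ in range(steps):
        grad = np.zeros_like(latent)
        for mask, target in zip(view_masks, targets):
            residual = render(generator, latent, mask) - target
            grad += 2.0 * generator[mask].T @ residual
        latent -= lr * grad
    return latent

# Assumed toy setup: 32 output features, 4 latent dimensions, 3 partial views.
rng = np.random.default_rng(0)
generator = rng.normal(size=(32, 4))
true_latent = rng.normal(size=4)
view_masks = [rng.choice(32, size=16, replace=False) for _ in range(3)]
targets = [render(generator, true_latent, mask) for mask in view_masks]

recovered = invert_latent(generator, view_masks, targets)
```

No single view covers the whole output, but the shared latent code is constrained by all of them at once. That is the point of inverting against multiple views rather than one.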
2. 3D Gaussian Rigging & Geometry Refinement
A high-quality static head must be rigged correctly to move convincingly.
Geometry Refinement: Low-resolution inputs often suffer from misalignment (e.g., teeth not aligning with lips). We refine the underlying FLAME mesh geometry to strictly align with facial landmarks before binding.
Rigging: The optimized 3D Gaussians are then bound to this refined mesh, allowing the detailed textures to follow the face’s movement naturally.
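One common way to bind Gaussians to a mesh, sketched here as an assumption rather than as the exact scheme used, is to store for each Gaussian a parent triangle, barycentric coordinates inside it, and an offset along the triangle normal. When the mesh deforms, the Gaussian centers are recomputed from the moved triangles. The helper names below are hypothetical, and only centers are handled; rotation and scale of each Gaussian would need the same treatment.

```python
import numpy as np

def triangle_normals(vertices, faces):
    """Unit normal of every triangle in the mesh."""
    a, b, c = (vertices[faces[:, i]] for i in range(3))
    normals = np.cross(b - a, c - a)
    return normals / np.linalg.norm(normals, axis=1, keepdims=True)

def posed_centers(vertices, faces, face_ids, bary, normal_offsets):
    """Gaussian centers for the current mesh pose.

    face_ids:       (N,)   parent triangle of each Gaussian
    bary:           (N, 3) barycentric coordinates inside that triangle
    normal_offsets: (N,)   signed distance along the triangle normal
    """
    corners = vertices[faces[face_ids]]                  # (N, 3, 3)
    on_surface = np.einsum("nk,nkd->nd", bary, corners)  # barycentric blend
    normals = triangle_normals(vertices, faces)[face_ids]
    return on_surface + normal_offsets[:, None] * normals

# Assumed toy mesh: one triangle with one Gaussian hovering above its centroid.
faces = np.array([[0, 1, 2]])
canonical = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
face_ids = np.array([0])
bary = np.array([[1 / 3, 1 / 3, 1 / 3]])
normal_offsets = np.array([0.1])

# Rigidly move the triangle: rotate 90 degrees about x, then translate along z.
rotation = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
translation = np.array([0.0, 0.0, 2.0])
deformed = canonical @ rotation.T + translation

rest = posed_centers(canonical, faces, face_ids, bary, normal_offsets)
moved = posed_centers(deformed, faces, face_ids, bary, normal_offsets)
```

Because the binding is expressed relative to the triangle, the Gaussian follows the surface without re-optimization. This also explains why the mesh should be refined before binding: any misalignment in the mesh is inherited by every Gaussian attached to it.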
3. Dynamics-Aware 3D Refinement
Standard inversion often fails when the face deforms into extreme expressions.
Multi-Expression Anchors: We sample “anchor images” containing diverse expressions (such as open mouths or squints) to capture occluded regions like teeth and eyelids.
Joint Optimization: We jointly optimize the model using these dynamic anchors, so that the super-resolved avatar preserves its identity and geometric consistency not just in a neutral pose, but also across complex facial motions.
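The text does not specify how anchors are sampled. As one plausible illustration, the sketch below picks a diverse anchor set by greedy farthest-point sampling over expression codes. Both the sampling strategy and the names (`select_anchors`, `expression_codes`) are assumptions made for this example.

```python
import numpy as np

def select_anchors(expression_codes, num_anchors, start=0):
    """Greedy farthest-point sampling over expression codes.

    Starts from `start` (e.g. the neutral frame) and repeatedly adds the
    frame whose expression is farthest from everything already chosen.
    """
    chosen = [start]
    dist = np.linalg.norm(expression_codes - expression_codes[start], axis=1)
    while len(chosen) < num_anchors:
        nxt = int(np.argmax(dist))
        chosen.append(nxt)
        to_new = np.linalg.norm(expression_codes - expression_codes[nxt], axis=1)
        dist = np.minimum(dist, to_new)  # distance to the nearest chosen anchor
    return chosen

# Hypothetical 2D expression codes: (jaw opening, eye squint).
expression_codes = np.array([
    [0.0, 0.0],   # 0: neutral
    [0.1, 0.0],   # 1: almost neutral
    [1.0, 0.0],   # 2: mouth wide open
    [0.0, 1.0],   # 3: strong squint
    [0.9, 0.1],   # 4: mouth open, close to frame 2
])

anchors = select_anchors(expression_codes, num_anchors=3)
```

With these toy codes the selection skips near-duplicates (frames 1 and 4) and keeps the neutral, open-mouth, and squint frames. A joint optimization would then sum the reconstruction loss over the selected anchors.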
