Project Overview

Semantic Facial Image Manipulation is a conditional generation task. The goal is to synthesize a facial image conditioned on both identity and a target expression (in our setting, identity information is provided as a 2D image and the target expression is provided as a FACS code).

Figure 1. The Facial Action Coding System (FACS) is a standard representation of facial expression in behavioral science [2]. It breaks facial expressions down into individual components of muscle movement, called Action Units (AUs). Each expression is encoded as a vector in which each entry corresponds to one AU and its magnitude represents that AU's intensity. This encoding of facial expressions is identity-agnostic and interpretable.
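To make the encoding concrete, here is a toy sketch (not the project's actual data format) of an expression as a FACS vector. The AU numbers and names follow the standard FACS taxonomy; the particular AU subset, its ordering, and the 0-5 intensity scale (mapping FACS ordinal levels A-E to 1-5) are assumptions made for illustration.

```python
# A fixed, ordered subset of Action Units (AU number -> muscle movement).
AU_SET = [
    (1, "inner brow raiser"),
    (4, "brow lowerer"),
    (6, "cheek raiser"),
    (12, "lip corner puller"),
    (15, "lip corner depressor"),
]

def encode_expression(active_aus):
    """Turn {AU number: intensity} into a dense vector over AU_SET.

    An intensity of 0 means the AU is inactive; higher values mean
    stronger activation of that muscle movement.
    """
    return [float(active_aus.get(au, 0.0)) for au, _ in AU_SET]

# A smile: cheek raiser (AU6) plus lip corner puller (AU12).
smile = encode_expression({6: 3, 12: 4})
print(smile)  # [0.0, 0.0, 3.0, 4.0, 0.0]
```

Because the vector describes muscle movements rather than a specific face, the same code can condition the generator for any identity.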

Social Motivation

Fujitsu is interested in using FACS-based expression analysis to analyze viewers' behavior for advertising and to provide feedback for video conferences, remote training, etc. However, FACS-annotated data is limited in quantity and often suffers from skewed distributions: some expressions and Action Units are much more frequent than others. Facial expression manipulation can generate FACS-annotated data while controlling the distribution of the generated data (i.e., we can generate the expressions and Action Units we want).

Technical Motivation

2D-based methods like GANimation [1] manipulate faces in 2D image space, which often results in artifacts in the generated images and failures under extreme head poses and lighting conditions.

Proposed Pipeline

We propose an encode-manipulate-decode pipeline consisting of two main modules: a geometric manipulation module and a semantic manipulation module. The geometric manipulation module warps the feature maps of the input image, guided by estimated 3D geometry. Moving the manipulation from image space to feature space reduces artifacts, and the 3D geometry helps handle extreme head poses and lighting conditions. The warped feature maps are then fed to the semantic manipulation module, which synthesizes information that was not present in the input image.
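The core idea of the geometric manipulation module can be sketched as warping a feature map with a dense displacement field. The toy code below is an illustration, not the project's implementation: the "geometry" here is just a precomputed (dy, dx) offset per location, and sampling is nearest-neighbor with border clamping, whereas the actual module would derive the field from estimated 3D face geometry and use differentiable sampling inside the network.

```python
def warp_feature_map(feat, flow):
    """Warp a 2D feature map with a per-location displacement field.

    feat: H x W grid (list of lists) of feature values.
    flow: H x W grid of (dy, dx) integer offsets.
    Each output location (y, x) samples feat at (y + dy, x + dx),
    clamped to the borders of the map.
    """
    h, w = len(feat), len(feat[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            dy, dx = flow[y][x]
            sy = min(max(y + dy, 0), h - 1)  # clamp source row
            sx = min(max(x + dx, 0), w - 1)  # clamp source column
            row.append(feat[sy][sx])
        out.append(row)
    return out

# Shift a 3x3 feature map one step to the right: every output location
# samples from the column to its left.
feat = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
flow = [[(0, -1)] * 3 for _ in range(3)]
print(warp_feature_map(feat, flow))  # [[1, 1, 2], [4, 4, 5], [7, 7, 8]]
```

Because the warp rearranges existing features rather than painting new pixels, it cannot hallucinate content; that is why a separate semantic manipulation module is still needed to synthesize information absent from the input image.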

Figure 2. An illustration of our proposed pipeline. It consists of two main parts: the geometric manipulation module and the semantic manipulation module.


[1] Pumarola et al. "GANimation: Anatomically-Aware Facial Animation from a Single Image." ECCV 2018.
[2] Ekman, Paul, Wallace V. Friesen, and Joseph C. Hager. "Facial Action Coding System: The Manual on CD ROM." A Human Face, Salt Lake City (2002): 77-254.


Zhuoqian Yang

Zhuoqian is an MSCV student of the Fall 2019 class. He received his bachelor's degree in Software Engineering from Beihang University. His research interests are in creative computer vision: content creation and manipulation with generative models and unsupervised/self-supervised learning.

Responsibilities: Data processing, pipeline construction, experimentation and analysis

Dai Li

Dai is an MSCV student of the Fall 2019 class. She received her bachelor's degree from the Department of Automation at Tsinghua University. Her research focuses on object detection, semantic segmentation and network interpretability.

Responsibilities: Experimentation, qualitative & quantitative evaluation and analysis

Advisor: Laszlo Jeni

Laszlo is a Systems Scientist (faculty) in the Robotics Institute at Carnegie Mellon University. He focuses on advancing the state of the art in multi-modal methods for computational behavior science, specifically in modeling, analysis, and synthesis of human behavior and emotion using diverse sensors.