Sunday, March 31, 2019

AutoFace - How It Works

The core of AutoFace is the 3DMM face identity shape network described in

A. Tran, T. Hassner, I. Masi, G. Medioni, "Regressing Robust and Discriminative 3D Morphable Models with a very Deep Neural Network", in CVPR, 2017. The preprint version is available as arXiv:1612.04904 [cs.CV].

Code for this has generously been made available on GitHub at https://github.com/fengju514/Expression-Net.

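The Expression-Net script analyzes a single input image and outputs a 3D shape in the form of weights for a set of 99 morphs labelled 00 to 98. The location of a mesh vertex is then, in the usual linear morphable-model form, its base position plus the weighted sum of the morph offsets:

\[ \mathbf{x} = \mathbf{x}_0 + \sum_{i=0}^{98} w_i \, \mathbf{m}_i \]

where \mathbf{x}_0 is the vertex location in the unmorphed mesh, w_i is the weight of morph i, and \mathbf{m}_i is the offset of that vertex in morph i.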
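As a rough illustration of how a set of morph weights turns into vertex locations, here is a minimal pure-Python sketch. It assumes the standard linear blend (base position plus weighted morph offsets); the function name `apply_morphs` and the toy data are made up for this example and are not taken from Expression-Net or AutoFace.

```python
def apply_morphs(base, morphs, weights):
    """Return base + sum_i weights[i] * morphs[i], vertex by vertex.

    base    -- list of (x, y, z) rest positions, one per vertex
    morphs  -- list of morphs; each morph is a list of (dx, dy, dz) offsets
    weights -- one weight per morph
    """
    if len(morphs) != len(weights):
        raise ValueError("one weight per morph is required")
    result = [list(v) for v in base]
    for weight, morph in zip(weights, morphs):
        if len(morph) != len(base):
            raise ValueError("morph and base vertex counts differ")
        for vertex, offset in zip(result, morph):
            for axis in range(3):
                vertex[axis] += weight * offset[axis]
    return [tuple(v) for v in result]


# Toy data (hypothetical): 2 vertices and 99 morphs, of which only
# morphs 00 and 01 have non-zero offsets.
base = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
morphs = [[(0.0, 0.0, 0.0)] * 2 for _ in range(99)]
morphs[0] = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
morphs[1] = [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)]
weights = [0.0] * 99
weights[0], weights[1] = 0.5, -1.0

morphed = apply_morphs(base, morphs, weights)
```

In Blender the same blend is what shapekeys do when their values are set, so an add-on would normally assign the weights to the shapekeys rather than loop over vertices like this.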








The Expression-Net script also describes the 99 morphs, but only for a modified Basel Face Model (BFM), which appears to be popular among AI researchers. The BFM is a high-density face mesh and as such is of little use to artists, who prefer a medium-density full-body mesh that is rigged and textured. A key contribution of AutoFace is to translate the morphs from the modified BFM into shapekeys for two useful meshes, the Genesis 8 Male and Female characters used in DAZ Studio.

The pictures below show the first few morphs, for the modified BFM, Genesis 8 Male, and Genesis 8 Female. First the original meshes with no shapekeys applied.

[Images: the basic shapes (unmorphed meshes), followed by shapekeys 00 to 06.]

A number of comments are in order.
  • The shapekeys in AutoFace are ten times stronger than the original ones, and hence the coefficients must be reduced by the same factor of ten. The reason is that higher shapekeys are quite subtle and difficult to detect.
  • The Genesis shapekeys taper off near the boundary of the BFM, in order to avoid a sharp transition to the rest of the head mesh. In particular, a fattened jawline, such as the one in shapekey 00, is not transferred correctly. This problem may be affecting the example with President Trump.
  • The eyes, including the skin behind the eyes, are not morphed. Instead, the eye vertices are scaled and translated based on the movement of the corners of the eyes. This ensures that the eyes remain round, which is desirable when posing.
  • Similarly, the inside of the mouth is scaled based on the movement of the corners of the mouth, in order to ensure that the jaws and teeth remain inside the face.
  • The shapekeys shown in this post are more or less symmetric. Higher shapekeys are less pronounced, and some of them are asymmetric.
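Two of the comments above can be sketched in a few lines of Python. The code below is only an illustration under stated assumptions: the exact way AutoFace derives the eye transform from the corners is not spelled out here, so the sketch assumes a uniform scale by the ratio of corner-to-corner distances plus a translation that moves the old corner midpoint onto the new one. The names `to_autoface_weight` and `fit_eye` are hypothetical.

```python
import math

AUTOFACE_GAIN = 10.0  # shapekeys are ten times stronger than the originals


def to_autoface_weight(original_weight):
    """Reduce a network coefficient by the same factor of ten."""
    return original_weight / AUTOFACE_GAIN


def fit_eye(eye_verts, old_corners, new_corners):
    """Scale and translate eye vertices to follow the eye corners.

    old_corners / new_corners -- (inner, outer) corner positions before and
    after morphing. Assumed scheme: uniform scale by the ratio of the
    corner-to-corner distances, applied about the corner midpoint, so a
    round eye stays round.
    """
    old_inner, old_outer = old_corners
    new_inner, new_outer = new_corners
    old_width = math.dist(old_inner, old_outer)
    if old_width == 0.0:
        raise ValueError("eye corners coincide")
    scale = math.dist(new_inner, new_outer) / old_width
    old_mid = [(a + b) / 2 for a, b in zip(old_inner, old_outer)]
    new_mid = [(a + b) / 2 for a, b in zip(new_inner, new_outer)]
    return [
        tuple(m + scale * (c - o) for c, o, m in zip(v, old_mid, new_mid))
        for v in eye_verts
    ]


# Toy example: the corners move apart (width 2 -> 4) and shift sideways and up.
old_corners = ((-1.0, 0.0, 0.0), (1.0, 0.0, 0.0))
new_corners = ((0.0, 1.0, 0.0), (4.0, 1.0, 0.0))
eye_verts = [(0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
fitted = fit_eye(eye_verts, old_corners, new_corners)
```

Because the scale is uniform, a sphere of eye vertices maps to a sphere. The same kind of transform, driven by the corners of the mouth instead, would serve for the inside of the mouth.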
As always, something is lost in translation, so it would be better if the deep network were trained directly for the Genesis 8 meshes, or for subdivided versions of them. The high-frequency data could then be converted into a normal map. Alas, I have neither the competence, the computing resources, nor the training data to do this.