Facial Keypoint Detection with Neural Networks

Allan Yu

Overview

We use neural networks to attempt to identify facial keypoints.

Nose Tip Detection

Here are the manually annotated nose tip points: the ground-truth keypoints.

Nose Neural Network

I create a simple neural network consisting of three convolutional layers and two linear layers (with ReLUs and max-pools in between) to predict the nose location. We split the data into training and validation sets and train the model on the training set. I record the loss on the training set every iteration, and the loss on the validation set every epoch. Here is my model trained for 14 epochs.
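The three-conv, two-linear architecture described above can be sketched as follows. The channel counts, kernel sizes, and the 60x80 grayscale input are assumptions for illustration, not the exact configuration used here:

```python
import torch
import torch.nn as nn

class NoseNet(nn.Module):
    """Sketch of a three-conv / two-linear nose-tip regressor.
    Layer sizes and the 1x60x80 input are illustrative assumptions."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, 3), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            # 32 channels x 5 x 8 spatial map after three conv/pool stages
            nn.Linear(32 * 5 * 8, 128), nn.ReLU(),
            nn.Linear(128, 2),  # normalized (x, y) of the nose tip
        )

    def forward(self, x):
        return self.head(self.features(x))
```

Training would then minimize MSE between the predicted and ground-truth (x, y) coordinates.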

Results

Here are results from my network. The red point is the ground truth; the blue point is my prediction. The network did not perform well on turned faces, likely because the rotation introduced variance that the training data did not teach the network to handle.

Full Facial Keypoints Detection

We now attempt to predict all keypoints rather than just the nose. Again, here are some ground truth keypoints.

Neural Network

Here, I perform data augmentation so my net does not overfit: I added random jitter to the photographs. This, combined with a better model, halved my MSE loss relative to my initial attempts.
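One common form of jitter is a random translation applied to both the image and its keypoints; the sketch below assumes that variant (the shift range and the (x, y) keypoint layout are illustrative assumptions, not necessarily the exact jitter used here):

```python
import numpy as np

def jitter(image, keypoints, max_shift=10, rng=np.random):
    """Random-shift augmentation sketch: translate the image and move
    the keypoints by the same offset so labels stay consistent.
    `max_shift` (pixels) is an assumed value.

    image:     (H, W) array
    keypoints: (N, 2) array of (x, y) pixel coordinates
    """
    dx, dy = rng.randint(-max_shift, max_shift + 1, size=2)
    # np.roll wraps pixels around the border; fine for small shifts
    shifted = np.roll(image, (dy, dx), axis=(0, 1))
    return shifted, keypoints + np.array([dx, dy])
```

Applying a fresh random shift each epoch means the network never sees exactly the same training example twice, which discourages memorization.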

My model consists of five convolutional layers, (1, 6, 5), (6, 12, 5), (12, 12, 3), (12, 24, 3), (24, 24, 3), followed by two linear layers, (576, 256) and (256, 116). A ReLU and max-pool follow every convolution except the fourth, which has only a ReLU. Here are my training and validation losses, recorded the same way as before. We achieve an MSE loss of approximately 100.
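The layer specification above translates directly into PyTorch. The 1x120x160 grayscale input size is an assumption chosen so that the flattened feature map comes out to exactly the stated 576 units; the 116 outputs correspond to 58 (x, y) keypoint pairs:

```python
import torch
import torch.nn as nn

class FaceNet(nn.Module):
    """Five-conv / two-linear keypoint model matching the spec above.
    Input size 1x120x160 is an assumption consistent with the
    576-unit flattened feature map."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(6, 12, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(12, 12, 3), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(12, 24, 3), nn.ReLU(),            # no pool after conv 4
            nn.Conv2d(24, 24, 3), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.Flatten(),                 # 24 x 4 x 6 = 576
            nn.Linear(576, 256), nn.ReLU(),
            nn.Linear(256, 116),          # 58 keypoints x (x, y)
        )

    def forward(self, x):
        return self.head(self.features(x))
```

Tracing the spatial dimensions: 120x160 shrinks to 58x78, 27x37, 12x17, 10x15, and finally 4x6 after the last pool, giving the 576-dimensional vector fed to the linear head.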

Results

Here are results from my network. Red points are ground truth; blue points are my predictions. The network again struggled on some turned faces, likely for the same reason as before: rotation introduced variance the network had not learned to handle.

Finally, here are some sample visualizations of learned filters.
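Visualizations like these can be produced by plotting the learned kernels of a convolutional layer directly. A minimal sketch, assuming a PyTorch model and showing only each filter's first input channel:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import torch.nn as nn

def show_filters(conv_layer, path="filters.png"):
    """Save a grayscale image of each kernel in a Conv2d layer,
    using the first input channel of each filter."""
    weights = conv_layer.weight.data.cpu()  # (out_ch, in_ch, kH, kW)
    n = weights.shape[0]
    fig, axes = plt.subplots(1, n, figsize=(2 * n, 2))
    for ax, w in zip(axes, weights):
        ax.imshow(w[0], cmap="gray")
        ax.axis("off")
    fig.savefig(path)
    plt.close(fig)

# Example with an untrained layer (a trained model's layer would go here):
show_filters(nn.Conv2d(1, 6, 5))
```

For a trained model, you would pass e.g. the first convolution of the network instead of a freshly initialized layer.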

Convolution 1

Also Convolution 1

Convolution 2

Convolution 3

References (Danes Data Set)

M. B. Stegmann, B. K. Ersbøll, and R. Larsen. FAME - a flexible appearance modelling environment. IEEE Trans. on Medical Imaging, 22(10):1319-1331, 2003.