CS194-26 Project 4: Facial Keypoint Detection with Neural Networks

Kevin Shi

Overview

In this project, I use PyTorch and convolutional neural networks (CNNs) to detect facial keypoints.

Part 1: Nose Tip Detection

We start with a toy example: detecting the tip of the nose. The dataset for this part is the IMM Face Database, which contains 240 pictures of the faces of Danish computer scientists.

Our NN design for this toy example is very simple: 4 convolution layers, each followed by a pooling layer, and a final FC layer, trained with MSELoss and the Adam optimizer. Despite its simplicity, it performs pretty well. In the following examples, the red dot is the ground truth and the blue dot is the predicted point.
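To make the design concrete, here is a minimal sketch of a network with this shape. The input size (60x80 grayscale), the channel widths, the 3x3 kernels, the ReLU activations, the learning rate, and the name `NoseNet` are all assumptions for illustration, not the exact values used in this project.

```python
import torch
import torch.nn as nn


class NoseNet(nn.Module):
    """Hypothetical 4-conv + 1-FC regressor for a single (x, y) keypoint."""

    def __init__(self, in_size=(60, 80)):  # assumed grayscale input, H x W
        super().__init__()
        channels = [1, 8, 16, 32, 32]  # assumed channel widths
        layers = []
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            layers += [
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),  # keeps H and W
                nn.ReLU(),
                nn.MaxPool2d(2),  # halves H and W (rounding down)
            ]
        self.features = nn.Sequential(*layers)
        # Four 2x2 pools shrink each spatial dimension by a factor of 16.
        h, w = in_size[0] // 16, in_size[1] // 16
        self.fc = nn.Linear(channels[-1] * h * w, 2)

    def forward(self, x):
        x = self.features(x)
        return self.fc(torch.flatten(x, start_dim=1))


model = NoseNet()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed learning rate

# One training step on a random stand-in batch (not real IMM data).
images = torch.rand(4, 1, 60, 80)
targets = torch.rand(4, 2)  # nose coordinates scaled to [0, 1]
optimizer.zero_grad()
pred = model(images)
loss = criterion(pred, targets)
loss.backward()
optimizer.step()
```

The only part that needs care is the input size of the FC layer, which has to match the flattened feature map after all four pooling steps.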

There are some cases where it fails, such as samples 13 and 18, where the predicted nose lands on the mouth and on the side of the cheek, respectively. Both images fail because of the tilt of the head: the first is tilted up, and the second is tilted to the side. Our training data is skewed toward straight-on images, which are both more numerous and more consistent from person to person, so the network sees relatively few tilted faces.

The following plot shows the MSELoss curve for validation and test data while training.

Part 2: Full Facial Keypoints Detection

We now move on from the toy example to pursue the full problem: all of the facial keypoints.

Training purely on the given dataset is simply not enough training data. Thus we augment our existing data with random affine transformations as well as saturation and brightness transforms, both to simulate more faces and environments and to make our NN more robust.
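One detail worth spelling out: saturation and brightness changes leave the labels alone, but an affine transform moves the face, so the keypoints have to be moved with it. Below is a rough sketch of that bookkeeping for a rotation plus a shift. The function names and the parameter ranges (15 degrees, 10 pixels) are hypothetical, and the rotation direction has to match whatever convention is used to warp the image itself.

```python
import math
import random


def random_affine_params(max_deg=15.0, max_shift=10.0, rng=random):
    """Draw a rotation angle (degrees) and a pixel shift; ranges are assumed."""
    angle = rng.uniform(-max_deg, max_deg)
    dx = rng.uniform(-max_shift, max_shift)
    dy = rng.uniform(-max_shift, max_shift)
    return angle, dx, dy


def transform_keypoints(points, angle_deg, dx, dy, center):
    """Rotate (x, y) points about `center`, then translate by (dx, dy)."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx, cy = center
    out = []
    for x, y in points:
        rx = cos_t * (x - cx) - sin_t * (y - cy) + cx
        ry = sin_t * (x - cx) + cos_t * (y - cy) + cy
        out.append((rx + dx, ry + dy))
    return out


# Example: jitter three made-up keypoints on an 80x60 image.
keypoints = [(40.0, 30.0), (30.0, 20.0), (50.0, 20.0)]
angle, dx, dy = random_affine_params()
moved = transform_keypoints(keypoints, angle, dx, dy, center=(40.0, 30.0))
```

The same angle and shift would then be applied to the image, so that each augmented image and its labels stay aligned.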

The structure of the CNN here is also very simple, with 6 convolution layers and 3 FC layers. We have a learning rate of 0.004 and beta values of 0.9 and 0.999. This structure performs reasonably well given a very short training time.
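For reference, here is what the learning rate and the two beta values control, assuming the Adam optimizer from Part 1 is still the one in use. This is a plain-Python sketch of one Adam update on a single scalar parameter; the `eps` value and the function name `adam_step` are assumptions for illustration.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.004, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad         # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias correction for the zero init
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Three updates with a constant made-up gradient.
param, m, v = 1.0, 0.0, 0.0
for t in range(1, 4):
    param, m, v = adam_step(param, grad=0.5, m=m, v=v, t=t)
```

Because of the bias correction, the first update has a magnitude very close to the learning rate itself, whatever the scale of the gradient.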

The NN does very well on front-facing images, even ones at an angle, but once again has problems with faces looking to the side. This is again due to the skew in the training data toward front-facing images.

Let's see the training curve again for this network:

Part 3: Train With Larger Dataset

For the final part of this project, we work with a much larger dataset of 6666 images. I used a ResNet18 architecture with the same hyperparameters and the same transformations as before.

It looks like the network learned a very general face shape: the dots land in roughly similar positions no matter the face. This causes sideways faces to fail again. I believe this is the same data problem as before.

Also our training curve: