CS 194-26 Project 4 - Facial Keypoint Detection with Neural Networks

Andy Wu, Fall 2020

Part 1: Nose Tip Detection

Using a small neural network, I tried to automatically detect the tip of the nose in a given facial image.

Sampled Images

The original images are:

Training

The training and validation loss:
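The writeup only shows the loss curves, not the training code. A typical loop for this kind of coordinate regression might look like the sketch below; the MSE loss and Adam optimizer are assumptions, not details from the writeup.

```python
import torch
import torch.nn as nn

def train(model, train_loader, val_loader, epochs=25, lr=1e-3):
    """Plain coordinate-regression training loop.

    MSE loss and Adam are assumptions -- the writeup only reports
    the training/validation loss curves, not the exact setup.
    """
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    train_losses, val_losses = [], []
    for _ in range(epochs):
        # One pass over the training set with gradient updates.
        model.train()
        total = 0.0
        for imgs, pts in train_loader:
            opt.zero_grad()
            loss = loss_fn(model(imgs), pts)
            loss.backward()
            opt.step()
            total += loss.item()
        train_losses.append(total / len(train_loader))

        # One pass over the validation set without gradients.
        model.eval()
        with torch.no_grad():
            total = sum(loss_fn(model(i), p).item() for i, p in val_loader)
        val_losses.append(total / len(val_loader))
    return train_losses, val_losses
```

Plotting `train_losses` and `val_losses` per epoch produces curves like the ones shown below.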

Output

In the following images, green is the ground truth and red is the output of the neural network.
In these two images, the nose is detected properly

while in these two, the nose is not detected well.

Interestingly, the nose of the woman facing the camera head-on was not detected properly, while the nose of the man whose head was turned was detected properly.

Part 2: Full Facial Keypoints Detection

I then enlarged the neural network and tried to detect all facial keypoints in the images.

Sampled Images

The original images are:

Training

The neural network was an extended version of the one I used in the previous part. It had 5 convolutional layers, each followed by its own ReLU and pooling layers, with 2 fully connected (FC) layers at the end. I stuck with a learning rate of 1e-3 and trained for 30 epochs.
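The exact layer sizes aren't reproduced here, but the architecture described above (5 conv layers, each with ReLU and pooling, followed by 2 FC layers) might be sketched in PyTorch as follows. The channel counts, kernel sizes, 120×160 input resolution, and 58-keypoint output are all assumptions; only the overall structure comes from the description.

```python
import torch
import torch.nn as nn

class KeypointNet(nn.Module):
    """5 conv layers (each with ReLU + max pooling) and 2 FC layers.

    Channel counts, kernel sizes, the 120x160 grayscale input, and the
    58-keypoint output are assumptions -- only the overall shape of the
    network comes from the writeup.
    """
    def __init__(self, num_keypoints=58):
        super().__init__()
        chans = [1, 16, 32, 64, 128, 256]
        layers = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            layers += [nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                       nn.ReLU(),
                       nn.MaxPool2d(2)]
        self.features = nn.Sequential(*layers)
        # After 5 rounds of 2x pooling, a 120x160 input becomes 3x5.
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 3 * 5, 256),
            nn.ReLU(),
            nn.Linear(256, 2 * num_keypoints),  # an (x, y) pair per keypoint
        )

    def forward(self, x):
        return self.fc(self.features(x))
```

The final layer regresses a flat vector of coordinates, which is reshaped into (x, y) pairs when plotting predictions over the image.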
The training and validation loss:

Output

In the following images, green is the ground truth and red is the output of the neural network.
In these two images, the facial keypoints are detected properly

while in these two, the keypoints are not detected well.

The neural network seems to work much better on faces that directly face the camera. High contrast in the facial features also seemed to improve its accuracy.

Filters

I used many small filters, which are visualised below. It is hard to draw any conclusions from viewing them.

Part 3: Train with Larger Dataset

I achieved a score of 22.37 in the Kaggle competition.

Training

I trained a ResNet-18 model. I replaced the first convolutional layer with one that takes a single-channel input, keeping all other parameters of the convolution the same. I also replaced the FC layer at the end with one that outputs 136 values: an x and a y coordinate for each of the 68 keypoints. During training I used a learning rate of 3e-4. The training and validation graphs suggest that I could have gotten better results by training longer, but I kept running into problems with Colab disconnecting me from the instance, so I stopped at 10 epochs.
The training and validation loss:

Output

In these two images, the facial keypoints are detected properly

while in these two, they are not detected well.

Some of the bad outputs seem to come from bad bounding boxes: even after I expanded the boundary of the box, the face was not always fully within the cropped image.
I also tried the network on some of my own images, but they did not turn out as well. I suspect the bounding box needs further tuning to remove more of the background before the image is fed into the neural network.
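The box-expansion step can be sketched as a scale-about-the-centre crop that clips to the image bounds. The 1.4 expansion factor below is an assumption; the writeup only says the boundary of the box was expanded.

```python
import numpy as np

def expand_and_crop(img, box, scale=1.4):
    """Enlarge an (x, y, w, h) face box by `scale` about its centre,
    then crop, clipping to the image bounds.

    The 1.4 default is an assumption -- the writeup only says the
    bounding box was expanded before cropping.
    """
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2
    nw, nh = w * scale, h * scale
    x0 = int(max(cx - nw / 2, 0))
    y0 = int(max(cy - nh / 2, 0))
    x1 = int(min(cx + nw / 2, img.shape[1]))
    y1 = int(min(cy + nh / 2, img.shape[0]))
    return img[y0:y1, x0:x1]
```

Even with clipping, a badly placed initial box can still leave part of the face outside the crop, which matches the failure cases above.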