CS 194-26: Computational Photography, Fall 2020

Project 4: Facial Keypoint Detection with Neural Networks

Michelle Fong, CS194-26-ael



Overview

In this project, I used neural networks to automatically detect facial keypoints. This involved using PyTorch to create a convolutional neural network and training and testing it on data from the IMM Face Database.

Part 1: Nose Tip Detection

This first part involved making an initial toy model for nose tip detection. To do this, I first parsed the landmarks from the dataset and stored them in a Dataset object. I then wrapped that dataset in PyTorch DataLoaders, splitting the data into training and validation sets.
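To make the data pipeline concrete, here is a minimal sketch of that setup. NoseTipDataset and the placeholder tensors are hypothetical stand-ins for the parsed IMM data, and the 60x80 grayscale size is an assumption consistent with the fully connected dimensions printed further below.

import torch
from torch.utils.data import Dataset, DataLoader

class NoseTipDataset(Dataset):
    # Hypothetical wrapper: grayscale faces paired with a nose tip (x, y).
    def __init__(self, images, points):
        self.images = images    # (N, 1, 60, 80) float tensor
        self.points = points    # (N, 2) nose tip coordinates

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        return self.images[idx], self.points[idx]

# Placeholder tensors standing in for the parsed images and landmarks.
train_imgs, train_pts = torch.rand(192, 1, 60, 80), torch.rand(192, 2)
val_imgs, val_pts = torch.rand(48, 1, 60, 80), torch.rand(48, 2)

train_loader = DataLoader(NoseTipDataset(train_imgs, train_pts), batch_size=1, shuffle=True)
val_loader = DataLoader(NoseTipDataset(val_imgs, val_pts), batch_size=1, shuffle=False)

Below are some samples of the annotated images.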

Sample #1
Sample #2
Sample #3
Sample #4

Then I created my neural net, whose structure is shown below. I followed the guidelines in the spec, keeping the output dimension of all my convolutional layers at 12 and the kernel size at 3. I trained for a total of 25 epochs with a learning rate of 0.001 and a batch size of 1.

Net(
  (conv1): Conv2d(1, 12, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(12, 12, kernel_size=(3, 3), stride=(1, 1))
  (conv3): Conv2d(12, 12, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=480, out_features=260, bias=True)
  (fc2): Linear(in_features=260, out_features=2, bias=True)
)
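The printout above omits the activations and pooling. Here is a sketch of an equivalent module, assuming a ReLU and a 2x2 max pool after each convolution, which is what makes the flattened size come out to 12 * 5 * 8 = 480 for a 60x80 input.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 12, 3)
        self.conv2 = nn.Conv2d(12, 12, 3)
        self.conv3 = nn.Conv2d(12, 12, 3)
        self.fc1 = nn.Linear(480, 260)  # 12 channels * 5 * 8 spatial positions
        self.fc2 = nn.Linear(260, 2)    # (x, y) of the nose tip

    def forward(self, x):
        # (N, 1, 60, 80) -> 29x39 -> 13x18 -> 5x8 after each conv + pool stage
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = F.max_pool2d(F.relu(self.conv3(x)), 2)
        x = F.relu(self.fc1(x.flatten(1)))
        return self.fc2(x)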

Here is a plot of the MSE for training and validation sets, over 25 epochs.

MSE Loss
Success #1
Success #2

These are examples of my neural net correctly identifying the nose tip, with the ground truth point in red and the prediction in blue.

Failure #1
Failure #2

These are failure cases where the neural net does not correctly identify the nose tip. I believe this is due to the faces being tilted and the facial features appearing warped as a result.
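For completeness, here is a minimal sketch of the training loop for this part, reusing the Net and loaders sketched above. The MSE loss, 0.001 learning rate, batch size of 1, and 25 epochs are as stated; the choice of the Adam optimizer is my assumption.

import torch
import torch.nn as nn

model = Net()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

train_losses, val_losses = [], []
for epoch in range(25):
    # one pass over the training set, accumulating the average MSE
    model.train()
    total = 0.0
    for imgs, pts in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(imgs), pts)
        loss.backward()
        optimizer.step()
        total += loss.item()
    train_losses.append(total / len(train_loader))

    # validation MSE, with gradients disabled
    model.eval()
    with torch.no_grad():
        val = sum(criterion(model(imgs), pts).item() for imgs, pts in val_loader)
    val_losses.append(val / len(val_loader))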

Part 2: Full Facial Keypoints Detection

In this part, I applied the same logic to bigger images, predicting multiple facial keypoints instead of just one. One adjustment I made here was augmenting the data to prevent overfitting, which I did with random rotations and random shifts of the face; a sketch of these transforms follows. I tried using ColorJitter to randomly change the brightness and saturation of the face, but I found that to be rather unsuccessful. I also modified my CNN from before, adding two more convolutional layers and experimenting extensively with the input and output channel sizes; the resulting structure is printed after the sketch. I trained with a learning rate of 0.001, a batch size of 1, and 25 epochs.
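Here is a sketch of the rotation and shift augmentations, applied jointly to the image and its keypoints. The angle and shift ranges are illustrative guesses rather than the exact values I tuned.

import math
import random
import torch
import torchvision.transforms.functional as TF

def augment(img, pts):
    # img: (1, H, W) tensor; pts: (K, 2) keypoints in pixel coordinates (x, y).
    _, h, w = img.shape
    angle = random.uniform(-12, 12)   # degrees, illustrative range
    dx = random.randint(-10, 10)      # pixels, illustrative range
    dy = random.randint(-10, 10)

    img = TF.rotate(img, angle)       # counterclockwise, about the image center
    img = TF.affine(img, angle=0.0, translate=[dx, dy], scale=1.0, shear=0.0)

    # Move the keypoints the same way: a visually counterclockwise rotation in
    # image coordinates (y pointing down) uses R = [[cos, sin], [-sin, cos]].
    t = math.radians(angle)
    R = torch.tensor([[math.cos(t), math.sin(t)],
                      [-math.sin(t), math.cos(t)]], dtype=pts.dtype)
    center = torch.tensor([w / 2, h / 2], dtype=pts.dtype)
    pts = (pts - center) @ R.T + center
    return img, pts + torch.tensor([dx, dy], dtype=pts.dtype)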

Net(
  (conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(32, 16, kernel_size=(3, 3), stride=(1, 1))
  (conv3): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1))
  (conv4): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1))
  (conv5): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1))
  (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (pool3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (pool4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (pool5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (fc1): Linear(in_features=48, out_features=300, bias=True)
  (fc2): Linear(in_features=300, out_features=116, bias=True)
)
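Each conv above is followed by a ReLU and one of the 2x2 max pools, which reduces the input to 16 * 1 * 3 = 48 flattened features at fc1; a 120x160 grayscale input is my assumption, chosen to be consistent with that number. A compact equivalent sketch, under a hypothetical name:

import torch
import torch.nn as nn
import torch.nn.functional as F

class FaceNet(nn.Module):  # hypothetical name for the Part 2 network
    def __init__(self):
        super().__init__()
        chans = [1, 32, 16, 16, 16, 16]
        self.convs = nn.ModuleList(
            nn.Conv2d(cin, cout, 3) for cin, cout in zip(chans, chans[1:]))
        self.fc1 = nn.Linear(48, 300)
        self.fc2 = nn.Linear(300, 116)

    def forward(self, x):
        # (N, 1, 120, 160) -> 59x79 -> 28x38 -> 13x18 -> 5x8 -> 1x3
        for conv in self.convs:
            x = F.max_pool2d(F.relu(conv(x)), 2)
        x = F.relu(self.fc1(x.flatten(1)))
        return self.fc2(x).view(-1, 58, 2)  # 58 (x, y) keypoint pairs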

Here is a plot of the MSE for training and validation sets, over 25 epochs.

MSE Loss
Success #1
Success #2

These are examples of my neural net correctly identifying the facial keypoints, with the ground truth points in red and the predictions in blue.

Failure #1
Failure #2

These are failure cases where the neural net does not correctly identify the facial keypoints. I believe this is due to these face structures differing from most of the training set: the first example has a slim face and a wide smile, and the second has a long chin with a beard and is also smiling.

Here are the learned filter visualizations for my neural net.

Layer 0
Layer 1
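The visualization code itself came from the Piazza link; the following is only a minimal sketch of the idea, assuming matplotlib and the FaceNet sketch above.

import matplotlib.pyplot as plt

def show_filters(conv, path):
    # conv.weight has shape (out_channels, in_channels, kH, kW);
    # plot the first input channel of each learned filter in a grid.
    w = conv.weight.detach().cpu()
    fig, axes = plt.subplots(4, w.shape[0] // 4, figsize=(10, 5))
    for ax, filt in zip(axes.flat, w):
        ax.imshow(filt[0].numpy(), cmap='gray')
        ax.axis('off')
    fig.savefig(path)

# e.g., after training: show_filters(model.convs[0], 'layer0_filters.png')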

References

I used the PyTorch DataLoader and CNN tutorials linked in the spec, as well as the filter visualization code linked on Piazza.