CS 194: Image Manipulation and Computational Photography, Fall 2020

Project 4: Facial Keypoint Detection with Neural Networks

Rami Mostafa, cs194-26-abo



Overview

For this project, I implemented convolutional neural networks that detect the outline and key points of a human face.

Part 1: Nose Tip Detection

To start the project, I first had to construct a CNN capable of predicting a single point on a human's face: the tip of the nose. To do so, I constructed a dataloader that applied rescaling and normalizing transforms to each image, and fed its output to a simple CNN with 3 convolutional layers.
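As a rough illustration of a dataloader that rescales and normalizes each image, here is a minimal PyTorch sketch. The class name `NoseTipDataset`, the 60x80 output size, the [-0.5, 0.5] normalization range, and the convention of storing the nose point as fractions of the image width and height are all assumptions made for the example, not details taken from the project code.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader


class NoseTipDataset(Dataset):
    """Hypothetical dataset: grayscale images paired with one (x, y) nose point."""

    def __init__(self, images, nose_points, out_size=(60, 80)):
        # images: list of 2-D uint8 tensors (H, W) with values in 0..255
        # nose_points: list of (x, y) pairs given as fractions of width/height
        self.images = images
        self.nose_points = nose_points
        self.out_size = out_size  # (height, width); an assumed value

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # Normalize pixel values from 0..255 to the range [-0.5, 0.5].
        img = self.images[idx].float() / 255.0 - 0.5
        # interpolate expects a (N, C, H, W) tensor, so add two leading axes.
        img = F.interpolate(img[None, None], size=self.out_size,
                            mode="bilinear", align_corners=False)
        point = torch.tensor(self.nose_points[idx], dtype=torch.float32)
        return img[0], point  # image is (1, H, W); point is (2,)


# Random stand-in data, only to show the shapes involved.
images = [torch.randint(0, 256, (480, 640), dtype=torch.uint8) for _ in range(8)]
points = [(0.5, 0.5)] * 8
loader = DataLoader(NoseTipDataset(images, points), batch_size=4, shuffle=True)
batch_images, batch_points = next(iter(loader))
```

Because the point is stored as a fraction of the image size in this sketch, rescaling the image does not require changing the label.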

Here are some sample images from the dataloader:

Dataloader Example 1

Dataloader Example 2

Dataloader Example 3

Dataloader Example 4

For my CNN, I used a network composed of 3 convolutional layers and 2 fully connected layers. Each convolutional layer was followed by a ReLU and a (2, 2) max pool layer. The first fully connected layer was also followed by a ReLU. I trained using a batch size of 4 and a learning rate of 1e-3. The loss was calculated using MSELoss and optimized with an Adam optimizer. Below are the training and validation losses.
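The architecture and training setup described above can be sketched in PyTorch roughly as follows. The layer order, MSELoss, Adam, the learning rate, and the batch size follow the description; the class name `NoseNet`, the channel counts, the 3x3 kernels with padding, the hidden size of 128, and the 60x80 input size are assumptions chosen only to make the example run.

```python
import torch
import torch.nn as nn


class NoseNet(nn.Module):
    """Sketch: 3 conv layers (each with ReLU + 2x2 max pool) and 2 FC layers."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
        )
        # An assumed 60x80 input shrinks to 30x40, 15x20, then 7x10 after the pools.
        self.fc1 = nn.Linear(32 * 7 * 10, 128)
        self.fc2 = nn.Linear(128, 2)  # one (x, y) pair for the nose tip

    def forward(self, x):
        x = torch.flatten(self.features(x), 1)
        x = torch.relu(self.fc1(x))  # ReLU after the first FC layer only
        return self.fc2(x)


model = NoseNet()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One optimization step on a random stand-in batch of 4 images.
images = torch.rand(4, 1, 60, 80) - 0.5
targets = torch.rand(4, 2)
optimizer.zero_grad()
loss = criterion(model(images), targets)
loss.backward()
optimizer.step()
```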

Training Loss Per Batch

Validation Loss Per Batch

For a more consolidated view, here are the training (blue) and validation (orange) losses per epoch, where the validation loss was calculated before the training loss.

Training and Validation Loss Per Epoch
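One way to produce per-epoch curves like these is sketched below. It reads "validation loss was calculated before training loss" as running the validation pass at the start of each epoch, before that epoch's weight updates, which is an interpretation rather than something confirmed by the project code. The function name `run_epoch` and the tiny linear model used to exercise it are hypothetical.

```python
import torch
import torch.nn as nn


def run_epoch(model, train_loader, val_loader, criterion, optimizer):
    """Return (train_loss, val_loss), each averaged over its batches."""
    # Validation pass first, so it reflects the weights at the start of the epoch.
    model.eval()
    val_total = 0.0
    with torch.no_grad():
        for images, targets in val_loader:
            val_total += criterion(model(images), targets).item()

    # Training pass: one optimizer step per batch.
    model.train()
    train_total = 0.0
    for images, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()
        optimizer.step()
        train_total += loss.item()

    return train_total / len(train_loader), val_total / len(val_loader)


# Stand-in model and data: plain lists of batches behave like small loaders here.
demo_model = nn.Linear(3, 2)
batches = [(torch.rand(4, 3), torch.rand(4, 2)) for _ in range(5)]
demo_optimizer = torch.optim.Adam(demo_model.parameters(), lr=1e-3)
history = [
    run_epoch(demo_model, batches, batches[:2], nn.MSELoss(), demo_optimizer)
    for _ in range(3)
]
```

Each entry of `history` is a `(train_loss, val_loss)` pair that can be plotted against the epoch index.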

After training the model, here are a couple of successes and failures.

Success Example 1

Success Example 2

Failure Example 1

Failure Example 2

It seems that the model has trouble predicting the nose point on images that are not a straight-on view of the face. This makes sense, since in most images the nose sits near the center of the frame, so the model is likely biased toward predicting a point there.

Part 2: Full Facial Keypoints Detection

In this part, I took the model a step further to predict not just a single key point, but every key point labeled on the face. I instantiated the dataloader the same way as in Part 1, with slightly different transformations applied to each image. I also implemented some data augmentation by translating and adding color jitter to each image in the dataset, then adding the augmented copies back to the original dataset. Finally, I added a few more convolutional layers to turn this model into a 5 layer CNN (with the same loss function, optimizer, and batch size as in Part 1).

Here are some sample images from the dataloader:

Dataloader Example 1

Dataloader Example 2

Dataloader Example 3

Dataloader Example 4

Below is a more detailed view of the model architecture used to predict key points in this part, and the associated training and validation loss over 10 epochs.

Model Architecture

Training Loss Per Batch

Validation Loss Per Batch

Training and Validation Loss Per Epoch

After training the model, here are a couple of successes and failures.

Success Example 1

Success Example 2

Failure Example 1

Failure Example 2

Once again, it seems that the model has trouble predicting the facial key points on images that are not a straight-on view of the face. This may be because certain facial features are hidden and must be approximated by the model, which can be difficult without extensive tuning.

Finally, below I have displayed the learned first and last filters of each convolutional layer of the network.
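A minimal sketch of how the first and last filter of each convolutional layer could be pulled out of a trained PyTorch model is shown here. The helper name `first_and_last_filters` and the small demo network are hypothetical and stand in for the actual Part 2 model.

```python
import torch.nn as nn


def first_and_last_filters(model):
    """Return {layer_name: (first_filter, last_filter)} for every Conv2d layer."""
    filters = {}
    for name, module in model.named_modules():
        if isinstance(module, nn.Conv2d):
            # Weight tensor layout is (out_channels, in_channels, kH, kW).
            weights = module.weight.detach().cpu()
            filters[name] = (weights[0], weights[-1])
    return filters


# Demo network used only to exercise the helper.
demo = nn.Sequential(nn.Conv2d(1, 4, 5), nn.ReLU(), nn.Conv2d(4, 8, 3))
filters = first_and_last_filters(demo)
```

Each returned filter has shape (in_channels, kH, kW), so displaying it as a grayscale image means picking one input channel, for example `filters["0"][0][0]`.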

Part 3: Train With Larger Dataset

For this part, I took to Google Colab to train on a much larger dataset of faces and key points. I then submitted my model's results on the test set to Kaggle and achieved a score of 15.30842.

Below, once again, is the detailed architecture of the ResNet18 model used, with slight modifications to the first and last layers to suit the model to this specific problem.

Model Architecture

Additionally, here are the associated training and validation losses per epoch:

Training Loss Per Epoch

Validation Loss Per Epoch

With the help of Google Colab's GPU, the model reached reasonably low loss values and produced the following predictions on the test set.

ResNet18 Example 1

ResNet18 Example 2

ResNet18 Example 3

ResNet18 Example 4