CS194-26 Project 5

Facial Keypoint Detection with Neural Networks

By Alfredo Santana

Overview

In this project, I created dataloaders for several face image datasets and trained convolutional neural networks on them to detect facial keypoints on people's faces.

Part 1: Nose Tip Detection

In Part 1 of this project, the goal is to predict the tip of the nose on people's faces in different images. First, I created a dataloader using PyTorch and the IMM Face Database. My dataloader converts each image to grayscale, normalizes it, resizes it to 80x60, converts it to a tensor, and finally returns it along with the nose keypoint. Here are some samples from my dataloader along with their keypoints.

Dataloader Sample

Dataloader Sample
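Resizing an image also moves its keypoints, so the nose label has to be rescaled together with the pixels. The sketch below shows one way to do that. It is not my actual dataloader code: the function names, the 640x480 source size, and the example annotation are all assumptions made for illustration.

```python
def resize_keypoint(x, y, old_size, new_size=(80, 60)):
    """Map a pixel keypoint from an image of old_size to one of new_size.

    Sizes are (width, height). Each axis is scaled independently, so this
    also covers resizes that change the aspect ratio.
    """
    old_w, old_h = old_size
    new_w, new_h = new_size
    return x * new_w / old_w, y * new_h / old_h


def to_relative(x, y, size):
    """Express a pixel keypoint as fractions of the image width and height."""
    w, h = size
    return x / w, y / h


# Hypothetical nose-tip annotation on an assumed 640x480 source image.
nose = (352.0, 264.0)
nose_small = resize_keypoint(*nose, old_size=(640, 480))
print(nose_small)                           # (44.0, 33.0)
print(to_relative(*nose_small, (80, 60)))   # (0.55, 0.55)
```

Storing keypoints as fractions of the image size is a common choice because the labels then stay the same no matter what resolution the image is resized to.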

After I had my dataloader ready, I created a convolutional neural network with 4 convolutional layers, each followed by a ReLU and a max-pooling layer for downsampling. The network ends with 2 fully connected layers with a ReLU between them. I used mean squared error (MSE) as the loss function and Adam as the optimizer, with a learning rate of 1e-3. I trained for 50 epochs with a batch size of 4. Below is my training vs. validation loss graph.

Loss Graph
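With a max-pooling layer after every convolution, the feature map shrinks quickly on an 80x60 input, and the first fully connected layer has to match the final size. The sketch below traces that arithmetic. The 3x3 kernel, the zero padding, and the 2x2 pooling window are assumptions, since the settings for Part 1 are not listed above, and the helper names are made up for this example.

```python
def conv_out(size, kernel=3, padding=0, stride=1):
    """Output length of a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2):
    """Output length of max pooling along one axis (rounds down)."""
    return size // window


def trace_feature_map(height, width, num_layers, kernel=3, padding=0):
    """Return the (height, width) after each conv + pool stage."""
    shapes = [(height, width)]
    for _ in range(num_layers):
        height = pool_out(conv_out(height, kernel, padding))
        width = pool_out(conv_out(width, kernel, padding))
        shapes.append((height, width))
    return shapes


# 80x60 image, i.e. height 60 and width 80, through 4 conv + pool stages.
for shape in trace_feature_map(60, 80, num_layers=4):
    print(shape)
# (60, 80)
# (29, 39)
# (13, 18)
# (5, 8)
# (1, 3)
```

Under these assumptions the last feature map is only 1x3, so the input size of the first fully connected layer would be 3 times the number of channels in the last convolutional layer. With `padding=1` the final map would be 3x5 instead.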

I did some hyperparameter tuning, first by decreasing the number of layers from four to three, which decreased my accuracy.

Loss Graph 3 Layers

Another hyperparameter I tuned was the learning rate. I tried a learning rate of 1e-4 instead of 1e-3, but 1e-3 still worked better for me.

Loss Graph lr = 1e-4

Shown below are some of the results of my CNN.

Success Case 1

Success Case 2

Failure Case 1

Failure Case 2

Part 2: Full Facial Keypoints Detection

In Part 2 of this project, we don't just want to detect the tip of the nose; we want to detect all the facial keypoints. I created another dataloader similar to the one in Part 1, but now I do some data augmentation by using ColorJitter to randomly change the brightness and saturation of the images. I also now resize the images to 160x120 and grab all 58 facial keypoints instead of just the nose keypoint. Below are some samples from my dataloader for Part 2 of this project.

Dataloader Sample

For my CNN in Part 2 of this project, I used an architecture similar to my CNN from Part 1, but with 5 convolutional layers. I use a kernel size of 3, a learning rate of 1e-3, and a batch size of 32. Below is the architecture of my CNN model.

Architecture of Model
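As a rough illustration of a network in this style, here is a minimal PyTorch sketch with 5 convolutional layers, a kernel size of 3, and 116 outputs (58 keypoints times 2 coordinates). The channel widths, the `padding=1` setting, the hidden size of 512, and the class name `KeypointNet` are assumptions chosen for the example, not the exact values from my model. The training step reuses MSE loss and Adam from Part 1, which is also an assumption for Part 2.

```python
import torch
from torch import nn


class KeypointNet(nn.Module):
    """Five conv + ReLU + max-pool stages followed by two FC layers."""

    def __init__(self, num_keypoints=58, channels=(16, 32, 64, 128, 256)):
        super().__init__()
        layers = []
        in_ch = 1  # grayscale input
        for out_ch in channels:
            layers += [
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(2),
            ]
            in_ch = out_ch
        self.features = nn.Sequential(*layers)
        # With padding=1 only the pooling shrinks the map:
        # height 120 -> 60 -> 30 -> 15 -> 7 -> 3, width 160 -> 80 -> 40 -> 20 -> 10 -> 5
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_ch * 3 * 5, 512),
            nn.ReLU(),
            nn.Linear(512, num_keypoints * 2),
        )

    def forward(self, x):
        return self.head(self.features(x))


model = KeypointNet()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Random stand-in for one batch of 32 grayscale 160x120 images and labels.
images = torch.randn(32, 1, 120, 160)
targets = torch.rand(32, 116)

# One optimisation step.
optimizer.zero_grad()
loss = criterion(model(images), targets)
loss.backward()
optimizer.step()
```

A real training loop would repeat the last four lines for every batch from the dataloader and track the validation loss after each epoch.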

These were the parameters that worked best for me. I tried adding more convolutional layers and using larger batch sizes, but neither improved accuracy. Below is my training and validation loss graph.

Loss Graph Part 2

After training my model, these are some of the results I got.

Success Case 1

Success Case 2

Failure Case 1

Failure Case 2

I also visualized the filters that the first convolutional layer learned:

Learned Filters

Part 3: Train with Larger Dataset

For the last part of this project, we use a much larger dataset, specifically the iBUG faces in-the-wild dataset, which contains 6666 images. Building the dataloader for this part was different from the previous two parts because we were given a bounding box to find and crop the face in each image. Once I cropped the face and updated the facial keypoints accordingly, I applied the same data augmentation as in the previous part. I used an 80/20 ratio to split the dataset into training and validation sets. Below are some samples from my dataloader for Part 3.

Dataloader Sample

Dataloader Sample
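The two bookkeeping steps described above, updating the keypoints after a crop and splitting the data 80/20, can be sketched as follows. This is an illustration rather than my actual code: the `(left, top, width, height)` box format, the rescaling to fractions of the crop size, the fixed seed, and the function names are assumptions.

```python
import random


def crop_keypoints(keypoints, box):
    """Shift keypoints into the crop's frame and scale them by the crop size.

    box is (left, top, width, height) in pixels of the original image.
    Points that fall outside the box end up outside the range [0, 1].
    """
    left, top, width, height = box
    return [((x - left) / width, (y - top) / height) for x, y in keypoints]


def split_indices(n, train_fraction=0.8, seed=0):
    """Shuffle the indices 0..n-1 and split them into train and validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = int(n * train_fraction)
    return indices[:cut], indices[cut:]


# Hypothetical keypoint and bounding box.
print(crop_keypoints([(150.0, 260.0)], (100, 200, 200, 240)))  # [(0.25, 0.25)]

train_idx, val_idx = split_indices(6666)
print(len(train_idx), len(val_idx))  # 5332 1334
```

Shuffling before the split keeps the validation set from being just the tail of the dataset file, and the fixed seed makes the split reproducible between runs.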

For my CNN, I ended up going with the ResNet18 from PyTorch, but modified the first layer to take 1 input channel (grayscale images) instead of the usual 3 (RGB images). I also changed the output size of the last FC layer from 1000 to 136, which is 68 keypoints times 2 coordinates. Below is the architecture of my model.

Architecture of ResNet Model