Project 4: Facial Keypoint Detection with Neural Networks

Roshni Rawal
CS 194-26 Fall 2020
November 08, 2020

Part 1: Nose Tip Detection

For Part 1, I wrote a custom dataloader which normalized the images and rescaled them to 80×60. Below are some samples from my dataloader with nose keypoints shown on the images.
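To make the structure concrete, here is a minimal sketch of what such a dataset class could look like; the class name NoseTipDataset, the img_paths/nose_points fields, and the exact normalization are illustrative assumptions rather than my actual code.

```python
import cv2
import numpy as np
import torch
from torch.utils.data import Dataset

class NoseTipDataset(Dataset):
    """Loads grayscale face images, rescales them to 80x60, and returns the nose keypoint."""

    def __init__(self, img_paths, nose_points):
        # img_paths: list of image file paths
        # nose_points: array of (x, y) nose coordinates, normalized to [0, 1]
        self.img_paths = img_paths
        self.nose_points = nose_points

    def __len__(self):
        return len(self.img_paths)

    def __getitem__(self, idx):
        img = cv2.imread(self.img_paths[idx], cv2.IMREAD_GRAYSCALE)
        img = cv2.resize(img, (80, 60))                 # width 80, height 60
        img = img.astype(np.float32) / 255.0 - 0.5      # normalize to roughly [-0.5, 0.5]
        img = torch.from_numpy(img).unsqueeze(0)        # shape (1, 60, 80)
        point = torch.tensor(self.nose_points[idx], dtype=torch.float32)
        return img, point
```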

Sampled Images from Dataloader



I wrote my CNN with four convolutional layers and two fully connected layers, and trained it with a batch size of 5 and a learning rate of 0.001. Below I show the train and validation loss over 10 epochs.
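Here is a sketch of that kind of architecture; the channel counts and kernel sizes are illustrative guesses, not my exact layer shapes.

```python
import torch.nn as nn
import torch.nn.functional as F

class NoseNet(nn.Module):
    """Small CNN for nose-tip regression: 4 conv layers + 2 fully connected layers.
    Channel counts and kernel sizes are illustrative, not the exact net."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 12, 3, padding=1)
        self.conv2 = nn.Conv2d(12, 16, 3, padding=1)
        self.conv3 = nn.Conv2d(16, 24, 3, padding=1)
        self.conv4 = nn.Conv2d(24, 32, 3, padding=1)
        # After four 2x2 max-pools a 60x80 input shrinks to roughly 3x5
        self.fc1 = nn.Linear(32 * 3 * 5, 128)
        self.fc2 = nn.Linear(128, 2)                    # (x, y) of the nose tip

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = F.max_pool2d(F.relu(self.conv3(x)), 2)
        x = F.max_pool2d(F.relu(self.conv4(x)), 2)
        x = x.flatten(1)
        x = F.relu(self.fc1(x))
        return self.fc2(x)
```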

Train and Validation Loss

Below are images that my neural net predicted nearly correctly. The ground truth point is in blue and the predicted point is in red; as you can see, in all of these images the points are nearly overlapping!
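An overlay like this takes only a few lines of matplotlib; the sketch below assumes the image is a (1, H, W) tensor and the points are normalized (x, y) coordinates.

```python
import matplotlib.pyplot as plt

def show_prediction(img, gt, pred):
    """Overlay the ground-truth (blue) and predicted (red) nose points on one image.
    Assumes img is a (1, H, W) tensor and gt/pred are (x, y) in [0, 1] coordinates."""
    h, w = img.shape[1], img.shape[2]
    plt.imshow(img.squeeze(0).numpy(), cmap="gray")
    plt.scatter([gt[0] * w], [gt[1] * h], c="blue", label="ground truth")
    plt.scatter([pred[0] * w], [pred[1] * h], c="red", label="prediction")
    plt.legend()
    plt.show()
```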

Detecting Correctly

My net also detected some images incorrectly. I believe this is due to the angle at which the faces are turned: because the model is trained on so few images, with very little variation, it is easier for it to predict forward-facing faces, where the nose keypoint sits at the center of the face.

Detecting Incorrectly

Part 2: Full Facial Keypoint Detection

For this part I once again wrote a dataloader; however, this time I added several transformations to make the training set more robust: randomly adjusting brightness, randomly rotating images between -15 and 15 degrees, and of course rescaling and normalizing the images. Below you can see sampled images with the full set of facial keypoints (which have been transformed along with the images).
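The important detail is that the keypoints have to move with the image. Below is a rough sketch of how the brightness and rotation augmentations could be applied to both; the helper name and exact parameter ranges are assumptions.

```python
import random
import numpy as np
import cv2

def augment(img, keypoints):
    """Randomly adjust brightness and rotate the image and keypoints by -15..15 degrees.
    img: HxW float image in [0, 1]; keypoints: (N, 2) array of (x, y) pixel coordinates."""
    # Random brightness shift
    img = np.clip(img + random.uniform(-0.2, 0.2), 0.0, 1.0)

    # Random rotation about the image center
    angle = random.uniform(-15, 15)
    h, w = img.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)   # 2x3 affine matrix
    img = cv2.warpAffine(img, M, (w, h))

    # Apply the same affine transform to the keypoints
    ones = np.ones((len(keypoints), 1))
    keypoints = np.hstack([keypoints, ones]) @ M.T            # (N, 2)
    return img, keypoints
```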

Sampled Images from Dataloader



Below is the architecture of my neural network. I used five convolutional layers and two fully connected layers, and trained the net with a learning rate of 0.001 and a batch size of 8 over 10 epochs.
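The training and validation procedure is a standard regression loop. Here is a generic sketch with MSE loss and Adam, shown for illustration rather than as the exact setup used:

```python
import torch
import torch.nn as nn

def train(model, train_loader, val_loader, epochs=10, lr=1e-3):
    """Generic keypoint-regression loop: MSE loss between predicted and true
    keypoints, with the average train and validation loss tracked per epoch."""
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    train_losses, val_losses = [], []

    for epoch in range(epochs):
        model.train()
        total = 0.0
        for imgs, points in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(imgs), points)
            loss.backward()
            optimizer.step()
            total += loss.item()
        train_losses.append(total / len(train_loader))

        model.eval()
        with torch.no_grad():
            val = sum(criterion(model(imgs), points).item()
                      for imgs, points in val_loader)
        val_losses.append(val / len(val_loader))
    return train_losses, val_losses
```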

Architecture of Neural Network

Here is the training and validation loss over 10 epochs.

Train and Validation Loss

My model mostly detects forward-facing faces correctly, as you can see in the pictures below, and it is able to capture the angle of the face, which is fantastic. It struggles more with unusual expressions (such as the first picture under Detecting Incorrectly), probably because most of the training images have a standard expression, and it also struggles a bit with faces at an angle. This could again be because the training set is quite small.

Detecting Correctly



Detecting Incorrectly

Below are the learned filters of my net.
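The filters come straight from the first convolutional layer's weights. Here is a sketch of how they can be displayed, assuming that layer is named conv1:

```python
import matplotlib.pyplot as plt

def show_filters(model):
    """Display the learned filters of the first conv layer as small grayscale images.
    Assumes the layer is called model.conv1."""
    weights = model.conv1.weight.data.cpu()            # (out_channels, in_channels, kH, kW)
    n = weights.shape[0]
    fig, axes = plt.subplots(1, n, figsize=(2 * n, 2))
    for i, ax in enumerate(axes):
        ax.imshow(weights[i, 0].numpy(), cmap="gray")  # first input channel of each filter
        ax.axis("off")
    plt.show()
```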

Learned Filters

Part 3: Training with a larger dataset

In this part, within my dataset class I cropped the images to the given bounding boxes and then applied similar transformations as in the last part (rescaling, rotation, brightness). Below are some images sampled from my dataloader with the transformed keypoints on the faces.
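A rough sketch of the cropping step, assuming the bounding box is given as (x, y, w, h) and the keypoints are in pixel coordinates (the padding amount and helper name are illustrative):

```python
import numpy as np

def crop_to_bbox(img, keypoints, bbox, pad=0.2):
    """Crop the face out of the full image using its bounding box and shift the
    keypoints into the crop's coordinate frame. bbox = (x, y, w, h); pad expands
    the box a bit so keypoints near the edge are not cut off."""
    x, y, w, h = bbox
    # Expand the box and clip it to the image borders
    x0 = max(int(x - pad * w), 0)
    y0 = max(int(y - pad * h), 0)
    x1 = min(int(x + (1 + pad) * w), img.shape[1])
    y1 = min(int(y + (1 + pad) * h), img.shape[0])

    crop = img[y0:y1, x0:x1]
    keypoints = keypoints - np.array([x0, y0])   # keypoints relative to the crop
    return crop, keypoints
```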

Sampled Images from Dataloader



This time, instead of writing my own CNN, I used ResNet18 as suggested in the spec. I modified the first and last layers to match our input and output sizes; the resulting architecture is below.
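The modification itself is small. Here is a sketch, assuming a single grayscale input channel and 68-point keypoint annotations (so 68 * 2 output values); both of those details are assumptions here rather than quotes from my code.

```python
import torch.nn as nn
from torchvision import models

def make_keypoint_resnet(num_keypoints=68):
    """ResNet18 adapted for keypoint regression: the first conv accepts a single
    grayscale channel, and the final layer outputs (x, y) for every keypoint."""
    net = models.resnet18()   # no pretrained weights here; ImageNet weights are another option
    # Replace the 3-channel input conv with a 1-channel one
    net.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
    # Replace the 1000-class classification head with a 2 * num_keypoints regression head
    net.fc = nn.Linear(net.fc.in_features, 2 * num_keypoints)
    return net
```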

Architecture of Neural Network

For training I used a batch size of 50 and a learning rate of 0.001, over 10 epochs. My resulting train loss is shown below.

Train Loss

In the class Kaggle competition my MAE was reported as 41.58. Here are some images from the test set with my model's predicted keypoints!

Predictions from my model



Images from my collection

For the images from my collection, I can tell that the model misses the eye predictions by a lot in most of the images. On the second and third images my model predicts the nose and mouth very well, but in the first image I can tell that the crop might be throwing the model off. The model predicts the angle of the face pretty well and does better on tightly cropped images.
Poor prediction
Decent prediction
Decent prediction

Conclusion

Before this project I had never written in PyTorch, constructed a neural network, or done much machine learning at all. I am really proud that, after a lot of work and some really weird bugs, I was able to complete this project. I learned a lot about hyperparameters, the general pipeline for training and testing, and convolutional neural networks.

Sources