Facial Keypoint Detection with Neural Networks

CS 294-26 • Fall 2020 • Project 4

Shubha Jagannatha


Overview

In this assignment, I use convolutional neural networks to predict facial keypoints in images.


Part 1: Nose Tip Detection

Here are some images with the nose keypoints:




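Under the hood, the dataloader wraps a torch Dataset along these lines. This is a minimal sketch assuming a PyTorch pipeline with grayscale conversion, normalization to [-0.5, 0.5], and a resize to a small fixed resolution; the class name, the 80x60 size, and the pre-parsed keypoint array are my own placeholders:

import cv2
import numpy as np
import torch
from torch.utils.data import Dataset

class NoseTipDataset(Dataset):
    # Minimal sketch: image_paths is a list of file paths and nose_points
    # is an (N, 2) array of (x, y) nose tip coordinates, assumed to have
    # been parsed from the annotation files ahead of time.
    def __init__(self, image_paths, nose_points, size=(80, 60)):
        self.image_paths = image_paths
        self.nose_points = nose_points
        self.size = size  # (width, height); 80x60 is illustrative

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        # Load as grayscale, resize, and shift pixel values to [-0.5, 0.5].
        img = cv2.imread(self.image_paths[idx], cv2.IMREAD_GRAYSCALE)
        img = cv2.resize(img, self.size).astype(np.float32) / 255.0 - 0.5
        pt = torch.tensor(self.nose_points[idx], dtype=torch.float32)
        return torch.from_numpy(img).unsqueeze(0), pt
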
Here's the training and validation loss over the course of training. I used a batch size of 4, a learning rate of 0.001, 25 epochs of training, an Adam optimizer, and MSE loss:




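The training loop itself is standard PyTorch; here's a sketch under the hyperparameters above (the function and variable names are mine):

import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train(model, train_set, val_set, epochs=25, lr=1e-3, batch_size=4):
    # Defaults mirror the settings reported above: batch size 4,
    # learning rate 0.001, 25 epochs, Adam, and MSE loss.
    train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_set, batch_size=batch_size)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.MSELoss()
    for epoch in range(epochs):
        model.train()
        train_loss = 0.0
        for imgs, pts in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(imgs), pts)
            loss.backward()
            optimizer.step()
            train_loss += loss.item() * imgs.size(0)
        model.eval()
        val_loss = 0.0
        with torch.no_grad():
            for imgs, pts in val_loader:
                val_loss += criterion(model(imgs), pts).item() * imgs.size(0)
        print(f"epoch {epoch}: train MSE {train_loss / len(train_set):.5f}, "
              f"val MSE {val_loss / len(val_set):.5f}")
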
Here are some success cases:



Here are some failure cases. I noticed most failures occurred when the person was tilted significantly away from the camera. This is probably because most of the images in our dataset show the individual facing straight toward the camera, so the network was optimized for frontal faces and does not perform as well when the subject is turned away:






Part 2: Full Facial Keypoints Detection

Here are some images from my dataloader with the ground truth keypoints:







I tested many different hyperparameter settings for my model; the results below use a batch size of 4, a learning rate of 0.001, 20 epochs of training, an Adam optimizer, and MSE loss. Here's the architecture I ultimately used for these results:







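For a sense of the general shape such a model takes, here's an illustrative PyTorch module: several conv layers followed by fully connected layers that regress all of the facial keypoints (assuming 58 annotated (x, y) points per face). The channel counts and kernel sizes here are placeholders, not the exact values in the architecture above:

import torch.nn as nn

class KeypointNet(nn.Module):
    def __init__(self, num_points=58):
        super().__init__()
        # Illustrative conv stack; the real layer sizes are in the diagram above.
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3), nn.ReLU(),
            nn.Conv2d(128, 128, 3), nn.ReLU(),
        )
        # Regression head: flatten and predict num_points (x, y) pairs.
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(256), nn.ReLU(),
            nn.Linear(256, num_points * 2),
        )

    def forward(self, x):
        return self.head(self.features(x))
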
Here are the visualized learned filters from my first convolutional layer:







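The filters above were read straight off the first conv layer's weight tensor; here's a sketch of the visualization, assuming a model structured like the sketch above:

import matplotlib.pyplot as plt

def show_first_layer_filters(model):
    # First conv layer weights have shape (out_channels, in_channels, kH, kW);
    # with grayscale input, each filter is a single 2-D kernel.
    weights = model.features[0].weight.detach().cpu()
    fig, axes = plt.subplots(1, weights.shape[0], figsize=(2 * weights.shape[0], 2))
    for i, ax in enumerate(axes):
        ax.imshow(weights[i, 0], cmap='gray')
        ax.axis('off')
    plt.show()
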
Here's the training and validation loss across iterations:







Here are a couple of successful detections:






Here are a couple of failure cases. Similar to the nose keypoint failure cases, I noticed that the network struggles with images where the subject is not facing straight toward the camera, since most of the images in the dataset have the subjects facing straight forward. I was able to improve the accuracy through data augmentation, adding rotated versions of the images to my dataset (see the sketch after these images).






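The rotation augmentation amounts to rotating each image by a small random angle and applying the same rotation to the keypoints about the image center. A sketch, where the +/-15 degree range is illustrative:

import numpy as np
from skimage.transform import rotate

def rotate_sample(image, keypoints, max_angle=15.0):
    # Rotate the image by a random angle (degrees, counter-clockwise to
    # match skimage.transform.rotate) and rotate the (x, y) keypoints by
    # the same angle about the image center.
    angle = np.random.uniform(-max_angle, max_angle)
    rotated = rotate(image, angle, mode='edge')
    theta = np.deg2rad(angle)
    cy, cx = (np.array(image.shape[:2]) - 1) / 2.0
    dx, dy = keypoints[:, 0] - cx, keypoints[:, 1] - cy
    # In image coordinates (y down), a visually counter-clockwise rotation
    # maps an offset (dx, dy) to (dx*cos + dy*sin, -dx*sin + dy*cos).
    new_pts = np.stack([
        cx + dx * np.cos(theta) + dy * np.sin(theta),
        cy - dx * np.sin(theta) + dy * np.cos(theta),
    ], axis=1)
    return rotated, new_pts
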

Part 3: Train With Larger Dataset

Here are some of the images with the ground truth keypoints:







Here's the mean absolute error of my model on the dataset: 18.39244



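For reference, the metric is just the absolute coordinate error averaged over every predicted point; a sketch (whether the error is measured in resized or original pixel coordinates depends on the evaluation setup):

import numpy as np

def mean_absolute_error(pred, gt):
    # pred and gt: (num_images, num_points, 2) arrays of (x, y) coordinates.
    return np.abs(pred - gt).mean()
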

Here's the architecture of my model. I used ResNet18 with a batch size of 4, a learning rate of 0.001, 20 epochs of training, an Adam optimizer, and MSE loss:






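Adapting an off-the-shelf ResNet18 for keypoint regression takes two small changes: retargeting the first conv layer for the input channel count and resizing the final fully connected layer to emit all keypoint coordinates. A sketch using torchvision, assuming grayscale input and 68 annotated points per face (both assumptions on my part):

import torch.nn as nn
import torchvision.models as models

def make_keypoint_resnet(num_points=68):
    model = models.resnet18()
    # Accept 1-channel (grayscale) input instead of 3-channel RGB.
    model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
    # Replace the 1000-way classification head with a regression head
    # predicting num_points (x, y) pairs.
    model.fc = nn.Linear(model.fc.in_features, num_points * 2)
    return model
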

Here's the training and validation loss across iterations:







Here are some keypoint predictions from the provided dataset:




Here are some keypoint predictions from my own collection (of Stranger Things characters and myself). It seems to do relatively well on most of them except for the last one here. The main difference I notice is that the lighting is very different in the failure case compared to the success cases. Also, the subject is turned pretty significantly, so this may have contributed to the poor output in the fourth image here: