CS 194-26 Project 5: Facial Keypoint Detection with Neural Networks

Lucy Liu

Overview

In this project, I learn how to use Convolutional Neural Networks to detect facial keypoints automatically, without having to click on them by hand anymore.

Part 1: Nose Tip Detection

First, I created a dataloader that loads the images in grayscale (pixel values from 0 to 255), normalizes them to float values in the range -0.5 to 0.5, and resizes the images to 80x60. Below are some images sampled from my dataloader with all of the ground-truth keypoints, and with just the nose ground-truth keypoint that is passed into the CNN.
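The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the dataloader actually used: the function names are hypothetical, 80x60 is assumed to mean 80 pixels wide by 60 pixels tall, and a simple nearest-neighbour resize stands in for whatever resize routine the real dataloader calls.

```python
import numpy as np


def normalize_gray(img_uint8):
    """Map uint8 grayscale pixels in [0, 255] to float32 values in [-0.5, 0.5]."""
    return img_uint8.astype(np.float32) / 255.0 - 0.5


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize (an assumed stand-in for the real resize step)."""
    in_h, in_w = img.shape
    # For each output row/column, pick the source row/column it maps back to.
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return img[rows[:, None], cols[None, :]]


def preprocess(img_uint8, out_h=60, out_w=80):
    """Normalize a grayscale image, then shrink it to out_h x out_w."""
    return resize_nearest(normalize_gray(img_uint8), out_h, out_w)
```

Calling `preprocess` on a 2D `uint8` array returns a `float32` array of shape `(60, 80)` whose values stay within -0.5 and 0.5.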

I then created a CNN to train on batches of these images. The architecture of my network is shown in the image below. To train, I used a batch size of 4 and a learning rate of 0.001, and ran the model with MSE loss and the Adam optimizer for 25 epochs.

Nose CNN Architecture
Train and Validation Loss for NoseCNN batch=4 lr=0.001
Train and Validation Loss for NoseCNN batch=5 lr=0.001
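
The training setup described above (batch size 4, learning rate 0.001, MSE loss, Adam, 25 epochs) could be wired up in PyTorch roughly as below. This is only a sketch under stated assumptions: `TinyNoseNet` is a hypothetical placeholder and not the architecture from the figure, `make_fake_loaders` produces random tensors in place of the real dataset, and inputs are assumed to be 1x60x80 grayscale images with a 2-value (x, y) nose target.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


class TinyNoseNet(nn.Module):
    """Hypothetical small CNN; NOT the architecture shown in the report."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Two 2x poolings turn a 60x80 input into 15x20 feature maps.
        self.head = nn.Sequential(
            nn.Flatten(), nn.Linear(16 * 15 * 20, 64), nn.ReLU(), nn.Linear(64, 2)
        )

    def forward(self, x):
        return self.head(self.features(x))


def make_fake_loaders(n_train, n_val, batch_size=4):
    """Random placeholder data standing in for the real face dataset."""
    def make(n, shuffle):
        images = torch.rand(n, 1, 60, 80) - 0.5   # pixel values in [-0.5, 0.5)
        targets = torch.rand(n, 2)                # made-up (x, y) nose positions
        return DataLoader(TensorDataset(images, targets),
                          batch_size=batch_size, shuffle=shuffle)
    return make(n_train, True), make(n_val, False)


def train(model, train_loader, val_loader, epochs=25, lr=1e-3):
    """Train with MSE loss and Adam; return per-epoch (train, val) mean losses."""
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    history = []
    for _ in range(epochs):
        model.train()
        train_loss = 0.0
        for images, targets in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), targets)
            loss.backward()
            optimizer.step()
            train_loss += loss.item() * images.size(0)

        model.eval()
        val_loss = 0.0
        with torch.no_grad():  # no gradients needed for validation
            for images, targets in val_loader:
                val_loss += criterion(model(images), targets).item() * images.size(0)

        history.append((train_loss / len(train_loader.dataset),
                        val_loss / len(val_loader.dataset)))
    return history
```

Plotting the two columns of the returned `history` against the epoch index gives train and validation loss curves of the kind captioned above.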

Here are some images where the network detects the nose properly:

Model prediction good result 1 on validation set
Model prediction good result 2 on validation set

Here are some images where the network detects the nose incorrectly:

Model prediction bad result 1 on validation set
Model prediction bad result 2 on validation set

Explanation: It is possible that the hair on the man in the second set of images confuses the model, which treats the hair as part of the face rather than as hair, so the predicted nose point ends up higher than the ground truth. This may reflect the skew in the dataset when it comes to skin color.

Part 2: Full Facial Keypoints Detection

Below are the changes from Part 1: