Project 04

Facial Keypoint Detection

Devesh Agarwal


1: Nose Tip Detection

Using images from the IMM Database, I wrote code to detect the nose tip in facial images. I trained a Convolutional Neural Network to predict the nose-tip location. Before training, I converted each image to grayscale, rescaled it, and resized it.  

Ground Truth Sample #1
Ground Truth Sample #2
Ground Truth Sample #3
Ground Truth Sample #4
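
The preprocessing steps described above could look roughly like the sketch below. It is written in plain NumPy and is not the code used for this project. The output size (60 x 80), the [-0.5, 0.5] pixel range, the nearest-neighbor resize, and the function name `preprocess` are all assumptions made for illustration.

```python
import numpy as np

def preprocess(image_rgb, out_h=60, out_w=80):
    """Grayscale, rescale, and resize an (H, W, 3) uint8 image.

    out_h, out_w and the [-0.5, 0.5] range are assumed values,
    not taken from the project.
    """
    img = image_rgb.astype(np.float32)
    # Luminance-weighted grayscale conversion (ITU-R BT.601 weights).
    gray = img @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
    # Rescale pixel values from [0, 255] to [-0.5, 0.5].
    gray = gray / 255.0 - 0.5
    # Nearest-neighbor resize: pick one source row/column per output pixel.
    h, w = gray.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return gray[np.ix_(rows, cols)]
```

If the keypoints are stored as fractions of the image width and height, resizing the image leaves them unchanged; pixel-coordinate keypoints would need to be scaled by the same factors.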

Neural Network

Here are my model and its parameters:  

My Model

Training and Validation Accuracy

The plot below shows training and validation performance, measured as loss, against the epoch number n.  

Loss vs. n

Results

The samples below compare the ground-truth nose tip (blue dot) with the predicted nose tip (red dot).  

Nose Prediction Sample #1
Nose Prediction Sample #2
Nose Prediction Sample #3
Nose Prediction Sample #4

Several limitations can cause this model to fail: the small training set, unusual facial orientations, and the quality or saturation of the image itself.  

Failure cases:  

Failure Case #1
Failure Case #2

2: Full Face Detection

After detecting the central nose keypoint, I moved on to full facial feature detection, predicting 58 keypoints on each image.  

Data Augmentation

To prevent overfitting, I augmented the training data. I used some key functionality from transforms and wrote functions that randomly flip and/or rotate each sample.  

Sample #1
Sample #2
Sample #3
Sample #4
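
A minimal sketch of random flip and rotation augmentation is shown below. It uses plain NumPy rather than the transforms utilities mentioned above, so it is an illustration of the idea and not the project's code. The function names, the pixel-coordinate keypoint layout `(x, y)`, and the 15 degree rotation limit are assumptions. The important point is that the keypoints have to be transformed together with the image.

```python
import numpy as np

def flip_horizontal(image, keypoints):
    """Mirror a (H, W) image and its (K, 2) keypoints in (x, y) pixel coords."""
    w = image.shape[1]
    kp = np.asarray(keypoints, dtype=float).copy()
    kp[:, 0] = (w - 1) - kp[:, 0]
    return image[:, ::-1].copy(), kp

def rotate(image, keypoints, degrees):
    """Rotate image and keypoints by the same angle about the image center."""
    h, w = image.shape
    theta = np.deg2rad(degrees)
    c, s = np.cos(theta), np.sin(theta)
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    # Inverse mapping: for every output pixel, find its source pixel.
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.rint(c * (xs - cx) + s * (ys - cy) + cx).astype(int)
    src_y = np.rint(-s * (xs - cx) + c * (ys - cy) + cy).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(image)  # pixels with no source stay at zero
    out[valid] = image[src_y[valid], src_x[valid]]
    # Forward mapping for the keypoints (same rotation as the image).
    kp = np.asarray(keypoints, dtype=float)
    dx, dy = kp[:, 0] - cx, kp[:, 1] - cy
    new_kp = np.stack([c * dx - s * dy + cx, s * dx + c * dy + cy], axis=1)
    return out, new_kp

def random_augment(image, keypoints, rng, max_degrees=15.0):
    """Flip with probability 0.5, then rotate by a random angle (assumed limits)."""
    if rng.random() < 0.5:
        image, keypoints = flip_horizontal(image, keypoints)
    angle = rng.uniform(-max_degrees, max_degrees)
    return rotate(image, keypoints, angle)
```

One caveat this sketch leaves out: after a horizontal flip, left and right landmarks (for example the two eyes) trade places, so their indices usually need to be swapped according to the dataset's keypoint ordering.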

Neural Network

Here are my model and its parameters:  

My Model

Training and Validation Accuracy

The plot below shows training and validation performance, measured as loss, against the epoch number n.  

Loss vs. n

Results

The samples below compare the ground-truth keypoints (blue dots) with the predicted keypoints (red dots).  

Prediction Sample #1
Prediction Sample #2
Prediction Sample #3
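
The visual gap between the blue and red dots can also be summarized as a number. The sketch below computes two common measures for keypoint regression: the mean squared error and the mean Euclidean distance per keypoint. The choice of these metrics and the function names are assumptions, since the loss used in this project is not stated here.

```python
import numpy as np

def mse(pred, truth):
    """Mean squared error over all coordinates of (N, K, 2) arrays."""
    return float(np.mean((pred - truth) ** 2))

def mean_keypoint_error(pred, truth):
    """Mean Euclidean distance between matching keypoints of (N, K, 2) arrays."""
    return float(np.linalg.norm(pred - truth, axis=-1).mean())
```

With keypoints expressed in pixels, `mean_keypoint_error` reads directly as "how many pixels off is a typical predicted point".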

Failure cases:  

Failure Sample #1
Failure Sample #2
Failure Sample #3

Filters

Displayed below are the filters at each layer.  

Layer 1
Layer 2
Layer 3
Final Layer

3: Training with Larger Dataset on Colab GPUs

I repeated the process described in Parts 1 and 2 on a much larger dataset. To do this, I set up a Google Colab notebook with a GPU runtime and trained a ResNet-18 model using the Adam optimizer with a learning rate of 0.001.  
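
To make the role of the 0.001 learning rate concrete, here is a framework-neutral sketch of a single Adam update in NumPy. It is not the training code for this project, which would normally rely on a deep learning library's built-in optimizer. The values `beta1=0.9`, `beta2=0.999`, and `eps=1e-8` are the commonly used defaults and are assumptions here, as is the toy objective.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. t is the 1-based step count; m and v are running moments."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

def minimize_quadratic(steps, target=3.0):
    """Toy example: minimize (w - target)^2 starting from w = 0."""
    w, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        grad = 2.0 * (w - target)
        w, m, v = adam_step(w, grad, m, v, t)
    return w
```

Because Adam normalizes the gradient by its running magnitude, each early step moves the parameter by roughly the learning rate, regardless of how large the raw gradient is.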

Neural Network

Here are my model and its parameters:  

ResNet18 Model

Training and Validation Accuracy

The plot below shows training and validation performance, measured as loss, against the epoch number n.  

Loss vs. n

Results

Below are the keypoints detected on faces from the dataset:  

Prediction Sample #1
Prediction Sample #2
Prediction Sample #3
Prediction Sample #4

Custom Images

Finally, here are some of my favorite images from this course with keypoints detected.  

Successful Custom Image #1
Successful Custom Image #2
Derek, Failure Case due to Angle