CS 194-26 Fall 2020

Project 4

Christian Murray

Overview

In this project, the goal is to familiarize ourselves with neural networks and their power for image processing by training a NN to detect facial keypoints, starting with just a single nose point.

Approach

1: Nose Tip Detection

The following images from the dataloader show the ground-truth point for the nose.

The next image is a graph of the MSE loss for training and validation for our nose-tip detection NN.
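For reference, the MSE loss plotted here can be sketched as follows. This is a minimal illustration, not the project's training code: the function name keypoint_mse, the (batch, num_points, 2) array layout, and the normalization of coordinates to [0, 1] are all assumptions.

```python
import numpy as np

def keypoint_mse(pred, target):
    """Mean squared error over all keypoint coordinates.

    pred, target: arrays of shape (batch, num_points, 2) holding (x, y)
    coordinates, assumed here to be normalized to the [0, 1] range.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return float(np.mean((pred - target) ** 2))

# One image with one keypoint (the nose tip); the prediction is off in y only.
pred = [[[0.50, 0.60]]]
target = [[[0.50, 0.50]]]
loss = keypoint_mse(pred, target)
```

For nose-tip detection there is a single (x, y) pair per image, so the loss averages just two squared differences per sample.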

Lastly, here are two images where our NN detected the nose tip correctly and two where it did not. The red dot is our prediction and the green dot is the ground truth.

Correctly Detected.
Correctly Detected.
Incorrectly Detected.
Incorrectly Detected.

In the incorrect images, the NN seems to select the brightest point in the image as the nose tip. Training on images with a wider range of brightness might have helped.
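One way to widen the brightness range seen during training is a random brightness shift applied in the dataloader. The sketch below was not part of this project; the function name jitter_brightness, the max_shift value, and the assumption of grayscale float images in [0, 1] are all illustrative.

```python
import numpy as np

def jitter_brightness(image, rng, max_shift=0.2):
    """Add one random brightness offset to a grayscale image in [0, 1].

    The same offset is applied to every pixel, and the result is clipped
    back into [0, 1]. Keypoint labels are unaffected by this augmentation.
    """
    shift = rng.uniform(-max_shift, max_shift)
    return np.clip(image + shift, 0.0, 1.0)

rng = np.random.default_rng(0)
image = np.full((4, 4), 0.5)  # stand-in for a real face image
augmented = jitter_brightness(image, rng)
```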

2: Full Facial Keypoint Detection

The following images from the dataloader show the ground-truth keypoints.

The architecture of our NN is shown in the below image.

NN architecture

The next image is a graph of the MSE loss for training and validation for our full facial keypoint detection NN.

Lastly, here are two images where our NN detected the keypoints correctly and two where it did not. The blue dots are our predictions and the red dots are the ground truth.

Correctly Detected.
Correctly Detected.
Incorrectly Detected.
Incorrectly Detected.

Uneven brightness and augmentations that rotated the image by larger angles seemed to be the main causes of wrong predictions in this segment.
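When an image is rotated for augmentation, its keypoints have to be rotated by the same angle about the same center, or the labels no longer match the image. The sketch below shows only the keypoint side of that step. The function name rotate_keypoints and the pixel-coordinate (x, y) layout are assumptions, and the sign of the angle has to be matched to whatever routine rotates the image itself.

```python
import numpy as np

def rotate_keypoints(points, angle_deg, center):
    """Rotate (x, y) keypoints about `center` by `angle_deg` degrees.

    Uses the standard 2D rotation matrix in (x, y) coordinates. Whether a
    positive angle looks clockwise on screen depends on the image axis
    convention, so check it against the image rotation routine in use.
    """
    theta = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    points = np.asarray(points, dtype=np.float64)
    center = np.asarray(center, dtype=np.float64)
    # Shift to the origin, rotate each row vector, shift back.
    return (points - center) @ rot.T + center

points = np.array([[2.0, 1.0]])
rotated = rotate_keypoints(points, 90.0, center=(1.0, 1.0))
```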

Below are images visualizing the learned filters.

Learned filter for first convolution.
Learned filter for second convolution.
Learned filter for third convolution.
Learned filter for fourth convolution.
Learned filter for fifth convolution.
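Filter images like these can be produced by tiling a layer's kernels into one grid. The following is a rough sketch rather than the code used here: filters_to_grid is a hypothetical helper, the random array stands in for real learned weights, and it assumes one 2D kernel per output channel (for a layer with several input channels, one input channel would have to be selected first).

```python
import numpy as np

def filters_to_grid(weights, pad=1):
    """Tile conv filters of shape (out_channels, k, k) into one 2D image.

    Each filter is min-max normalized to [0, 1] on its own, so its structure
    stays visible regardless of the scale of its weights.
    """
    weights = np.asarray(weights, dtype=np.float64)
    n, k, _ = weights.shape
    cols = int(np.ceil(np.sqrt(n)))
    rows = int(np.ceil(n / cols))
    grid = np.zeros((rows * (k + pad) - pad, cols * (k + pad) - pad))
    for i, w in enumerate(weights):
        span = w.max() - w.min()
        w = (w - w.min()) / span if span > 0 else np.zeros_like(w)
        r, c = divmod(i, cols)
        top, left = r * (k + pad), c * (k + pad)
        grid[top:top + k, left:left + k] = w
    return grid

rng = np.random.default_rng(0)
fake_weights = rng.normal(size=(6, 3, 3))  # stand-in for a layer's weights
grid = filters_to_grid(fake_weights)
```

The resulting grid can be passed to any image display routine as a grayscale image.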

3: Training With Larger Datasets

For this segment I used a pretrained resnet18 model with the following parameters: 5 epochs with a learning rate of 0.001.

The following image is a graph of the MSE loss for training for our NN.

Predictions from the NN are shown below.

Below are images I tried to predict facial keypoints for. Chihiro has particularly poor predictions, likely since it's a cartoon face with different proportions than a human face. Emma Stone similarly has poor detection, likely because her face is framed differently in the image than the faces in the training set.