Project 5

Part 1: Nose Tip Detection

Dataloader

Below are sampled images from my dataloader, visualized with ground-truth keypoints:

Hyperparameter Tuning

First I swept over different batch sizes and got the following plot:

I noticed that smaller batch sizes yielded better results:

Thus I used a batch size of 2.

Next I swept over learning rates while keeping batch size constant at 2:

The sweep shows that a learning rate of 1e-3 performed best.
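
The two-stage sweep described above (batch size first, then learning rate with the batch size held fixed) can be sketched roughly as follows. This is only an illustration: the candidate grids, the default learning rate used during the batch-size sweep, and the helper names `sweep` and `toy_val_loss` are all assumptions, and `toy_val_loss` is a made-up placeholder for a real training run, not a measured result.

```python
import math

def sweep(values, evaluate):
    """Evaluate every candidate and return (best_value, {value: loss})."""
    results = {v: evaluate(v) for v in values}
    best = min(results, key=results.get)  # lowest validation loss wins
    return best, results

def toy_val_loss(batch_size, lr):
    """Placeholder for 'train a model and return its validation MSE'.

    The formula is invented so the sketch runs; it is NOT real data.
    """
    return 0.001 * batch_size + abs(math.log10(lr) + 3)

# Hypothetical grids; the report does not list the values that were swept.
batch_sizes = [2, 4, 8, 16]
learning_rates = [1e-2, 1e-3, 1e-4]
default_lr = 1e-3  # assumed learning rate during the batch-size sweep

# Stage 1: sweep batch size at a fixed learning rate.
best_bs, bs_results = sweep(batch_sizes, lambda bs: toy_val_loss(bs, default_lr))

# Stage 2: sweep learning rate with the batch size fixed at the stage-1 winner.
best_lr, lr_results = sweep(learning_rates, lambda lr: toy_val_loss(best_bs, lr))

print(best_bs, best_lr)
```

Sweeping one hyperparameter at a time is cheaper than a full grid, at the cost of possibly missing interactions between batch size and learning rate.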

Training and Validation MSE

Below are the training and validation MSE losses during training with a batch size of 2 and a learning rate of 1e-3:

Results

Below are some examples of the ground truth (green) and predictions (red). The top two are examples where the model worked well, and the bottom two are examples where it did poorly. It likely failed in these two cases because the left image is too dark, while the right image has strong contrast on the face near where the prediction is made.

Part 2: Full Facial Keypoints Detection

Dataloader

I augmented the data with shifting, rotation, and color jitter:
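
Shifting and rotating an image moves the face, so the ground-truth keypoints have to be transformed along with it, while color jitter leaves them unchanged. A minimal sketch of the keypoint side of this, assuming keypoints are stored as (x, y) pixel coordinates; the helper names `shift_keypoints` and `rotate_keypoints` are hypothetical and the report does not show the actual implementation:

```python
import math

def shift_keypoints(points, dx, dy):
    """Translate every (x, y) keypoint by the same offset as the image."""
    return [(x + dx, y + dy) for x, y in points]

def rotate_keypoints(points, angle_deg, center):
    """Rotate (x, y) keypoints about `center` by `angle_deg` degrees.

    Uses the standard 2D rotation matrix. In image coordinates (y pointing
    down) a positive angle appears clockwise, so the sign must be matched to
    the convention of whatever function rotates the image itself.
    """
    cx, cy = center
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    rotated = []
    for x, y in points:
        rx, ry = x - cx, y - cy  # coordinates relative to the rotation center
        rotated.append((cx + rx * cos_a - ry * sin_a,
                        cy + rx * sin_a + ry * cos_a))
    return rotated

# Example: two keypoints, shifted and then rotated about a chosen center.
keypoints = [(10.0, 20.0), (30.0, 20.0)]
shifted = shift_keypoints(keypoints, dx=5.0, dy=-2.0)
rotated = rotate_keypoints(shifted, angle_deg=10.0, center=(20.0, 20.0))
```

Applying the same random parameters to both the image and its keypoints keeps the labels aligned with the augmented pixels.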

CNN Architecture

Below is a summary of the CNN architecture.

Training

Below is the training and validation MSE:

It was trained with a learning rate of 1e-3 and batch size of 4 for a total of 60 epochs.
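
For reference, the MSE plotted above is the mean of the squared coordinate errors. A small sketch of that computation, assuming predictions and targets are lists of (x, y) keypoints and that the mean is taken over every coordinate; the function name `keypoint_mse` is hypothetical:

```python
def keypoint_mse(pred, target):
    """Mean squared error over all x and y coordinates of all keypoints."""
    squared_errors = [
        (p - t) ** 2
        for pred_pt, target_pt in zip(pred, target)
        for p, t in zip(pred_pt, target_pt)
    ]
    return sum(squared_errors) / len(squared_errors)

# One keypoint that is off by 0.25 in both x and y.
example = keypoint_mse([(0.5, 0.25)], [(0.25, 0.5)])
print(example)  # 0.0625
```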

The model seems to do well on faces that are looking straight ahead, but not as well on faces that show more of a side profile.

Learned Filters

Part 3: Train With Larger Dataset

Dataloader

I used the same augmentations as in Part 2. The following are a couple of examples of the training images:

CNN

I used a modified version of ResNet-18. The first convolution takes 1 input channel for grayscale images, and the final layer outputs 136 values, i.e. an (x, y) pair for each of 68 keypoints.

Results

I submitted to the class Kaggle competition with user name Tejasvi Kothapalli.

Below is a plot showing both training (blue) and validation (orange) loss across iterations:

I trained for 100 epochs with batch_size = 128 and learning_rate = 1e-3, dropping the learning rate by a factor of 10 every 30 epochs.
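
The step schedule described here can be written down as a small function. This is a sketch under the assumption that epochs are counted from 0 and the drop happens at epochs 30, 60, and 90; the function name `step_lr` is hypothetical. PyTorch's `torch.optim.lr_scheduler.StepLR` with `step_size=30` and `gamma=0.1` produces a schedule of this shape, though the report does not say which mechanism was actually used.

```python
def step_lr(epoch, base_lr=1e-3, step_size=30, gamma=0.1):
    """Learning rate at a given (0-indexed) epoch under step decay."""
    return base_lr * gamma ** (epoch // step_size)

# Learning rate at the start of each stage of a 100-epoch run:
# roughly 1e-3, 1e-4, 1e-5, 1e-6 (up to floating-point rounding).
schedule = {epoch: step_lr(epoch) for epoch in (0, 30, 60, 90)}
for epoch, lr in schedule.items():
    print(epoch, lr)
```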

The following are the results from running the model on the test set.

The following are results of faces from my own collection:

It seems to do well on faces that follow the average face shape (the oval shapes in the 1st and 3rd images). It also does well on beards (3rd image)! It does not do as well on face shapes far from the average: in the 2nd image the model cannot capture Kumail's super sharp jawline! It also struggles with slight side profiles (4th image).