CS 194-26 Project 5

Jaiveer Singh

Part 1

The first part of this project involved training a small network to predict the location of the nose keypoint. Below is a set of sample images loaded from the dataset, with the nose annotated in red.
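A minimal plotting helper along these lines can produce such a figure; the `images` and `noses` names and the normalized-coordinate convention are assumptions for illustration, not the exact dataloader code:

```python
import matplotlib.pyplot as plt

def show_nose_samples(images, noses, n=4):
    """Plot n grayscale samples with the annotated nose keypoint in red.

    Assumes `images` is a sequence of HxW grayscale arrays and `noses` a
    matching sequence of (x, y) coordinates normalized to [0, 1].
    """
    fig, axes = plt.subplots(1, n, figsize=(3 * n, 3))
    for ax, img, (x, y) in zip(axes, images, noses):
        h, w = img.shape
        ax.imshow(img, cmap="gray")
        ax.scatter([x * w], [y * h], c="red", s=20)  # scale back to pixels
        ax.axis("off")
    plt.show()
```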

The following loss charts show the performance with various hyperparameter settings:

(Figures: loss curves for a high learning rate, a small kernel, and the final hyperparameter settings.)

Results of the nose model are shown below, for both success and failure cases.

(Figures: two success cases and two failure cases.)

Generally speaking, the network performs poorly on photos where the subject's head is turned to the side. This is likely because the majority of the data features the nose head-on, so the net learns a strong bias toward the middle of the face.

Part 2

The second part of this project involved training a network to predict the full set of facial keypoint locations. Below is a set of sample images loaded from the dataset, with the keypoints annotated in red.

The architecture of the model is as follows. (ReLU and 2D max pooling were applied as functionals between the conv layers, so they do not show up in the summary.)
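The exact layer sizes come from the printed summary; a representative sketch of a network with this shape might look like the following, where the channel counts, the 58-keypoint output, and the 120x160 input resolution are assumptions rather than the recorded values:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FaceNet(nn.Module):
    """CNN that regresses facial keypoints; sizes here are placeholders."""

    def __init__(self, num_keypoints=58):  # 58 keypoints per face assumed
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, 5)
        self.conv2 = nn.Conv2d(16, 32, 3)
        self.conv3 = nn.Conv2d(32, 64, 3)
        # Flattened size assumes 1x120x160 inputs fed through the convs above.
        self.fc1 = nn.Linear(64 * 13 * 18, 256)
        self.fc2 = nn.Linear(256, num_keypoints * 2)

    def forward(self, x):
        # ReLU and max pooling are applied as functionals, which is why
        # they do not appear in the module summary.
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = F.max_pool2d(F.relu(self.conv3(x)), 2)
        x = torch.flatten(x, 1)
        return self.fc2(F.relu(self.fc1(x)))
```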

The hyperparameters used were 100 epochs, a learning rate of 1e-3, and a batch size of 32.
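A minimal sketch of a training loop under these hyperparameters is below; the writeup does not state the optimizer or loss, so Adam and MSE over the keypoint coordinates are assumptions:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train(model, train_set, val_set, epochs=100, lr=1e-3, batch_size=32):
    """Train with the stated hyperparameters, returning per-epoch losses."""
    train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_set, batch_size=batch_size)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # assumed optimizer
    criterion = nn.MSELoss()  # assumed loss on keypoint coordinates
    train_losses, val_losses = [], []
    for _ in range(epochs):
        model.train()
        total = 0.0
        for imgs, pts in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(imgs), pts)
            loss.backward()
            optimizer.step()
            total += loss.item()
        train_losses.append(total / len(train_loader))
        model.eval()
        with torch.no_grad():
            val = sum(criterion(model(i), p).item() for i, p in val_loader)
        val_losses.append(val / len(val_loader))
    return train_losses, val_losses
```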

The training and validation loss curves are shown below:

Results of the face model are shown below, for both success and failure cases.

(Figures: two success cases and two failure cases.)

While this model also struggled with subjects who weren't directly facing the camera, some of the more interesting failure cases involved unconventional head poses. The gaping mouth and the presence of hands in the two failure cases show how the model latches onto the wrong edges in the picture.

The learned filters in the first layer are shown below:
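A small helper along these lines renders that figure; it assumes the first layer is exposed as `model.conv1` with single-channel input:

```python
import matplotlib.pyplot as plt

def show_first_layer_filters(model):
    """Display each learned kernel of the first conv layer as an image."""
    weights = model.conv1.weight.detach().cpu()  # shape: (out_ch, 1, k, k)
    n = weights.shape[0]
    fig, axes = plt.subplots(1, n, figsize=(2 * n, 2))
    for ax, w in zip(axes, weights):
        ax.imshow(w[0], cmap="gray")
        ax.axis("off")
    plt.show()
```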

Part 3

This part required modifying ResNet-18 to produce higher-quality results on a larger dataset. A sample of augmented images from the new dataset is shown below.
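The exact transforms are not listed here; a plausible sketch of the augmentation, which must rotate the keypoints along with the image, is below (the function and argument names are illustrative):

```python
import math
import random
import torch
import torchvision.transforms.functional as TF

def augment(img, pts):
    """Randomly jitter brightness and rotate, keeping keypoints consistent.

    Assumes `img` is a (1, H, W) tensor and `pts` an (N, 2) tensor of
    (x, y) pixel coordinates.
    """
    img = TF.adjust_brightness(img, random.uniform(0.7, 1.3))
    angle = random.uniform(-15.0, 15.0)
    img = TF.rotate(img, angle)  # counter-clockwise for positive angles
    # Rotate keypoints about the image center by the same on-screen angle
    # (y grows downward in pixel coordinates, hence the sign pattern).
    theta = math.radians(angle)
    h, w = img.shape[-2:]
    cx, cy = w / 2, h / 2
    x, y = pts[:, 0] - cx, pts[:, 1] - cy
    rx = x * math.cos(theta) + y * math.sin(theta) + cx
    ry = -x * math.sin(theta) + y * math.cos(theta) + cy
    return img, torch.stack([rx, ry], dim=1)
```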

The architecture is as follows: start with the classic ResNet-18 architecture, add a convolutional layer at the input to map from 1 channel to 3, and add fully connected layers at the output to regress the 68 feature points.
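Concretely, that modification might look like this sketch; the hidden width of 512 in the new head is a placeholder, since the writeup does not state it:

```python
import torch.nn as nn
import torchvision.models as models

def make_model(num_keypoints=68):
    """ResNet-18 adapted for grayscale facial keypoint regression."""
    net = models.resnet18()
    # Replace the 1000-way classifier head with fully connected layers
    # that regress the 68 keypoints (136 coordinates).
    net.fc = nn.Sequential(
        nn.Linear(net.fc.in_features, 512),  # hidden width is a placeholder
        nn.ReLU(),
        nn.Linear(512, num_keypoints * 2),
    )
    # Map 1-channel grayscale input up to the 3 channels ResNet expects.
    return nn.Sequential(nn.Conv2d(1, 3, kernel_size=3, padding=1), net)
```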

The hyperparameters were the same as in Part 2: 100 epochs, a learning rate of 1e-3, and a batch size of 32.

The training and validation loss curves are shown below:

Some outputs, both good and bad, are shown below:

Outputs on custom images are shown below:

My Kaggle name is Jaiveer Singh and my score is 17.37059.