CS194-26 PROJECT 4, AARON CHEN

Overview:

In this project, we implemented a feature that would have been very useful for the previous project: automatic facial keypoint detection! We start with a CNN that detects just the nose tip, and work up to one that detects all 58 labelled keypoints of the original dataset.

Part 1: Nose Tip Detection

We built a PyTorch data loader around the pre-annotated images of Danish subjects. Images were converted to grayscale, normalized in value, downscaled to 80 by 60, and split into training and validation sets.
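The preprocessing step can be sketched as a single transform, runnable on its own. This is illustrative rather than the exact project code: the function name and the use of a random tensor in place of a real dataset image are assumptions, but the pipeline (grayscale, normalize, downscale to 80 by 60) follows the description above.

```python
import torch
import torch.nn.functional as F

def preprocess(img_rgb):
    """Convert an H x W x 3 uint8 image tensor into the net's input:
    grayscale, values roughly in [-0.5, 0.5], resized to 60 x 80."""
    gray = img_rgb.float().mean(dim=2) / 255.0 - 0.5   # grayscale + normalize
    gray = gray.unsqueeze(0).unsqueeze(0)              # 1 x 1 x H x W, as interpolate expects
    small = F.interpolate(gray, size=(60, 80), mode='bilinear', align_corners=False)
    return small.squeeze(0)                            # 1 x 60 x 80 (channel-first)

# A random tensor stands in for one of the face photos
img = torch.randint(0, 256, (480, 640, 3), dtype=torch.uint8)
x = preprocess(img)
print(x.shape)  # torch.Size([1, 60, 80])
```

The train/validation split itself can then be done with `torch.utils.data.random_split` on the resulting dataset.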

Sampled from the dataloader, with the red dot being the labelled location of the nose:

I trained a neural network on the training set; here are the graphed losses. The x axis is epochs, and the y axis is the average per-batch loss over that epoch. The loss function is MSE from PyTorch's nn module.
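The training loop that produces these per-epoch averages looks roughly like the sketch below. The model and data here are stand-ins (a tiny linear model and random tensors) so the loop is self-contained; in the project, the real CNN and dataloader take their place.

```python
import torch
import torch.nn as nn

# Stand-ins so the loop runs on its own: a tiny model predicting one (x, y),
# and random batches in place of the real training dataloader.
model = nn.Sequential(nn.Flatten(), nn.Linear(60 * 80, 2))
train_data = [(torch.randn(4, 1, 60, 80), torch.rand(4, 2)) for _ in range(5)]

criterion = nn.MSELoss()                                  # MSE loss from the nn module
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

train_losses = []
for epoch in range(3):
    total = 0.0
    for images, keypoints in train_data:
        optimizer.zero_grad()
        loss = criterion(model(images), keypoints)
        loss.backward()
        optimizer.step()
        total += loss.item()
    train_losses.append(total / len(train_data))          # average batch loss per epoch
print(train_losses)
```

The validation curve is computed the same way, but inside `torch.no_grad()` and without the optimizer steps.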

training loss per epoch:

validation loss per epoch:

Here are some successful results, where the green dot indicates the labelled point and the red one our estimate from the neural net.

Here are some failures; because the majority of the training data face forward, the net fails badly when the subject's face is turned.

Part 2: Full Facial Keypoints Detection

I did the same thing as in the last section, but with a more advanced neural net. I will expand on it later; the main difference important to the reader is that it now outputs 58 (x, y) predictions per input in a batch, instead of the previous single nose keypoint. Here are some samples from the dataloader:

How the neural network is designed:

        I have 4 convolutional layers, each of which is followed by a ReLU and a max pool. The ReLU exists to introduce nonlinearity, and the max pool subsamples the previous output by 2. This is finished by two fully connected layers, where the first one is followed by a ReLU. In terms of input channels to output channels, the convolutional layers were: 4 to 6, 6 to 16, 16 to 24, and 24 to 24. The first two conv layers used 5x5 filters and the last two used 3x3. The hyperparameters I employed were a learning rate of 0.001, a batch size of 4, and 30 epochs.
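The architecture described above can be sketched as follows. The channel widths and filter sizes match the description; the input resolution (60 x 80 here) and the hidden width of the first fully connected layer are assumptions, so the flattened feature size is computed from a dummy forward pass rather than hard-coded.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KeypointNet(nn.Module):
    """Sketch of the described net: 4 conv layers, each followed by ReLU and
    a 2x2 max pool, then two fully connected layers (ReLU after the first).
    Input resolution and fc1 width are illustrative assumptions."""
    def __init__(self, in_hw=(60, 80)):
        super().__init__()
        self.conv1 = nn.Conv2d(4, 6, 5)     # 4 -> 6, 5x5
        self.conv2 = nn.Conv2d(6, 16, 5)    # 6 -> 16, 5x5
        self.conv3 = nn.Conv2d(16, 24, 3)   # 16 -> 24, 3x3
        self.conv4 = nn.Conv2d(24, 24, 3)   # 24 -> 24, 3x3
        with torch.no_grad():               # infer flattened size from a dummy pass
            n = self.features(torch.zeros(1, 4, *in_hw)).numel()
        self.fc1 = nn.Linear(n, 120)        # hidden width 120 is hypothetical
        self.fc2 = nn.Linear(120, 58 * 2)   # 58 (x, y) keypoints

    def features(self, x):
        for conv in (self.conv1, self.conv2, self.conv3, self.conv4):
            x = F.max_pool2d(F.relu(conv(x)), 2)
        return x

    def forward(self, x):
        x = self.features(x).flatten(1)
        return self.fc2(F.relu(self.fc1(x)))

net = KeypointNet()
out = net(torch.zeros(2, 4, 60, 80))
print(out.shape)  # torch.Size([2, 116])
```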

Here is a visualization of the six 5x5 filters learned by the first convolutional layer.
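One way this kind of visualization can be produced is to pull the weights out of the first conv layer and normalize each filter for display. A fresh layer stands in for the trained one here, so the "filters" are random; averaging over input channels to get one 5x5 image per filter is one choice among several.

```python
import torch
import torch.nn as nn

# Stand-in for the trained net's first conv layer (4 -> 6 channels, 5x5)
conv1 = nn.Conv2d(4, 6, 5)
filters = conv1.weight.detach()       # shape: (6, 4, 5, 5)

# Collapse the input-channel axis so each of the 6 filters is one 5x5 image
filter_images = filters.mean(dim=1)   # shape: (6, 5, 5)

for i, f in enumerate(filter_images):
    # Rescale each filter to [0, 1] so it can be shown with e.g. plt.imshow
    f_norm = (f - f.min()) / (f.max() - f.min() + 1e-8)
    print(i, f_norm.shape)
```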

Here are the training and validation losses, respectively, where the x axis is epochs and the y axis is MSE loss.

training loss by epoch

validation loss by epoch

Here are two cases where the net successfully predicted label points:

Here are cases where the net failed to produce good label points. Again, the turned faces cause trouble; my model is also almost certainly overfitting, as you can tell from the validation loss.