CS194-26 Project 5

Nose detector

For this part of the project, I created a dataloader that loads each image and returns the image tensor along with the nose keypoint. Below is a sample loaded image with the nose keypoint overlaid.
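As a rough illustration of the dataloader idea, here is a minimal sketch of a map-style dataset (indexable, with a length, the same shape of interface that `torch.utils.data.Dataset` expects). It is not the code used for this project: the in-memory `samples` list, the `nose_index` argument, and the scaling of pixels to [-0.5, 0.5] are all assumptions made for the example, and the toy sample at the bottom is made up.

```python
class NoseDataset:
    """Map-style dataset sketch: returns (normalized image, nose keypoint)."""

    def __init__(self, samples, nose_index):
        # samples: list of (image, landmarks) pairs, where image is a 2D list
        # of grayscale values in 0..255 and landmarks is a list of (x, y).
        self.samples = samples
        # Hypothetical position of the nose tip in each landmark list.
        self.nose_index = nose_index

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, i):
        image, landmarks = self.samples[i]
        # Assumed normalization: scale pixel values to [-0.5, 0.5].
        pixels = [[v / 255.0 - 0.5 for v in row] for row in image]
        return pixels, landmarks[self.nose_index]


# Made-up toy sample: a 2x2 "image" with three landmarks.
toy = [([[0, 255], [51, 204]], [(0.1, 0.2), (0.5, 0.6), (0.9, 0.8)])]
dataset = NoseDataset(toy, nose_index=1)
pixels, nose = dataset[0]
```

In a real pipeline the image would be read from disk and converted to a tensor inside `__getitem__`; the structure stays the same.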

Afterwards, I trained a CNN on the first 192 images to find the tip of the nose. The CNN had 3 convolutional layers with 1 to 32 channels, 32 to 32 channels, and 32 to 32 channels, and kernel sizes of 7x7, 5x5, and 3x3 respectively. The CNN also had 3 fully connected layers with 896 to 250 features, 250 to 50 features, and 50 to 2 features. Using these parameters, I got the training and validation loss graph and the results shown below.
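The 896 input features of the first fully connected layer follow from the convolutional stack. The sketch below traces the spatial size through the three layers, assuming a 60x80 grayscale input, unpadded stride-1 convolutions, and a 2x2 max pool after each convolution. None of those three details is stated above; they are assumptions chosen because they reproduce the 896 figure. The helper names are illustrative.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_size(height, width, kernels, out_channels, pool=2):
    """Features left after one conv -> max-pool block per kernel size."""
    for k in kernels:
        # Convolution first, then a pool that floor-divides each dimension.
        height = conv_out(height, k) // pool
        width = conv_out(width, k) // pool
    return out_channels * height * width


# Kernel sizes and the 32 output channels come from the text;
# the 60x80 input size is an assumption.
nose_features = flattened_size(60, 80, kernels=[7, 5, 3], out_channels=32)
```

Under these assumptions the feature map shrinks to 4x7 with 32 channels, so `nose_features` equals 32 * 4 * 7 = 896.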

Here are two examples of the nose detector succeeding. It seems to work well when the faces are looking at the camera head-on.

Here are two examples of the nose detector failing. I think this is due to the fact that these faces were turned to the side.

Full Facial Keypoints Detector

For this section, I first modified the data loader to augment the images as they were loaded. The data loader would randomly rotate each image by -15 to 15 degrees and shift it vertically by -10 to 10 pixels.
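When an image is rotated and shifted, its keypoints have to move with it. The following is a minimal sketch of that bookkeeping, not the project's actual augmentation code: the function names, the choice to rotate about a given center, and the integer shift are assumptions for illustration.

```python
import math
import random


def augment_keypoints(points, angle_deg, dy, center):
    """Rotate (x, y) points about center by angle_deg, then shift y by dy."""
    cx, cy = center
    cos_a = math.cos(math.radians(angle_deg))
    sin_a = math.sin(math.radians(angle_deg))
    out = []
    for x, y in points:
        # Standard 2D rotation of the offset from the center.
        rx = cx + (x - cx) * cos_a - (y - cy) * sin_a
        ry = cy + (x - cx) * sin_a + (y - cy) * cos_a
        out.append((rx, ry + dy))
    return out


def sample_params(rng=random):
    """Draw a rotation in [-15, 15] degrees and a vertical shift in [-10, 10] px."""
    return rng.uniform(-15.0, 15.0), rng.randint(-10, 10)
```

A call might look like `angle, dy = sample_params()` followed by `augment_keypoints(points, angle, dy, center=(120, 90))`, where the center is a hypothetical image midpoint. In image coordinates, where y grows downward, a positive angle here appears clockwise on screen, so the image itself must be rotated with the matching convention or the keypoints will drift off the face.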

To train the neural net, I chose a learning rate of 0.001. I used 5 convolutional layers with 1 to 32 channels, 32 to 32 channels, 32 to 32 channels, 32 to 64 channels, and 64 to 128 channels. The layers had kernel sizes of 7x7, 7x7, 5x5, 5x5, and 3x3 respectively. Afterwards, I used 3 fully connected layers with 1024 to 512 features, 512 to 256 features, and 256 to 116 outputs. Below is the loss graph from training the neural net.
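To sanity-check the layer sizes listed above, the sketch below walks through the architecture and reports the flattened feature count and the number of learnable parameters. The layer shapes are taken from the text; the 180x240 input size, the unpadded stride-1 convolutions, and the 2x2 max pool after every convolution are assumptions, picked because they are consistent with the 1024 features entering the first fully connected layer.

```python
# (in_channels, out_channels, kernel) for each conv layer, from the text.
CONVS = [(1, 32, 7), (32, 32, 7), (32, 32, 5), (32, 64, 5), (64, 128, 3)]
# (in_features, out_features) for each fully connected layer, from the text.
LINEARS = [(1024, 512), (512, 256), (256, 116)]


def summarize(height, width, convs=CONVS, linears=LINEARS):
    """Return (flattened feature count, total learnable parameters)."""
    params = 0
    for c_in, c_out, k in convs:
        params += c_out * (c_in * k * k + 1)  # weights plus one bias per filter
        # Assumed: unpadded stride-1 convolution followed by 2x2 max pooling.
        height = (height - k + 1) // 2
        width = (width - k + 1) // 2
    flat = convs[-1][1] * height * width
    for f_in, f_out in linears:
        params += f_out * (f_in + 1)  # weights plus one bias per output
    return flat, params


flat, params = summarize(180, 240)  # assumed 180x240 grayscale input
num_keypoints = LINEARS[-1][1] // 2  # 116 outputs read as 58 (x, y) pairs
```

With these assumptions the final feature map is 2x4 with 128 channels, giving 1024 features, and most of the parameters sit in the first fully connected layer rather than in the convolutions.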

Here are two examples where the neural net succeeded. Here, both faces are facing forwards and the neural net seems to work pretty well even when the pictures are augmented.

Here are two examples where the neural net failed. It seems to fail more often where faces are turned away from the camera.

Here are the filters that the neural net learned.

Train with a Larger Dataset

I submitted to Kaggle and got a score of 15.29949. For the neural net, I chose a learning rate of 0.001 and trained for 10 epochs. I changed the input of the neural net to 1 channel and the output to 136 values. Below is a graph of the training and validation loss of the neural net.
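The 136 outputs can be read as 68 (x, y) keypoints. The sketch below shows one way such a flat prediction could be mapped back to pixel coordinates and scored. It rests on assumptions the text does not confirm: that outputs alternate x and y, that they are fractions of a face crop box given as (left, top, width, height), and that the score is a mean absolute error in pixels. Both function names are hypothetical.

```python
def to_image_coords(output, box):
    """Turn a flat [x0, y0, x1, y1, ...] prediction into pixel (x, y) pairs.

    Assumes coordinates are fractions of a crop box (left, top, width, height).
    """
    left, top, width, height = box
    xs, ys = output[0::2], output[1::2]
    return [(left + x * width, top + y * height) for x, y in zip(xs, ys)]


def mean_absolute_error(pred, truth):
    """Average absolute coordinate difference over all x and y values."""
    total = sum(
        abs(px - tx) + abs(py - ty)
        for (px, py), (tx, ty) in zip(pred, truth)
    )
    return total / (2 * len(pred))
```

For example, `to_image_coords(output, (100, 50, 200, 300))` places a 136-value prediction inside a hypothetical 200x300 crop whose top-left corner is at pixel (100, 50).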

Here are some examples of pictures from the testing set and some custom photos. It seems to succeed on faces looking directly at the camera pretty well and fail on faces that are turned away.