CS194-26: Project 4

Susan Lin


Objective

In this project, I learned to use PyTorch and Colab to implement convolutional neural networks that detect facial keypoints.


Part 1: Nose Tip

In this part, I used three convolutional layers and two fully connected layers to predict a single nose-tip keypoint.
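Below is a minimal PyTorch sketch of an architecture in this shape. The channel counts, kernel sizes, and the 1x60x80 grayscale input size are my assumptions for illustration, not the exact values used in the project.

```python
import torch.nn as nn
import torch.nn.functional as F

class NoseTipNet(nn.Module):
    """Three conv layers + two fully connected layers, regressing (x, y) of the nose tip."""
    def __init__(self):
        super().__init__()
        # Assumed input: 1x60x80 grayscale image (hypothetical size).
        self.conv1 = nn.Conv2d(1, 12, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(12, 24, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(24, 32, kernel_size=3, padding=1)
        # After three 2x2 max-pools: 60x80 -> 30x40 -> 15x20 -> 7x10.
        self.fc1 = nn.Linear(32 * 7 * 10, 128)
        self.fc2 = nn.Linear(128, 2)  # (x, y) of the nose tip

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = F.max_pool2d(F.relu(self.conv3(x)), 2)
        x = x.flatten(1)
        x = F.relu(self.fc1(x))
        return self.fc2(x)
```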


Ground-Truth Keypoint Images

Loss

Validation set is in orange; test set is in blue.
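For reference, curves like these can be produced by recording the average loss per epoch. The sketch below is a generic loss-tracking loop, assuming MSE loss on the predicted coordinates; the model, optimizer, and data loaders are placeholders for the actual training setup.

```python
import torch
import matplotlib.pyplot as plt

def train_and_plot(model, optimizer, train_loader, val_loader, num_epochs=25):
    """Record per-epoch train/validation MSE loss, then plot both curves."""
    criterion = torch.nn.MSELoss()  # assumed loss for keypoint regression
    train_losses, val_losses = [], []
    for _ in range(num_epochs):
        model.train()
        total = 0.0
        for images, keypoints in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), keypoints)
            loss.backward()
            optimizer.step()
            total += loss.item()
        train_losses.append(total / len(train_loader))

        model.eval()
        with torch.no_grad():
            total = sum(criterion(model(x), y).item() for x, y in val_loader)
        val_losses.append(total / len(val_loader))

    plt.plot(val_losses, color="orange", label="validation")
    plt.plot(train_losses, color="blue", label="train")
    plt.xlabel("epoch"); plt.ylabel("MSE loss"); plt.legend(); plt.show()
```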


Correctly Detected Images

Predicted points are in green; ground-truth points are in red.

Incorrectly Detected Images

Predicted points are in green; ground-truth points are in red.

Because the dataset is so small, the model is more likely to fail when the nose is partially obscured or can be confused with another similarly shaped region. When the face is viewed straight on and the nose is clearly visible, the model is more accurate.

Part 2: Full Facial Keypoints Detection

In this part, I used five convolutional layers and two fully connected layers. To expand the small dataset, I created a second set of training data by augmenting the existing images: each image was randomly rotated by -10 to 10 degrees and translated by an offset of 0, 25, 50, 75, or 100 pixels (a sketch of this augmentation is shown below). I trained the model for 20 epochs with the Adam optimizer and a learning rate of 0.001.
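Here is one way the augmentation could be implemented, using OpenCV for the warp. The translation directions and the pairing of offsets in x and y are my assumptions; the key point is that the same affine matrix is applied to the keypoints so they stay aligned with the warped image.

```python
import random
import numpy as np
import cv2

def augment(image, keypoints):
    """Randomly rotate by -10..10 degrees and translate by a fixed offset,
    applying the identical transform to the keypoints.
    image: HxW grayscale array; keypoints: Nx2 array of (x, y) pixels."""
    h, w = image.shape[:2]

    # Random rotation about the image center (positive angle = counter-clockwise).
    angle = random.uniform(-10, 10)
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)

    # Random translation; offsets mirror the write-up, signs are an assumption.
    offset = random.choice([0, 25, 50, 75, 100])
    M[:, 2] += (random.choice([-offset, offset]), random.choice([-offset, offset]))

    aug_image = cv2.warpAffine(image, M, (w, h))
    # Apply the same 2x3 affine matrix to the (x, y) keypoints.
    ones = np.ones((len(keypoints), 1))
    aug_keypoints = np.hstack([keypoints, ones]) @ M.T
    return aug_image, aug_keypoints
```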


Ground-Truth Keypoint Images

Loss

Validation set is in orange; test set is in blue.


Correctly Detected Images

Incorrectly Detected Images

Faces that deviate further from the average face, are heavily tilted, or have low contrast are more likely to be detected incorrectly. The more clearly visible a face is (viewed head-on, with a distinguishable orientation), the more accurate the model's predictions are.

Images of the Filters

First Convolution Layer Filters

Second Convolution Layer Filters
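Filter grids like the ones above can be rendered directly from the learned layer weights. The sketch below shows one common way to do it; for layers with multi-channel input, displaying only the first input channel of each filter is a simplification I chose for illustration.

```python
import matplotlib.pyplot as plt

def show_filters(conv_layer, cols=8):
    """Display each filter of a conv layer as a small grayscale image."""
    weights = conv_layer.weight.detach().cpu()  # shape: (out_ch, in_ch, kH, kW)
    n = weights.shape[0]
    rows = (n + cols - 1) // cols
    fig, axes = plt.subplots(rows, cols, figsize=(cols, rows))
    for i, ax in enumerate(axes.flat):
        ax.axis("off")
        if i < n:
            ax.imshow(weights[i, 0], cmap="gray")  # first input channel only
    plt.show()

# Usage, e.g.: show_filters(model.conv1); show_filters(model.conv2)
```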



Part 3: Train With a Larger Dataset


Conclusion

This was my first time working with PyTorch and Colab; it was quite interesting!