194 Project 4: Facial Keypoint Detection with Neural Networks

Kush Khanolkar

Part 1: Nose Tip Detection

In this section we used a basic 3-layer CNN to predict the nose-tip keypoint.
Here are some images from the Dataloader:

[Images: two samples from the dataloader]
Here are the training and validation plots from training:

[Image: training and validation plots]
Here are some good images from the output of the model (blue) compared to the ground truth (Red):

[Images: two good predictions]
Here are some bad images from the output of the model (blue) compared to the ground truth (Red):

[Images: three bad predictions]
Faces that are not looking straight at the camera (turned off to the side instead) caused poor performance in both this part and the next one. As you can see above, the side-profile faces have the predicted dot way off the mark, while the head-on ones look pretty good. This is likely because head-on shots made up most of the training data and present the widest nose/face targets.

Part 2: Full Facial Keypoints Detection

In this section we used a larger 5-layer CNN to predict the full set of facial keypoints (116 output values, i.e. an x and a y coordinate for each of 58 points).

Here is the detailed architecture of my model. Each Conv2d layer was wrapped as F.max_pool2d(F.relu(conv(x)), 2), i.e. a ReLU followed by a 2x2 max pool, except for the 3rd layer, which had no max pool. The first Linear layer has a ReLU activation. The model was trained for 12 epochs with an Adam optimizer (lr=1e-3) and MSE loss:


Net2(
(conv1): Conv2d(1, 6, kernel_size=(7, 7), stride=(1, 1))
(conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
(conv3): Conv2d(16, 38, kernel_size=(5, 5), stride=(1, 1))
(conv4): Conv2d(38, 56, kernel_size=(3, 3), stride=(1, 1))
(conv5): Conv2d(56, 72, kernel_size=(3, 3), stride=(1, 1))
(fc1): Linear(in_features=1728, out_features=256, bias=True)
(fc2): Linear(in_features=256, out_features=116, bias=True)
)
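The printed layers and the pooling rule described above can be turned into a runnable module. The following is a minimal sketch, not the original code: the 120x160 (height x width) grayscale input size is an assumption, chosen because it is a size for which the flattened conv5 output has exactly the 1728 features that fc1 expects (72 channels x 4 x 6).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net2(nn.Module):
    """Sketch of the 5-conv-layer network printed above."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, kernel_size=7)
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)
        self.conv3 = nn.Conv2d(16, 38, kernel_size=5)
        self.conv4 = nn.Conv2d(38, 56, kernel_size=3)
        self.conv5 = nn.Conv2d(56, 72, kernel_size=3)
        self.fc1 = nn.Linear(1728, 256)
        self.fc2 = nn.Linear(256, 116)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = F.relu(self.conv3(x))  # 3rd layer: ReLU only, no pooling
        x = F.max_pool2d(F.relu(self.conv4(x)), 2)
        x = F.max_pool2d(F.relu(self.conv5(x)), 2)
        x = torch.flatten(x, 1)  # 72 * 4 * 6 = 1728 for a 120x160 input
        x = F.relu(self.fc1(x))
        return self.fc2(x)


model = Net2()
batch = torch.zeros(2, 1, 120, 160)  # assumed input size, not stated in the write-up
out = model(batch)
print(out.shape)  # torch.Size([2, 116])
```

With a different input size the flattened feature count changes, and fc1 would need a matching in_features value.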

Here are some images from the Dataloader. I implemented Color Jitter and Rotation Augmentation to improve results for this part and the following part:


[Images: three samples from the dataloader]
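Color jitter changes only pixel values, so the keypoint labels stay as they are, but a rotation has to be applied to the labels as well as to the image. The write-up does not show how this was done. The sketch below is one hypothetical way to rotate pixel-coordinate keypoints about the image centre. The function name `rotate_keypoints` and the sign convention (a positive angle is counter-clockwise on screen, with the y axis pointing down) are assumptions, and they must match whatever routine rotates the image.

```python
import math


def rotate_keypoints(points, angle_deg, width, height):
    """Rotate (x, y) pixel keypoints about the image centre.

    A positive angle is counter-clockwise as seen on screen (y axis pointing down).
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx, cy = width / 2.0, height / 2.0
    rotated = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        rotated.append((cx + cos_t * dx + sin_t * dy,
                        cy - sin_t * dx + cos_t * dy))
    return rotated


# A point 10 px to the right of the centre of a 160x120 image...
pts = rotate_keypoints([(90.0, 60.0)], 90, width=160, height=120)
# ...ends up 10 px above the centre after a 90 degree counter-clockwise turn.
```

If the keypoints are stored as ratios of the image size rather than pixels, they would need to be scaled to pixels before the rotation and back afterwards.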
Here are the training and validation plots from training:

[Image: training and validation plots]
Here are some good images from the output of the model (blue) compared to the ground truth (Red):

[Images: two good predictions]
Here are some bad images from the output of the model (blue) compared to the ground truth (Red):

[Images: two bad predictions]
Faces that are not looking head-on at the camera are once again what kills my model in this part. As you can see above, the side-profile faces have the predicted points way off the mark, while the head-on ones look pretty good. This is likely because the model also learns the overall face "shape", which doesn't help much on side-profile faces without some notion of perspective.
Here are some of the filters from my first layer:

[Images: three first-layer filters]
Here are some of the filters from my second layer:
[Images: three second-layer filters]
Here is one filter from each of my 3rd, 4th, and 5th layers:
[Images: one filter each from the 3rd, 4th, and 5th layers]
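Filter images like the ones above can be read straight from a convolution layer's weight tensor. This is a minimal sketch of one way to do it, not the original plotting code: `conv1` here is a freshly initialised stand-in for the trained first layer, and the per-filter rescaling to [0, 1] is an assumed display choice.

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(1, 6, kernel_size=7)  # stand-in for the trained model's first layer

# Weight tensor has shape (out_channels, in_channels, kH, kW) = (6, 1, 7, 7).
filters = conv1.weight.detach().clone()

# Rescale each filter to [0, 1] so it can be shown as a grayscale image.
flat = filters.view(filters.size(0), -1)
lo = flat.min(dim=1).values.view(-1, 1, 1, 1)
hi = flat.max(dim=1).values.view(-1, 1, 1, 1)
images = (filters - lo) / (hi - lo + 1e-8)

first_filter = images[0, 0]  # 7x7 array, e.g. for plt.imshow(first_filter, cmap="gray")
```

For the deeper layers each filter has one 2-D slice per input channel, so a single image corresponds to one (output channel, input channel) pair.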

Part 3: Train With Larger Dataset

I only made 2 modifications to the default ResNet-18:

I changed the first layer to: Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
and last layer: Linear(in_features=512, out_features=136, bias=True)
These changes simply adapt the input to a single channel and the output to 136 features. Finally, I used pretrained weights, hoping that the features already learned by the pretrained net would give my training process a boost.


I trained using an Adam optimizer with lr = 1e-3 and MSELoss() for around 10 epochs.
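The optimizer and loss setup described above looks roughly like this in PyTorch. It is a minimal sketch: the tiny linear model and the random tensors are hypothetical stand-ins for the real network and dataloader, used only so the loop runs on its own, and each pass over the single batch stands in for an epoch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny stand-in model and random data, only to make the loop runnable.
model = nn.Sequential(nn.Flatten(), nn.Linear(8 * 8, 136))
images = torch.randn(16, 1, 8, 8)
targets = torch.randn(16, 136)

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

losses = []
for epoch in range(10):
    optimizer.zero_grad()                     # clear gradients from the last step
    loss = criterion(model(images), targets)  # mean squared error over all outputs
    loss.backward()                           # backpropagate
    optimizer.step()                          # Adam update
    losses.append(loss.item())                # keep a plain float for plotting
```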


My result with this on Kaggle (username Kush Khanolkar) was an MAE of 10.46890.
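For reference, MAE (mean absolute error) averages the absolute difference between predicted and true values. The sketch below assumes the score is taken over every keypoint coordinate in pixels; the exact Kaggle computation is not given in this write-up, and the sample numbers are made up for illustration.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference over all keypoint coordinates."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Two hypothetical keypoints flattened as [x1, y1, x2, y2], in pixels.
pred = [100.0, 50.0, 120.0, 80.0]
true = [110.0, 45.0, 120.0, 95.0]
mae = mean_absolute_error(pred, true)  # (10 + 5 + 0 + 15) / 4 = 7.5
```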

My training loss graph: unfortunately, due to technical issues with Colab, I lost my losses array after training for many hours, so I had to salvage one intermediate plot I had made. Here is the best visualization of the training losses I have (with MSE on the y-axis and iteration on the x-axis; 1600 iterations = 1 epoch):

[Image: training loss plot]

Here are some images from the test set I did well on:

[Images: three good test-set predictions]
Here are some images from the test set I did poorly on:

[Images: three bad test-set predictions]
My net did very well on any image with an unobstructed adult face. It has issues with images that have any type of obstruction, including hands, the photo edge, dark lighting, or clothing around the face. In addition, the proportions of children's faces for some reason threw my net off; as you can see, the outlines are a bit larger than they should be.


