- Project Description:
Remember project 3, where we had to manually place all the key points? In this project, we will automate that process. We just learned all the tools we need for it - Convolutional Neural Networks.
Data Loader: First, before any augmentations, I grayscaled each image, normalized it to the range [-0.5, 0.5], and resized it to a smaller 80x60 size. For augmentations, I used a random crop (adding padding and then cropping randomly), blurred some randomly chosen images, and applied random rotations.
CNN: This is the most interesting part. I took my inspiration from the ResNet architecture and wanted to use blocks where the input to a block is added to its output. My block consists of two ReLU/convolution pairs, each followed by batch normalization. After each block, I use a convolution/average-pooling layer to increase the number of channels and decrease the image size. Then I flatten and add two fully connected layers. You can see the architecture below:
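A minimal sketch of the block and downsampling stage described above, assuming 3x3 convolutions and a pooling factor of 2 (channel counts and kernel sizes are my assumptions, not the author's exact values):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two conv/ReLU pairs with batch norm; block input added to output."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
        )

    def forward(self, x):
        return x + self.body(x)  # ResNet-style skip connection

class Downsample(nn.Module):
    """Conv + average pooling between blocks: more channels, smaller image."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, 3, padding=1)
        self.pool = nn.AvgPool2d(2)

    def forward(self, x):
        return self.pool(self.conv(x))
```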
Loss Function and Optimizer - For the loss function, I used MSE. I chose a batch size of 1 and a learning rate of 0.001 for the Adam optimizer.
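One training step under this setup might look like the sketch below. The stand-in linear model and the normalized target coordinates are placeholders for illustration only; the actual network is the CNN above.

```python
import torch
import torch.nn as nn

# Stand-in model mapping a 1x60x80 image to an (x, y) nose coordinate.
model = nn.Sequential(nn.Flatten(), nn.Linear(60 * 80, 2))
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

img = torch.rand(1, 1, 60, 80)          # batch size 1, as described
target = torch.tensor([[0.45, 0.52]])   # hypothetical normalized keypoint

optimizer.zero_grad()
loss = criterion(model(img), target)    # MSE between prediction and target
loss.backward()
optimizer.step()
```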
Training Plot - here is the training plot; it reaches an MSE test loss of 0.00027 at the 76th epoch
Results - here are some results from the test dataset
Data Loader: I use all the same augmentations, except for two things that differ from the nose detection:
Instead of 60x80, I now resize to 180x240
I used an augmentation based on the earlier project's affine warping. I shuffled the dataset and, for each image, warped it to 10 intermediate triangulations between the current face and the next one, adding all the warps to the dataset. This way I was able to synthetically produce more data along with the corresponding facial keypoint locations. Here are some results from the augmentation:
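The keypoint side of this augmentation can be sketched as below: for a pair of faces, generate 10 interior interpolated keypoint sets, each of which would drive one affine warp of the source image. The actual triangulation and per-triangle warping are omitted; this only shows where the intermediate shapes come from.

```python
import numpy as np

def intermediate_shapes(pts_a, pts_b, n=10):
    """Return n keypoint sets linearly interpolated strictly between
    pts_a and pts_b (each an (K, 2) array of keypoint coordinates).
    Each interpolated shape would serve as the warp target for one
    synthetic training image."""
    ts = np.linspace(0.0, 1.0, n + 2)[1:-1]  # n interior fractions
    return [(1 - t) * pts_a + t * pts_b for t in ts]
```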
CNN: I built on the model I used for nose detection, but this time I modified the block: instead of adding each block's input and output, I concatenate them, as in the U-Net model. Since the image is bigger now, I had to use a few more of those blocks, followed by batch-norm/ReLU/conv layers.
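The modified block can be sketched as below: same two conv/ReLU pairs with batch norm, but the skip connection concatenates along the channel axis, so the block's output has twice the input channels (the following layers then have to account for that).

```python
import torch
import torch.nn as nn

class ConcatBlock(nn.Module):
    """Like the residual block, but U-Net style: concatenate the block
    input with its output along the channel dimension instead of adding."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
        )

    def forward(self, x):
        return torch.cat([x, self.body(x)], dim=1)  # channels double
```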
Results - After training for about 256 epochs with batch size 4 and the Adam optimizer at learning rate 0.0001, the model reached an MSE loss of 0.00022. Here are some bad and good results:
Filters - let's visualize some convolution layers in the hope of understanding what they mean
Data Loader: Here I added an augmentation where I crop to the bounding box. I also expanded the bounding box by 30 pixels on each side, because the keypoints sometimes lie outside of the box.
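A sketch of that crop, assuming boxes in (x1, y1, x2, y2) pixel coordinates: expand by 30 on each side (clipped to the image), crop, and shift the keypoints into the crop's coordinate frame.

```python
import numpy as np

def crop_to_box(img, keypoints, box, pad=30):
    """img: (H, W) array; keypoints: (K, 2) array of (x, y) coordinates;
    box: (x1, y1, x2, y2). Expands the box by `pad` pixels per side,
    crops, and returns keypoints relative to the crop origin."""
    h, w = img.shape[:2]
    x1 = max(0, box[0] - pad)
    y1 = max(0, box[1] - pad)
    x2 = min(w, box[2] + pad)
    y2 = min(h, box[3] + pad)
    crop = img[y1:y2, x1:x2]
    return crop, keypoints - np.array([x1, y1], dtype=float)
```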
CNN: I used the resnet18 model to train on the data, but I had to change the input layer and the final output layer so they would account for one-channel images and 136 keypoint coordinates.
Results - On Kaggle my best score so far is around 11, but I had not set the random seed then, so I could not reproduce it. The best score I could reproduce was 12, which I got after training. In general, even though I set a manual random seed, the result somehow comes out different every time.
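One likely cause of the run-to-run variation is that `torch.manual_seed` alone does not pin everything down. A fuller seeding sketch (remaining nondeterminism can still come from some CUDA kernels and DataLoader workers):

```python
import random
import numpy as np
import torch

def seed_everything(seed=0):
    """Seed every RNG PyTorch training typically touches and make
    cuDNN pick deterministic kernels."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```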
Here are some results on the test set: