- Project Description:
Remember project 3, where we had to manually place all the key points? In this project, we will automate that process. We just learned all the tools we need for it - Convolutional Neural Networks.
Data Loader: First, before any augmentations, I grayscaled each image, normalized it to the range [-0.5, 0.5], and resized it to a smaller 80x60 size. For augmentations, I used a random crop (adding padding and then cropping randomly), blurred some randomly chosen images, and applied random rotations.
CNN: This is the most interesting part. I took my inspiration from the ResNet architecture and wanted to use blocks where the input to a block is added to its output. My block consists of two ReLU/convolution pairs, each followed by batch normalization. After each block, I use a convolution/average-pooling layer to increase the number of channels and decrease the image size. Then I flatten and add two fully connected layers. You can see the architecture below:
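A minimal sketch of the block and downsampling stage described above, assuming 3x3 convolutions and a pooling factor of 2 (channel counts and kernel sizes are my assumptions, not the author's exact values):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two conv/ReLU pairs with batch norm; block input added to output."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
        )

    def forward(self, x):
        return x + self.body(x)  # ResNet-style skip connection

class Downsample(nn.Module):
    """Conv + average pooling between blocks: more channels, smaller image."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, 3, padding=1)
        self.pool = nn.AvgPool2d(2)

    def forward(self, x):
        return self.pool(self.conv(x))
```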
Loss Function and Optimizer - For the loss function, I used MSE. I chose a batch size of 1 and a learning rate of 0.001 for the Adam optimizer.
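One training step under this setup might look like the sketch below. The stand-in linear model and the normalized target coordinates are placeholders for illustration only; the actual network is the CNN above.

```python
import torch
import torch.nn as nn

# Stand-in model mapping a 1x60x80 image to an (x, y) nose coordinate.
model = nn.Sequential(nn.Flatten(), nn.Linear(60 * 80, 2))
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

img = torch.rand(1, 1, 60, 80)          # batch size 1, as described
target = torch.tensor([[0.45, 0.52]])   # hypothetical normalized keypoint

optimizer.zero_grad()
loss = criterion(model(img), target)    # MSE between prediction and target
loss.backward()
optimizer.step()
```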
Training Plot - here is the training plot; it reaches an MSE test loss of 0.00027 at the 76th epoch
Results - here are some results from the test dataset
Data Loader: I use all the same augmentations, except for two things that differ from the nose detection:
Instead of 60x80, I now resize to 180x240
I used an augmentation based on the earlier project's affine warping. I shuffled the dataset and, for each image, warped it to 10 intermediate triangulations between the current face and the next one, adding all the warps to the dataset. This way I was able to synthetically produce more data along with the corresponding facial keypoint locations. Here are some results from the augmentation:
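The keypoint side of this augmentation can be sketched as below: for a pair of faces, generate 10 interior interpolated keypoint sets, each of which would drive one affine warp of the source image. The actual triangulation and per-triangle warping are omitted; this only shows where the intermediate shapes come from.

```python
import numpy as np

def intermediate_shapes(pts_a, pts_b, n=10):
    """Return n keypoint sets linearly interpolated strictly between
    pts_a and pts_b (each an (K, 2) array of keypoint coordinates).
    Each interpolated shape would serve as the warp target for one
    synthetic training image."""
    ts = np.linspace(0.0, 1.0, n + 2)[1:-1]  # n interior fractions
    return [(1 - t) * pts_a + t * pts_b for t in ts]
```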
CNN: I built on the model I used for nose detection, but this time I modified the block: instead of adding each block's input and output, I concatenate them, as in the U-Net model. Since the image is bigger now, I had to use a few more of those blocks, followed by batch-norm/ReLU/conv layers.
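The modified block can be sketched as below: same two conv/ReLU pairs with batch norm, but the skip connection concatenates along the channel axis, so the block's output has twice the input channels (the following layers then have to account for that).

```python
import torch
import torch.nn as nn

class ConcatBlock(nn.Module):
    """Like the residual block, but U-Net style: concatenate the block
    input with its output along the channel dimension instead of adding."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
        )

    def forward(self, x):
        return torch.cat([x, self.body(x)], dim=1)  # channels double
```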
Results - After training for about 256 epochs with batch size 4 and the Adam optimizer at learning rate 0.0001, the model reached an MSE loss of 0.00022. Here are some bad and good results:
Filters - let's visualize some convolution layers in the hope of understanding what they mean
Data Loader: Here I added an augmentation where I crop to the bounding box. I also expanded the bounding box by 30 pixels on each side, because the keypoints sometimes lie outside of the box.
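A sketch of that crop, assuming boxes in (x1, y1, x2, y2) pixel coordinates: expand by 30 on each side (clipped to the image), crop, and shift the keypoints into the crop's coordinate frame.

```python
import numpy as np

def crop_to_box(img, keypoints, box, pad=30):
    """img: (H, W) array; keypoints: (K, 2) array of (x, y) coordinates;
    box: (x1, y1, x2, y2). Expands the box by `pad` pixels per side,
    crops, and returns keypoints relative to the crop origin."""
    h, w = img.shape[:2]
    x1 = max(0, box[0] - pad)
    y1 = max(0, box[1] - pad)
    x2 = min(w, box[2] + pad)
    y2 = min(h, box[3] + pad)
    crop = img[y1:y2, x1:x2]
    return crop, keypoints - np.array([x1, y1], dtype=float)
```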
CNN: I used the resnet18 model to train on the data, but I had to change the input layer and the final output layer so they would account for one-channel images and 136 keypoint coordinates.
Results - On Kaggle my best score so far is around 11, but I had not set the random seed then, so I could not reproduce it. The best score I could reproduce was 12, which I got after training. In general, even though I set a manual random seed, the result somehow comes out different every time.
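One likely cause of the run-to-run variation is that `torch.manual_seed` alone does not pin everything down. A fuller seeding sketch (remaining nondeterminism can still come from some CUDA kernels and DataLoader workers):

```python
import random
import numpy as np
import torch

def seed_everything(seed=0):
    """Seed every RNG PyTorch training typically touches and make
    cuDNN pick deterministic kernels."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```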
Here are some results on the test set: