In this part, I had to build a fairly simple convolutional neural network (CNN) to detect
nose tips for a database of faces.
First, I had to confirm that my data was being loaded correctly. Here are some samples from
the dataloader that I created.
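The dataloader can be sketched as a small PyTorch `Dataset` (a minimal version that assumes the images and normalized nose-tip coordinates are already loaded into arrays; the actual class would also handle file loading and any transforms):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class NoseTipDataset(Dataset):
    """Minimal sketch of a nose-tip keypoint dataset.

    Assumes `images` is an (N, H, W) array of grayscale floats and
    `points` is an (N, 2) array of normalized (x, y) nose-tip coordinates.
    """
    def __init__(self, images, points):
        self.images = torch.as_tensor(images, dtype=torch.float32)
        self.points = torch.as_tensor(points, dtype=torch.float32)

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # Add a channel dimension so each sample is (1, H, W)
        return self.images[idx].unsqueeze(0), self.points[idx]
```

Wrapping this in a `DataLoader` then yields batches of image/keypoint pairs for training.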
Now, I was ready to train my simple CNN with the data! Here are the plots for the training and validation loss.
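The training loop looked roughly like this (MSE loss and the Adam optimizer are assumptions here, since they are typical choices for coordinate regression, not necessarily my exact setup):

```python
import torch
import torch.nn as nn

def train(model, train_loader, val_loader, epochs=25, lr=1e-3):
    """Sketch of a training loop that records per-epoch train/val losses."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()  # assumed loss: squared error on predicted points
    train_losses, val_losses = [], []
    for _ in range(epochs):
        model.train()
        running = 0.0
        for x, y in train_loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
            running += loss.item() * x.shape[0]
        train_losses.append(running / len(train_loader.dataset))

        model.eval()
        with torch.no_grad():
            running = sum(loss_fn(model(x), y).item() * x.shape[0]
                          for x, y in val_loader)
        val_losses.append(running / len(val_loader.dataset))
    return train_losses, val_losses
```

The two returned lists are what the loss plots below are drawn from.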
To show how hyperparameters affect the results, I decided to change the learning rate from the original 1e-3 to 1e-2 and 1e-4. Here are the results.
As you can see, when the learning rate is increased to 1e-2, the curve drops and flattens out very quickly. In contrast, the lower learning rate of 1e-4 has a much slower drop until it begins to flatten out to values similar to those of the other two learning rates.
I also decided to remove the last layer of the CNN and see how that affects the results.
It seems that removing a layer made the loss drop more quickly than with the extra layer. This was interesting because I expected worse performance in some way (e.g., the curve flattening out at a higher loss).
Now, let's see some actual results! Here are two successes and two failures.
Successes
Failures
It is pretty clear that when a person's nose is roughly in the middle of the image, the predicted point is almost spot-on. However, when a person's face is oriented to the side, the point only moves slightly toward the nose from the center. I believe this is due to several factors, such as too few training epochs, a suboptimal CNN architecture, etc.
For this part, I have to deal with not just the nose tip, but all facial landmarks for the
same database of faces.
Here are some samples of my dataloader for this part.
This is the CNN architecture that I used:
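As a rough PyTorch sketch of a CNN of this kind (the channel counts, kernel sizes, and default number of landmarks below are illustrative assumptions, not my exact architecture):

```python
import torch
import torch.nn as nn

class LandmarkCNN(nn.Module):
    """Illustrative conv-pool stack with a regression head.

    All layer sizes here are assumptions for the sketch; the output is
    2 * n_landmarks numbers, i.e. an (x, y) pair per landmark.
    """
    def __init__(self, n_landmarks=58):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(256), nn.ReLU(),  # lazy so input size is inferred
            nn.Linear(256, 2 * n_landmarks),
        )

    def forward(self, x):
        return self.head(self.features(x))
```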
Here are the visualizations of the learned filters of the first convolutional layer.
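Filter grids like these can be produced with a few lines of matplotlib (a generic sketch, not my exact plotting code):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import torch.nn as nn

def plot_first_layer_filters(conv, path="filters.png", cols=8):
    """Save a grid showing each K x K filter (input channel 0) of a conv layer."""
    w = conv.weight.detach().cpu()  # shape: (out_channels, in_channels, K, K)
    rows = (w.shape[0] + cols - 1) // cols
    fig, axes = plt.subplots(rows, cols, figsize=(cols, rows))
    for i, ax in enumerate(axes.flat):
        ax.axis("off")
        if i < w.shape[0]:
            ax.imshow(w[i, 0], cmap="gray")
    fig.savefig(path, bbox_inches="tight")
    plt.close(fig)
```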
Here are two successes and two failures.
Successes
Failures
Judging from my successes and failures, it seems that my CNN is trying to find the region of the image where the face is most likely to be, and then minimize the loss with respect to the orientation of the face (though I may be wrong).
The standard CNN that I decided to go with was ResNet-18. I used it with pretrained weights, and modified the first and last layers to accommodate grayscale images and the number of facial landmarks. Specifically, I changed the in_channels to 1 in the first convolutional layer, and changed the out_features to 136 for the last fully connected layer. I also used a trick where I modified each filter of the first convolutional layer to be 1 x K x K rather than 3 x K x K. Here are the results.
For reference, I used a learning rate of 1e-4 for 25 epochs (batch size of 32) for "Run 1". As the loss kept decreasing but eventually showed some stagnation, I decided to lower the learning rate and train for another 10 epochs for "Run 2".
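In PyTorch, resuming with a lower learning rate just means updating the optimizer's parameter groups in place (the new value below is illustrative, not necessarily what I used):

```python
import torch

def lower_learning_rate(optimizer, new_lr):
    """Set a new learning rate on every parameter group before resuming training."""
    for group in optimizer.param_groups:
        group["lr"] = new_lr
```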
In hindsight, yellow was not the best color choice for the following test image results, but here they are anyway!
Also, here are some results on my own images!