Project 4: Facial Keypoint Detection with Neural Networks

Tushar Sharma

Part 1: Nose Tip Detection

First, I had to experiment with PyTorch and figure out how to make a dataloader. Some fiddling and lots of abstraction later, I ended up with something I enjoyed. One thing I really want to emphasize is that I did some preprocessing, so the dataloader takes as input a list of images and a list of arrays for the keypoints. My images are 60x80.
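A minimal sketch of what that dataloader setup could look like, assuming the preprocessed lists described above (the class name and tensor dtypes are my own choices, not from the project code):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class NoseTipDataset(Dataset):
    """Wraps preprocessed data: a list of 60x80 images and a list of
    keypoint arrays (one (x, y) nose-tip position per image)."""
    def __init__(self, images, keypoints):
        self.images = images
        self.keypoints = keypoints

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # Add a channel dimension so the image is (1, 60, 80)
        img = torch.as_tensor(self.images[idx], dtype=torch.float32).unsqueeze(0)
        kp = torch.as_tensor(self.keypoints[idx], dtype=torch.float32)
        return img, kp
```

A `DataLoader` then handles batching and shuffling, e.g. `DataLoader(dataset, batch_size=16, shuffle=True)`.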

img 1 img 2 img 3 img 4

I'm pretty happy with how this looks, but there's also not really much to this because it's just one point. Onward to the neural network!

My neural network was pretty basic: three convolutional layers, doubling the channel count each time, since a few papers I read recommend this as good practice. I used a constant kernel size of 5x5 with no fancy striding or filtering. After each conv layer I apply a ReLU and then a 2x2 max pool to shrink the input. Here's my loss graph over training.

simple net loss over time
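For concreteness, the three-layer network described above might look something like this in PyTorch (the starting channel count of 16 and the linear head are my assumptions; only the 5x5 kernels, channel doubling, ReLU + 2x2 max pool pattern, and 60x80 input come from the writeup):

```python
import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    """Three 5x5 conv layers with channels doubling (16 -> 32 -> 64),
    each followed by ReLU and a 2x2 max pool, then a linear head that
    regresses the (x, y) nose-tip position."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
        )
        # A 60x80 input shrinks to 4x6 after three conv+pool stages
        self.head = nn.Linear(64 * 4 * 6, 2)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))
```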

The loss curve looks pretty good, so it's time to look at some results.

res 1 res 2 res 3 res 4

As you can see, the results are mixed at best. There are some good ones in there, especially the one with the two dots right next to each other. Side views, on the other hand, are much more of an issue. This model is very simple, and while it trains really fast, the results obviously aren't beautiful.

Part 2: Full Facial Keypoints Detection

Thanks to the power of abstraction, changing up the dataloader was nothing more than swapping in a different transform. ColorJitter refused to cooperate with my laptop, so I decided to do random horizontal shifts of up to ±10 pixels and random rotations of up to ±10 degrees. These are applied on the fly, so the network sees a slightly different version of each image every epoch. Here are some outputs.

img 5 img 6 img 7 img 8

My neural net isn't incredibly complicated here either. It has 6 conv layers, but instead of doubling the channels every layer (with 120x160 images here, that would be Very Bad), I used another method I found: keep the number of channels the same for one layer, then double it the next. I always used a 5x5 filter and, in the same spirit, applied a 2x2 max pool after every other layer. I used Adam for optimization with the usual learning rate of 1e-3.
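A sketch of that six-layer architecture, under stated assumptions: the exact channel counts (16, 16, 32, 32, 64, 64) and the number of keypoints are my guesses; the 5x5 kernels, double-every-other-layer pattern, pool-every-other-layer pattern, 120x160 input, and Adam at 1e-3 come from the writeup.

```python
import torch
import torch.nn as nn

class MediumNet(nn.Module):
    """Six 5x5 conv layers; the channel count doubles every other layer
    and a 2x2 max pool follows every second conv. Input is 1x120x160."""
    def __init__(self, num_points=58):  # number of keypoints is an assumption
        super().__init__()
        chans = [1, 16, 16, 32, 32, 64, 64]
        layers = []
        for i in range(6):
            layers += [nn.Conv2d(chans[i], chans[i + 1], kernel_size=5), nn.ReLU()]
            if i % 2 == 1:  # 2x2 max pool after every other conv layer
                layers.append(nn.MaxPool2d(2))
        self.features = nn.Sequential(*layers)
        # A 120x160 input shrinks to 8x13 after the six conv + three pool stages
        self.head = nn.Linear(64 * 8 * 13, 2 * num_points)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = MediumNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```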

medium net loss over time

That graph looks good, but numbers can be misleading. What about actual images?

res 5 res 6 res 7 res 8

I'm so happy with these results. As you can see, side views look SO much better with this model, though they still aren't amazing. Of particular note are the keypoints around the bottom of the face: they come out somewhat curved and don't fit well. Strong rotation angles also really challenge this neural net.

learned filters

Finally, here is what the conv filters for the first three conv layers look like at every 10 epochs of training.

Part 3: Train With Larger Dataset

Okay, so one thing I did to speed up this process was to immediately convert every image to grayscale and crop it on load. I found this made training significantly faster!

For my model, I used ResNet-18 trained for 10 epochs. The only changes were that the first conv layer was made to take 1 input channel and the final fc layer was given an output size of 136, for 68 * 2 keypoint coordinates. I used Adam with the standard learning rate of 1e-3 and MSELoss. I wanted to change the loss, but since we are comparing points, MSELoss actually makes a lot of sense here.

large net loss over time

Honestly, training was kind of fast compared to what others posted on Piazza. The bad news was some overfitting: the training dots looked perfect, so I tried training for fewer epochs (10 instead of the 15 or 20 from before), and while that helped a bit, it didn't help much. More importantly, here's what the test results look like.

img 9 img 10 img 11

So the results on the test set are alright, but they're not great, that's for sure. Of particular interest is the lack of shape around the face; somehow I ended up with a lot of heart-shaped faces as outputs!

I was able to get an MSE score of 18.9 on Kaggle, a result I'm overall fine with given the circumstances. Honestly, from looking at the test results, I was scared it would be worse. Learning PyTorch for this project was a painstaking but worthwhile experience!