Here are 3 samples of the images loaded by the DataLoader, marked with their nose tip (in pink).
Train and Validation Loss
I decided to tune two hyperparameters: the learning rate and the number of convolutional layers (3 vs. 4). There is definitely an element of randomness in the losses from run to run, but in general, since a 3-layer net produces a less complex model than a 4-layer net, it made sense that the 3-layer losses showed a bit of underfitting and plateaued very quickly. Among the three learning rates of 1e-2, 1e-3, and 1e-4, the fastest learning rate converged first. The middle rate showed the most expected behavior, with both losses decreasing gradually over time; the axis scale shows that this run also reached a very small loss. Finally, the slowest learning rate converged last, taking around 10 epochs.
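To give a concrete picture of how a sweep like this can be organized, here is a minimal sketch; the channel counts, the 60x80 grayscale input size, the Adam optimizer, and the dummy tensors standing in for the real DataLoader are all illustrative assumptions rather than my actual code.

```python
import torch
import torch.nn as nn

def make_nose_net(num_conv_layers, in_size=(60, 80)):
    """Small CNN regressing a single (x, y) nose-tip location."""
    layers, in_ch = [], 1
    for i in range(num_conv_layers):
        out_ch = 16 * (i + 1)                        # assumed channel progression
        layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                   nn.ReLU(),
                   nn.MaxPool2d(2)]
        in_ch = out_ch
    conv = nn.Sequential(*layers)
    with torch.no_grad():                            # dry run to find the flattened size
        flat = conv(torch.zeros(1, 1, *in_size)).flatten(1).shape[1]
    return nn.Sequential(conv, nn.Flatten(),
                         nn.Linear(flat, 128), nn.ReLU(),
                         nn.Linear(128, 2))

# Sweep over depth and learning rate; dummy tensors stand in for the real DataLoader.
imgs = torch.randn(8, 1, 60, 80)                     # batch of grayscale images
pts = torch.rand(8, 2)                               # normalized nose-tip targets
criterion = nn.MSELoss()
for num_layers in (3, 4):
    for lr in (1e-2, 1e-3, 1e-4):
        model = make_nose_net(num_layers)
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        for step in range(5):                        # a few steps just to show the loop shape
            loss = criterion(model(imgs), pts)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        print(f"{num_layers} conv layers, lr={lr}: loss {loss.item():.4f}")
```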
Here are some of the final predicted keypoints. Red is the ground truth, and blue is my prediction. The predictions definitely seemed to struggle with faces that were not facing forward; my guess for why is that most of the faces in the training dataset were probably facing forward.
Here are 3 samples of the images loaded by the DataLoader, marked with their facial keypoints (in blue).
For this part, I created a neural net with 5 convolutional layers, each followed by a ReLU; all but the third were also followed by a max pool of size 2. I then flattened the output and followed up with 2 fully connected layers (the first of which was followed by a ReLU). I used a batch size of 8, trained for 20 epochs, and ended up with this graph of the training and validation losses.
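A minimal sketch of that architecture is below; the channel progression, the 120x160 grayscale input resolution, the hidden FC width, and the 58-keypoint output count are placeholder assumptions rather than the exact values used.

```python
import torch
import torch.nn as nn

NUM_KEYPOINTS = 58        # assumed keypoint count for this part
IN_SIZE = (120, 160)      # assumed grayscale input resolution (H, W)

def make_keypoint_net():
    channels = [1, 16, 32, 64, 128, 256]              # assumed channel progression
    layers = []
    for i in range(5):
        layers += [nn.Conv2d(channels[i], channels[i + 1], kernel_size=3, padding=1),
                   nn.ReLU()]
        if i != 2:                                    # all but the third conv get a 2x2 max pool
            layers.append(nn.MaxPool2d(2))
    conv = nn.Sequential(*layers)
    with torch.no_grad():                             # dry run to find the flattened size
        flat = conv(torch.zeros(1, 1, *IN_SIZE)).flatten(1).shape[1]
    return nn.Sequential(conv, nn.Flatten(),
                         nn.Linear(flat, 256), nn.ReLU(),      # first FC followed by ReLU
                         nn.Linear(256, NUM_KEYPOINTS * 2))    # second FC outputs (x, y) pairs

model = make_keypoint_net()
print(model(torch.randn(8, 1, *IN_SIZE)).shape)       # batch size of 8, as in training
```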
Train and Validation Loss
Here are some of the final predicted keypoints. Red is the ground truth, and blue is my prediction. As with the first part, the model seemed to struggle with faces that were not facing forward. I would still guess that the training data had a larger proportion of forward-facing faces than not.
Learned convolutional filters
Layer 1 Conv Weight Filters after Training
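For reference, a visualization like the one above can be produced with a short snippet along these lines; it assumes the trained model from the sketch earlier in this part, with a single-channel first conv layer.

```python
import matplotlib.pyplot as plt
import torch.nn as nn

# Pull the first conv layer's weights: shape (out_channels, in_channels=1, kH, kW)
first_conv = next(m for m in model.modules() if isinstance(m, nn.Conv2d))
filters = first_conv.weight.detach().cpu()

fig, axes = plt.subplots(2, 8, figsize=(12, 3))
for ax, f in zip(axes.flat, filters):
    ax.imshow(f[0], cmap="gray")      # each filter is a small 2-D kernel (grayscale input)
    ax.axis("off")
fig.suptitle("Layer 1 conv filters after training")
plt.show()
```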
I used the pretrained PyTorch model ResNet18 to make my predictions. I changed the first layer to take 1 input channel, and I changed the final layer to output 68*2 = 136 values. Since there were a lot of images, I used a larger batch size of 128 and a learning rate of 0.005 over 15 epochs.
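A minimal sketch of those modifications is below; the Adam optimizer is an assumption for illustration, while the layer changes follow what is described above.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Start from an ImageNet-pretrained ResNet18
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Swap the first conv to accept 1-channel grayscale input instead of 3-channel RGB
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)

# Replace the final fully connected layer: 68 keypoints * 2 coordinates = 136 outputs
model.fc = nn.Linear(model.fc.in_features, 68 * 2)

# Training setup as described: batch size 128, learning rate 0.005, 15 epochs
# (the optimizer choice itself is an assumption)
optimizer = torch.optim.Adam(model.parameters(), lr=0.005)
```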
Train and Validation Loss
Here are some of the final results on the validation set. The ground truth is plotted in blue, while my predictions are plotted in orange.
Here are some of the predictions on the test set. I think overall, the points look how I would expect them to.
To be honest, I think the model did not predict any of the faces that well, but it definitely did better on the second image than on the first and the third. My guess is that this may be because the second face has both stronger features and more color contrast, so the facial features are easier to detect.