I created a dataloader for the face images and their nose-tip keypoints, then displayed a few samples with the ground-truth point plotted on each image.
I then trained my model and plotted the train and validation MSE loss during the training process.
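As a rough illustration of how the train and validation MSE curves can be collected, here is a minimal sketch of a training loop. The helper names (run_epoch, fit), the Adam optimizer, and the epoch count are assumptions made for the example and are not necessarily what I used; only the MSE loss and the 0.001 learning rate come from the write-up.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def run_epoch(model, loader, criterion, optimizer=None):
    """Return the mean per-batch loss; weights are updated only if an optimizer is given."""
    training = optimizer is not None
    model.train(training)  # train mode when optimizing, eval mode otherwise
    total = 0.0
    with torch.set_grad_enabled(training):
        for images, targets in loader:
            loss = criterion(model(images), targets)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total += loss.item()
    return total / len(loader)


def fit(model, train_loader, val_loader, epochs=10, lr=1e-3):
    """Train with MSE loss and record one train and one validation loss per epoch."""
    criterion = nn.MSELoss()
    # Assumption: Adam is used here only as a common default choice.
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    history = {"train": [], "val": []}
    for _ in range(epochs):
        history["train"].append(run_epoch(model, train_loader, criterion, optimizer))
        history["val"].append(run_epoch(model, val_loader, criterion))
    return history
```

The two lists in the returned history can then be plotted against the epoch index to get a loss graph like the one shown here.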
I tried some hyperparameter tuning: I reduced the model's channel size from 14 to 12 and lowered the learning rate from 0.001 to 0.0001. The tuned model had almost twice the average training loss of the original model, but about the same average validation loss. Here is the graph of the training and validation losses:
Here are 2 images where the detection works well, followed by 2 where it fails (red points are ground truth and blue points are predictions). I believe the first failure occurs because the face is far off-center, even though the woman is facing the camera head-on. The second fails because the man's face is turned sharply to the side, so the nose tip is nowhere near the center of the image, which makes it much harder to detect.
I made the dataloader for the images and all of their facial keypoints, and then displayed a few samples with the ground-truth keypoints plotted on them. I also followed the provided data augmentation tutorial and ended up using transforms.ColorJitter to modify the brightness, contrast, and saturation.
Here is my model's detailed architecture:
Here are 2 predictions that work well, followed by 2 that don't. The red points are the ground truth and the blue points are the model's predictions. In the first failure, the face is tilted to the left and takes up proportionally more of the frame than in the other photos. In the second, the head is shifted much further down the frame than in the rest of the dataset.
Here is my visualization of the learned filters:
The model is resnet18() with two layers changed: the first convolutional layer takes 1 input channel and produces 64 output channels, and the final fully connected layer has 512 in-features and 136 out-features. This is my graph of losses:
Here is the model run on some of my own images: