In this project, I used neural networks to automatically detect facial keypoints: first the nose tip alone, then the full set of face keypoints, and finally all keypoints using a model trained on a larger dataset.
In this part, the goal is to detect the nose tip. I trained on 32 people with 6 images each (192 images) and validated on 8 other people with 6 images each (48 images).
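Splitting by person rather than by image keeps any one face out of both sets, so the validation loss reflects how well the model generalizes to unseen people. Below is a minimal sketch that assumes people are numbered 1 to 40, views are numbered 1 to 6, and the first 32 people form the training set. `split_by_person` is a hypothetical helper and is not this project's code.

```python
def split_by_person(num_people=40, views_per_person=6, num_train_people=32):
    """Return (train, val) lists of (person, view) pairs with no person shared.

    Numbering people and views from 1 is an assumption for illustration.
    """
    people = range(1, num_people + 1)
    train_people = set(people[:num_train_people])
    train, val = [], []
    for person in people:
        for view in range(1, views_per_person + 1):
            # Every view of a person goes to the same set.
            target = train if person in train_people else val
            target.append((person, view))
    return train, val


train, val = split_by_person()
print(len(train), len(val))  # 192 48
```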
| Learning rate | Conv layers | Epochs | Train loss | Val loss |
| -- | -- | -- | -- | -- |
| 1e-3 | 3 | 25 | 0.005753048229962587 | 0.014989902265369892 |
| 1e-3 | 4 | 25 | 0.005749324802309275 | 0.01004868745803833 |
| 1e-4 | 4 | 50 | 0.005196912679821253 | 0.005820723250508308 |
The best model, judged by validation loss, used 4 convolutional layers, a learning rate of 1e-4, and 50 epochs.
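Losses of the size reported above are consistent with keypoints regressed in normalized coordinates, although the write-up does not state the loss function or the scaling. This minimal sketch computes a mean squared error over (x, y) keypoints and assumes coordinates are divided by the image width and height so they fall in [0, 1]. `keypoint_mse` is hypothetical.

```python
def keypoint_mse(pred, target):
    """Mean squared error averaged over every x and y coordinate.

    pred and target are equal-length lists of (x, y) pairs, assumed to be
    normalized to [0, 1] by image width and height.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and the same length")
    total = sum((px - tx) ** 2 + (py - ty) ** 2
                for (px, py), (tx, ty) in zip(pred, target))
    return total / (2 * len(pred))


# One nose-tip prediction that is off by 0.05 in both x and y.
print(keypoint_mse([(0.50, 0.50)], [(0.55, 0.45)]))  # approximately 0.0025
```

Under these assumptions, the square root of the loss is the RMS error per coordinate. A validation loss of about 0.0058 would therefore correspond to an error of roughly 0.076 of the image size.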
Here are two good and two bad predictions from this model:
Successes:
Failures:
These failures are likely caused by head rotation. In the second image, the shadows under the eyes may also resemble the shadow beneath the nose.
Next, I predicted all 58 facial keypoints and used a larger input image size of 240x180.
For the dataloader, I applied random brightness and contrast offsets, along with a random rotation between -10 and 10 degrees.
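A rotation augmentation must move the keypoints together with the image, or the labels stop matching the face. The sketch below covers the keypoint half and the intensity jitter. It assumes pixel coordinates, rotation about (width / 2, height / 2), grayscale pixels in [0, 1] stored as a flat list, and brightness/contrast ranges chosen for illustration. Rotating the pixel grid itself is left to an image library, and its sign convention should be checked against `rotate_keypoints`. All function names are hypothetical.

```python
import math
import random


def rotate_keypoints(points, angle_deg, width, height):
    """Rotate (x, y) pixel keypoints by angle_deg about the image center.

    In image coordinates (y increases downward), this standard rotation matrix
    turns points clockwise on screen for positive angles.
    """
    cx, cy = width / 2, height / 2
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    rotated = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        rotated.append((cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t))
    return rotated


def jitter_brightness_contrast(pixels, rng, max_brightness=0.1, max_contrast=0.2):
    """Scale contrast around mid-gray, add a brightness offset, clip to [0, 1]."""
    brightness = rng.uniform(-max_brightness, max_brightness)
    contrast = rng.uniform(1 - max_contrast, 1 + max_contrast)
    return [min(1.0, max(0.0, contrast * (p - 0.5) + 0.5 + brightness)) for p in pixels]


def augment(pixels, points, width, height, rng):
    """Return jittered pixels, rotated keypoints, and the angle used.

    The caller must rotate the image by the same angle.
    """
    angle = rng.uniform(-10, 10)
    return (jitter_brightness_contrast(pixels, rng),
            rotate_keypoints(points, angle, width, height),
            angle)


print(rotate_keypoints([(130, 90)], 90, 240, 180))  # approximately [(120.0, 100.0)]
```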
I used 5 convolution layers and 2 fully connected layers, and a learning rate of 1e-3.
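The size of the first fully connected layer depends on how much the convolutional stack shrinks the 240x180 input. This sketch does that bookkeeping under hypothetical choices that the write-up does not state: each of the 5 blocks is a 3x3 convolution with padding 1 followed by 2x2 max pooling, the image has 180 rows and 240 columns, and the last convolution has 128 channels. The final layer needs 58 × 2 outputs, one (x, y) pair per keypoint.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Output length of a max pool along one spatial dimension."""
    return (size - kernel) // stride + 1


def flattened_size(height, width, num_blocks, final_channels):
    """Features entering the first fully connected layer after num_blocks
    of (3x3 conv with padding 1, then 2x2 max pool)."""
    for _ in range(num_blocks):
        height = pool_out(conv_out(height))
        width = pool_out(conv_out(width))
    return final_channels * height * width


NUM_KEYPOINTS = 58
print(flattened_size(180, 240, num_blocks=5, final_channels=128))  # 4480
print(NUM_KEYPOINTS * 2)  # 116
```

Under these assumptions, the spatial size shrinks from 180x240 to 5x7. Flooring in the pooling step is why odd sizes such as 45 and 15 lose a row or column.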
| Learning rate | Conv layers | Epochs | Train loss | Val loss |
| -- | -- | -- | -- | -- |
| 1e-3 | 5 | 25 | 0.003081735922023654 | 0.004428303800523281 |
Here are two good and two bad predictions from this model, with my predictions in orange and the ground truth in blue.
Successes:
Failures:
These failures are also likely due to head rotation. The model appears to regress toward the mean face shape instead of following the features in the photo.
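One way to check the mean-face explanation is to compare each prediction with the average training shape. This sketch computes that average and a mean keypoint distance. It assumes every shape is a list of (x, y) pairs with the same keypoint order. The helper names and the toy data are hypothetical.

```python
import math


def mean_shape(shapes):
    """Average each keypoint over a list of shapes (lists of (x, y) pairs)."""
    n = len(shapes)
    return [(sum(s[i][0] for s in shapes) / n, sum(s[i][1] for s in shapes) / n)
            for i in range(len(shapes[0]))]


def mean_point_distance(a, b):
    """Average Euclidean distance between corresponding keypoints."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


train_shapes = [[(0, 0), (2, 2)], [(2, 0), (4, 2)]]  # toy data, not from the project
mean = mean_shape(train_shapes)
print(mean)  # [(1.0, 0.0), (3.0, 2.0)]
```

Suppose a failed prediction's `mean_point_distance` to the mean shape is much smaller than its distance to the ground truth. That prediction is hugging the mean face, which matches the behavior described above.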
Learned convolutional filters:
Layer 1:
Layer 2:
Layer 3:
Layer 4:
Layer 5:
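Filter weights can be negative and have arbitrary scale, so they are usually rescaled before being shown as images. This minimal sketch applies per-filter min-max normalization and assumes each filter is a 2D list of floats, such as one input-channel slice of a convolution kernel. `normalize_filter` is hypothetical and is not necessarily how the filters above were rendered.

```python
def normalize_filter(weights):
    """Min-max scale a 2D filter (list of rows) to [0, 1] for display as an image."""
    flat = [w for row in weights for w in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant filter: show as uniform mid-gray
        return [[0.5 for _ in row] for row in weights]
    return [[(w - lo) / (hi - lo) for w in row] for row in weights]


print(normalize_filter([[-1.0, 0.0], [1.0, 3.0]]))  # [[0.0, 0.25], [0.5, 1.0]]
```

Normalizing each filter separately makes its pattern visible. However, it hides differences in magnitude between filters.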
Dataloader: same as in part 2, with random brightness and contrast offsets and a random rotation.
Kaggle Submission: Username JerryZ, Public Score 24.60084
Architecture: I used ResNet18, changing the first convolution to take 1 input channel and the final layer to output 68 × 2 = 136 values, an (x, y) pair for each of the 68 keypoints.
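In torchvision, this kind of change would typically replace the model's `conv1` and `fc` modules. After the change, the 136 outputs still have to be read back as 68 keypoints. This sketch does that conversion. It assumes the outputs are interleaved as x0, y0, x1, y1, ... and normalized to [0, 1]. The ordering, the normalization, the 224x224 size in the example, and `to_pixel_keypoints` are all assumptions.

```python
def to_pixel_keypoints(flat, width, height):
    """Turn a flat [x0, y0, x1, y1, ...] output into (x, y) pixel coordinates.

    Assumes interleaved x/y order and coordinates normalized to [0, 1].
    """
    if len(flat) % 2:
        raise ValueError("expected an even number of outputs")
    return [(flat[i] * width, flat[i + 1] * height) for i in range(0, len(flat), 2)]


outputs = [0.5] * 136  # stand-in for one ResNet18 output vector
points = to_pixel_keypoints(outputs, width=224, height=224)
print(len(points), points[0])  # 68 (112.0, 112.0)
```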
Train Loss: 0.0009061
Val Loss: 0.02571
LR: 1e-3, 10 epochs
Results:
Own photos:
The model performs reasonably well on all of my photos. It does worst on the photo of Bezos, likely because his narrow head shape is not well represented in the training dataset.