Ground Truth Images + Labels From Data Loader
Above are three images from the data loader I created, with the labels plotted on top of them. They look somewhat blurry because all images were resized to 80 by 60.
Training and Validation Loss
My neural network consisted of 3 convolutional layers, and I tuned the learning rate and the batch size. The optimal values I found were a learning rate of 0.001 and a batch size of 24.
At the end of training my model had a training loss of 0.0013 and a validation loss of 0.0018.
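A minimal sketch of the 3-conv-layer network described above. The channel counts, kernel sizes, and the single (x, y) keypoint output are assumptions; only the 80-by-60 input size and the three convolutional layers come from the text.

```python
import torch
import torch.nn as nn

class KeypointNet(nn.Module):
    """Sketch of a 3-conv-layer keypoint regressor (sizes are assumptions)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 80x60 -> 40x30
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 40x30 -> 20x15
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 20x15 -> 10x7
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 10 * 7, 128), nn.ReLU(),
            nn.Linear(128, 2),  # one (x, y) keypoint
        )

    def forward(self, x):
        return self.fc(self.features(x))

net = KeypointNet()
out = net(torch.zeros(24, 1, 60, 80))  # batch of 24 one-channel 80x60 images
print(out.shape)  # torch.Size([24, 2])
```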
Tuning Learning Rate
When the learning rate was 0.01, the final training loss was 0.0037 and the validation loss was 0.0051. When the learning rate was 0.001, the final training loss was 0.0013 and the validation loss was 0.0025. When the learning rate was 0.0005, the final training loss was 0.0040 and the validation loss was 0.0032.
Tuning The Batch Size
When the batch size was 12, the final training loss was 0.0022 and the validation loss was 0.0038. When the batch size was 24, the final training loss was 0.0013 and the validation loss was 0.0018. When the batch size was 48, the final training loss was 0.0013 and the validation loss was 0.0025.
Results
Here are some results that look pretty good.
Here are a few bad ones.
The person in the first image turned his face a lot, and I think that caused the model to predict incorrectly. In the second, I think the creases caused by the smile confused the model.
Ground Truth Images + Labels From Data Loader
Above are three images from the data loader I created, with the labels plotted on top of them. There is data augmentation, so the images can be tilted or shifted.
Model Architecture
My model has 5 convolutional layers, each followed by a max pool layer. The first layer takes 1 channel and outputs 32. The second takes in 32 and outputs 64. The third takes in 64 and outputs 128. To reduce the time the model takes to run, the last two convolutional layers take in 128 and output 128. The model also has 2 fully connected layers. The first one is of size (1920, 1920) and is followed by a ReLU. The final fully connected layer has size (1920, 116) to get the desired output of all the label coordinates. I trained my model with a learning rate of 0.001, a batch size of 48, and for 20 epochs.
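The architecture above could be sketched as follows. The 3x3 kernels with padding 1 and the 160-by-120 input size are assumptions, chosen so that the flattened feature size comes out to the 1920 stated in the text; the channel progression and the two fully connected layers come from the description.

```python
import torch
import torch.nn as nn

class FaceNet(nn.Module):
    """Sketch of the 5-conv-layer model (kernel sizes and input size assumed)."""
    def __init__(self):
        super().__init__()
        def block(c_in, c_out):
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, 3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(2),
            )
        self.features = nn.Sequential(
            block(1, 32),     # 160x120 -> 80x60
            block(32, 64),    # 80x60  -> 40x30
            block(64, 128),   # 40x30  -> 20x15
            block(128, 128),  # 20x15  -> 10x7
            block(128, 128),  # 10x7   -> 5x3
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 5 * 3, 1920),  # (1920, 1920), as stated
            nn.ReLU(),
            nn.Linear(1920, 116),          # 58 (x, y) keypoint coordinates
        )

    def forward(self, x):
        return self.fc(self.features(x))

net = FaceNet()
out = net(torch.zeros(2, 1, 120, 160))
print(out.shape)  # torch.Size([2, 116])
```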
Training And Validation Loss
The training loss was 0.0035 and the validation loss was 0.0046 in the final iteration of training.
Results
Here are some results that look pretty good.
Here are a few bad ones.
Even with data augmentation added to the dataset, there seems to be very little variation in the predictions; they stay very close to the average face.
Visualizing the Filters
Ground Truth Images + Labels From Data Loader
Model Architecture
The model I used is resnet18, but I had to change the first layer to take 1 channel since my images are grayscale, and I also changed the last fully connected layer to output 136 values, the (x, y) coordinates of all 68 keypoints.
Training And Validation Loss
Results
Here are a few results on my validation set
Results On Other Images I Found
It seems to have a harder time detecting eyes in these images
Kaggle Competition
My username on Kaggle is EitherCod and the MAE I ended with is 186.28956. There is definitely a bug in my code. I tried training on a single batch consisting of two images. When I ran net(two_image_batch)[0] it gave different results than net(two_image_batch[0]). The images in the batch seem to be interacting with each other for some reason.
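One way to check for the cross-sample interaction described above: in eval mode, the prediction for one image should not depend on what else is in the batch, so comparing the same image run in a batch of two versus a batch of one (keeping the batch dimension, unlike net(two_image_batch[0])) isolates the bug. Here net stands for the trained model and batch for the two-image batch mentioned above; common causes of this symptom are a BatchNorm layer left in train mode or a reshape that mixes features across batch elements.

```python
import torch

def batch_independence_gap(net, batch):
    """Max absolute difference between running image 0 in a batch of 2 vs alone."""
    net.eval()  # freeze batch norm / dropout so outputs are deterministic
    with torch.no_grad():
        full = net(batch)[0]         # first image, run inside the full batch
        single = net(batch[0:1])[0]  # same image, run as a batch of one
    return (full - single).abs().max().item()  # should be ~0 for a correct model
```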
Obama Image From: https://www.thedailybeast.com/obamas-nightmare-reelected-in-2012-but-republicans-take-the-senate