PROJECT 4:
Facial Keypoint Detection
Part 1. Nose Tip Detection
For this part, I used images and keypoints from the IMM Face Database, resizing inputs to
the recommended 80x60 after converting them to grayscale images in the -0.5 to 0.5 range.
These images were loaded into two separate custom PyTorch Datasets for male and female
faces, respectively. A random sampling of images from each dataset can be seen below, with
their associated training labels (nose tip keypoints, in this case) shown in light blue.
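The grayscale conversion, [-0.5, 0.5] normalization, and 80x60 resize described above can be sketched as a small preprocessing function. The nearest-neighbor resize here is an illustrative stand-in (torchvision's `Resize` would work equally well); it is not necessarily the exact implementation used.

```python
import numpy as np
import torch

def preprocess(img, size=(60, 80)):
    """Convert an HxWx3 uint8 RGB image to a 1xHxW grayscale float
    tensor in the [-0.5, 0.5] range, resized to `size` = (H, W)."""
    gray = img.astype(np.float32).mean(axis=2) / 255.0 - 0.5
    # Simple nearest-neighbor resize (illustrative; a torchvision
    # transform could be substituted here).
    h, w = gray.shape
    rows = (np.arange(size[0]) * h / size[0]).astype(int)
    cols = (np.arange(size[1]) * w / size[1]).astype(int)
    resized = gray[rows][:, cols]
    return torch.from_numpy(resized).unsqueeze(0)
```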
Randomly selected female faces & associated nose tip landmarks:

Randomly selected male faces & associated nose tip landmarks:
The male faces dataset was loaded into a PyTorch Dataloader for training, using a batch
size of 1 alongside an 80/20 train-test split. After initial trials with LeNet, fine-tuning
ultimately yielded the neural network architecture defined below, with a max-pooling
layer and ReLU activation function following each convolutional layer.
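A sketch of a network in this spirit is below: several convolutional layers, each followed by ReLU and 2x2 max-pooling, feeding fully-connected layers that regress the (x, y) nose tip position. The specific channel counts, kernel sizes, and hidden width are illustrative assumptions, not necessarily the exact architecture used.

```python
import torch
import torch.nn as nn

class NoseNet(nn.Module):
    """Small CNN for nose tip regression on 1x60x80 inputs.
    Channel counts and kernel sizes here are illustrative."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 12, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 60x80 -> 30x40
            nn.Conv2d(12, 24, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 30x40 -> 15x20
            nn.Conv2d(24, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 15x20 -> 7x10
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 7 * 10, 128), nn.ReLU(),
            nn.Linear(128, 2),  # (x, y) of the nose tip
        )

    def forward(self, x):
        return self.head(self.features(x))
```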
25 epochs of training were conducted, with Adam optimization and a learning rate of 1e-3.
Training and validation MSE losses recorded across these epochs are shown below.
The model was validated on a subset of male faces during each epoch, with strong
performance across the board. Success and failure cases among these are shown below.
Successful predictions in the male face dataset:
Failed predictions in the male face dataset:
As an informal test of generalization, I also evaluated my final trained model on unseen
data in the form of female faces. Considering that it had never seen female faces during
training, the model managed reasonable performance. Success and failure cases
among these examples are shown below.
Reasonably successful predictions in the unseen female face dataset:
Failed predictions in the female face dataset:
It’s difficult to say exactly why the model struggles with certain faces more than others.
The most likely explanation would be some combination of unfamiliar facial structure and
rotated position, both of which can throw off the model in inference if not handled during
training. To address this moving forward, I incorporated randomized image
transformations to help diversify future training inputs.
Part 2. Full Facial Keypoints Detection
I once again used inputs from the IMM Face Database here, resizing images to the
recommended 160x120 after converting them to grayscale within the -0.5 to 0.5 range.
These images were then augmented randomly, with a combination of rotations within a 30
degree range as well as brightness and saturation adjustments applied to diversify inputs.
Separate PyTorch Datasets were again created for male and female faces, respectively.
Some images selected from each dataset can be seen below along with the full set of
associated training labels (facial keypoints, in this case) shown in light blue.
Randomly selected female faces & associated facial landmarks:
Randomly selected male faces & associated facial landmarks:
A PyTorch Dataloader with all male faces and associated labels was again used for training,
with a batch size of 1 and an 80/20 train-test split. I built upon the architecture from the
nose tip detection task to maximize performance, yielding the neural network architecture
defined below. As before, each convolutional layer is followed by max-pooling and a ReLU activation.
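A sketch of such a network is below, with five convolutional layers (matching the five filter visualizations later in this part) regressing all 58 IMM landmarks on 160x120 inputs. Channel counts and the hidden width are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FaceNet(nn.Module):
    """Five conv layers, each followed by ReLU and 2x2 max-pooling,
    regressing n_keypoints (x, y) pairs on 1x120x160 inputs.
    Channel counts here are illustrative."""
    def __init__(self, n_keypoints=58):
        super().__init__()
        chans = [1, 16, 32, 32, 64, 64]
        layers = []
        for c_in, c_out in zip(chans, chans[1:]):
            layers += [nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)]
        self.features = nn.Sequential(*layers)  # 120x160 -> 3x5 after 5 pools
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 3 * 5, 256), nn.ReLU(),
            nn.Linear(256, n_keypoints * 2),
        )

    def forward(self, x):
        return self.head(self.features(x))
```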
25 epochs of training were conducted, with Adam optimization and a learning rate of 1e-3.
Training and validation MSE losses recorded across these epochs are shown below.
Performance is reasonably strong across the male dataset once again, particularly
considering the complexity of the task. Remaining errors can likely be attributed to
unfamiliar facial features, which could be addressed with a deeper network capable of
learning additional complexity. More training epochs would also help ensure the model fully converges.
Successful predictions in the male face dataset:
Failed predictions in the male face dataset:
The learned filters for each convolutional layer can be visualized sequentially below.
Learned filters for the first convolutional layer:
Learned filters for the second convolutional layer:
Learned filters for the third convolutional layer:
Learned filters for the fourth convolutional layer:
Learned filters for the fifth convolutional layer:
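Visualizations like these can be produced by plotting each layer's learned kernels as a grid. The helper below is a generic sketch (averaging over input channels so each filter reduces to a single 2D image), not the exact plotting code used.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; safe when saving to file
import matplotlib.pyplot as plt
import torch.nn as nn

def plot_filters(conv, cols=8):
    """Show the kernels of one Conv2d layer in a grid, averaged over
    input channels so each filter is a single grayscale image."""
    w = conv.weight.detach().mean(dim=1).numpy()  # out_channels x kH x kW
    n = w.shape[0]
    rows = (n + cols - 1) // cols
    fig, axes = plt.subplots(rows, cols, figsize=(cols, rows))
    for i, ax in enumerate(axes.flat):
        ax.axis("off")
        if i < n:
            ax.imshow(w[i], cmap="gray")
    return fig
```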
Part 3. Train With Larger Dataset
For this part, I used inputs from ibug’s Faces In The Wild dataset, resizing cropped faces to
the recommended 224x224 after converting them to grayscale images in the -0.5 to 0.5
range. Data augmentation was also applied, with the same combination of random
rotations within a range and brightness & saturation adjustments used to diversify inputs.
These images were loaded into a single custom PyTorch Dataset for training. Select images
can be seen below, with their associated labels (full facial keypoints) shown in light blue.
Randomly selected faces & associated facial landmarks:
This augmented dataset was loaded into a PyTorch Dataloader, with a batch size of 1 but
no train-test split since a separate test set was already provided. A pre-trained ResNet18
model was fine-tuned for this task, with default values adjusted as shown below to both
accommodate the input data and optimize performance.
25 epochs of training were conducted, with Adam optimization and a learning rate of 1e-4.
Mean squared error loss as recorded across these training epochs is shown below.
Per the Kaggle results below, my model's predictions achieved an MSE loss of 7.48595
across the test set. That's not quite good enough for extra credit, but pretty strong nonetheless!
These strong performance metrics are backed up by the results for randomly selected
inputs from the test set shown below, all of which yielded fairly accurate predictions.
Predicting keypoints for portraits I used in the previous project yielded further strong
results, with generally accurate predictions across all three images. The model struggled
slightly with Rihanna, failing to outline her cheek area as accurately as in other inputs.
Similar issues were encountered with Stephen Curry, though to a lesser extent.