Jimmy Xu
The goal of the project is to train a neural network to detect facial keypoints (also called face landmarks) in two datasets.
In this part, I trained a neural network to detect the nose tip in the IMM Face Database.
The dataset has 240 images of 40 persons. Each image has 58 facial keypoints. The project specs require me to use the first 32 persons as the training set and the last 8 as the validation set. Here are some sample pictures with the nose tip annotation visualized:
For preprocessing, I converted every image to grayscale, scaled it to (80x60), and normalized the pixel values to the range [-0.5, 0.5]. I also applied random rotation and random color jittering. Here are some examples of the preprocessing applied to the same image, with the nose tip annotated.
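The deterministic part of the preprocessing (grayscale, resize, normalize, without the random augmentations) boils down to something like the following sketch; `preprocess` is a hypothetical helper name, not the exact code I used:

```python
import numpy as np
from PIL import Image

def preprocess(path):
    # Convert to grayscale and resize to 80x60 (PIL takes width x height)
    img = Image.open(path).convert("L").resize((80, 60))
    # Scale pixel values from [0, 255] down to [-0.5, 0.5]
    return np.asarray(img, dtype=np.float32) / 255.0 - 0.5
```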
The following is the architecture of my neural network.
NoseTipNet(
(features): Sequential(
(0): Conv2d(1, 12, kernel_size=(5, 5), stride=(1, 1))
(1): ReLU(inplace=True)
(2): Conv2d(12, 24, kernel_size=(3, 3), stride=(1, 1))
(3): ReLU(inplace=True)
(4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(5): Conv2d(24, 32, kernel_size=(3, 3), stride=(1, 1))
(6): ReLU(inplace=True)
(7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(classifier): Sequential(
(0): Linear(in_features=6528, out_features=200, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.5, inplace=False)
(3): Linear(in_features=200, out_features=2, bias=True)
)
)
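For reference, the printed architecture corresponds to a PyTorch module along these lines (a reconstruction from the printout, assuming 1x60x80 inputs; the per-layer shape comments follow from the kernel sizes):

```python
import torch
import torch.nn as nn

class NoseTipNet(nn.Module):
    """Reconstruction of the printed architecture for 1x60x80 inputs."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 12, kernel_size=5),   # 60x80 -> 56x76
            nn.ReLU(inplace=True),
            nn.Conv2d(12, 24, kernel_size=3),  # -> 54x74
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),                # -> 27x37
            nn.Conv2d(24, 32, kernel_size=3),  # -> 25x35
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),                # -> 12x17
        )
        self.classifier = nn.Sequential(
            nn.Linear(32 * 12 * 17, 200),      # 6528 flattened features
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(200, 2),                 # (x, y) of the nose tip
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)
```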
I used mean squared error (MSE) as the loss function and Adam as the optimizer with a learning rate of 0.0001. I trained for 15 epochs with a training batch size of 32.
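The training setup can be sketched as follows; `train` and `train_loader` are hypothetical names, and the loader is assumed to yield (image, keypoint) batches with coordinates as regression targets:

```python
import torch
import torch.nn as nn

def train(model, train_loader, epochs=15, lr=1e-4):
    """Train with MSE loss on keypoint coordinates and the Adam optimizer."""
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    avg = 0.0
    for epoch in range(epochs):
        model.train()
        total = 0.0
        for images, keypoints in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), keypoints)
            loss.backward()
            optimizer.step()
            total += loss.item() * images.size(0)
        avg = total / len(train_loader.dataset)
        print(f"epoch {epoch}: train loss {avg:.4f}")
    return avg
```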
With the above configurations, I was able to achieve a loss of 0.0056 on the validation set. Here's the training loss and validation loss across 15 epochs.
I tried increasing the capacity of the network by adding layers and by varying the filter size; neither change significantly affected the loss. Here are the results:
By adding two extra layers ((4) and (5) below), I was able to achieve a validation loss of 0.0057, which is very close to that of the net without the extra layers.
NoseTipNet(
(features): Sequential(
(0): Conv2d(1, 12, kernel_size=(5, 5), stride=(1, 1))
(1): ReLU(inplace=True)
(2): Conv2d(12, 24, kernel_size=(3, 3), stride=(1, 1))
(3): ReLU(inplace=True)
(4): Conv2d(24, 24, kernel_size=(3, 3), stride=(1, 1))
(5): ReLU(inplace=True)
(6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(7): Conv2d(24, 32, kernel_size=(3, 3), stride=(1, 1))
(8): ReLU(inplace=True)
(9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(classifier): Sequential(
(0): Linear(in_features=6528, out_features=200, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.5, inplace=False)
(3): Linear(in_features=200, out_features=2, bias=True)
)
)
By changing every filter size from (3, 3) to (7, 7), I was able to achieve a validation loss of 0.0068, which is slightly higher than previous results.
NoseTipNet(
(features): Sequential(
(0): Conv2d(1, 12, kernel_size=(7, 7), stride=(1, 1))
(1): ReLU(inplace=True)
(2): Conv2d(12, 24, kernel_size=(7, 7), stride=(1, 1))
(3): ReLU(inplace=True)
(4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(5): Conv2d(24, 32, kernel_size=(7, 7), stride=(1, 1))
(6): ReLU(inplace=True)
(7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(classifier): Sequential(
(0): Linear(in_features=4032, out_features=200, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.5, inplace=False)
(3): Linear(in_features=200, out_features=2, bias=True)
)
)
Here are two examples that the network performs well on. The green dot is the ground truth while the red dot is the prediction.
Here are two examples that the network does not perform well on.
I think there are many possible reasons why my network fails on these examples; the most important one is the scarcity of images. Even with data augmentation, there are simply not enough images to train on, especially side views. This explains why the network performs reasonably well on most frontal images but poorly on side views.
In this part, I trained a neural network to detect the 58 facial keypoints in the IMM Face Database.
The dataset has 240 images of 40 persons. Each image has 58 facial keypoints. The project specs require me to use the first 32 persons as the training set and the last 8 as the validation set. Here are some sample pictures with all the facial keypoints visualized:
For preprocessing, I scaled every image to (240x180) and normalized the pixel values to [-0.5, 0.5]. In this part, I also implemented various data augmentation techniques (and applied them retroactively to the previous part): color jittering (brightness, contrast, saturation, and hue), random rotation, random cropping, and center cropping (not used).
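Random rotation is the one augmentation that also has to transform the labels: the keypoints must be rotated together with the image. A minimal sketch, with `random_rotate` as a hypothetical helper (a fixed angle can be passed for reproducibility):

```python
import numpy as np
from PIL import Image

def random_rotate(img, keypoints, max_deg=15, deg=None):
    """Rotate the image and its (N, 2) array of (x, y) keypoints together
    about the image center."""
    if deg is None:
        deg = np.random.uniform(-max_deg, max_deg)
    rotated = img.rotate(deg, resample=Image.BILINEAR)  # PIL rotates CCW
    cx, cy = img.width / 2, img.height / 2
    theta = np.deg2rad(deg)
    cos, sin = np.cos(theta), np.sin(theta)
    # In image coordinates (y pointing down), a visually counter-clockwise
    # rotation by theta maps (x, y) -> (cos*x + sin*y, -sin*x + cos*y)
    x, y = keypoints[:, 0] - cx, keypoints[:, 1] - cy
    new_kp = np.stack([cos * x + sin * y + cx,
                       -sin * x + cos * y + cy], axis=1)
    return rotated, new_kp
```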
The following is the architecture of my neural network.
FacialKeypointsNet(
(features): Sequential(
(0): Conv2d(1, 15, kernel_size=(3, 3), stride=(1, 1))
(1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(2): ReLU(inplace=True)
(3): Conv2d(15, 30, kernel_size=(3, 3), stride=(1, 1))
(4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(5): ReLU(inplace=True)
(6): Conv2d(30, 25, kernel_size=(3, 3), stride=(1, 1))
(7): ReLU(inplace=True)
(8): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(9): Conv2d(25, 20, kernel_size=(3, 3), stride=(1, 1))
(10): ReLU(inplace=True)
(11): Conv2d(20, 15, kernel_size=(3, 3), stride=(1, 1))
(12): ReLU(inplace=True)
)
(classifier): Sequential(
(0): Linear(in_features=5760, out_features=200, bias=True)
(1): ReLU(inplace=True)
(2): Linear(in_features=200, out_features=116, bias=True)
)
)
I used mean squared error (MSE) as the loss function and Adam as the optimizer. The learning rate is 0.001 with a weight decay of 0.00001. The training batch size is 32.
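The stated hyperparameters amount to a one-line optimizer configuration in PyTorch; here `model` is just a stand-in module:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 116)  # stand-in for FacialKeypointsNet
criterion = nn.MSELoss()
# Adam with the stated learning rate and L2 regularization via weight decay
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
```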
With the above configurations, I was able to achieve a loss of 0.0035 on the validation set. Here's the training loss and validation loss across 30 epochs.
I tried increasing the capacity of the network by adding layers and by varying the filter size; neither change significantly affected the loss.
By adding two extra layers ((8) and (9) below), I was able to achieve a validation loss of 0.0041, which is very close to that of the net without the extra layers.
FacialKeypointsNet(
(features): Sequential(
(0): Conv2d(1, 15, kernel_size=(3, 3), stride=(1, 1))
(1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(2): ReLU(inplace=True)
(3): Conv2d(15, 30, kernel_size=(3, 3), stride=(1, 1))
(4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(5): ReLU(inplace=True)
(6): Conv2d(30, 25, kernel_size=(3, 3), stride=(1, 1))
(7): ReLU(inplace=True)
(8): Conv2d(25, 25, kernel_size=(3, 3), stride=(1, 1))
(9): ReLU(inplace=True)
(10): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(11): Conv2d(25, 20, kernel_size=(3, 3), stride=(1, 1))
(12): ReLU(inplace=True)
(13): Conv2d(20, 15, kernel_size=(3, 3), stride=(1, 1))
(14): ReLU(inplace=True)
)
(classifier): Sequential(
(0): Linear(in_features=5175, out_features=200, bias=True)
(1): ReLU(inplace=True)
(2): Linear(in_features=200, out_features=116, bias=True)
)
)
By changing every filter size from (3, 3) to (7, 7) or (5, 5), I was able to achieve a validation loss of 0.0041, which is pretty similar to the net with 3x3 filters.
FacialKeypointsNet(
(features): Sequential(
(0): Conv2d(1, 15, kernel_size=(7, 7), stride=(1, 1))
(1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(2): ReLU(inplace=True)
(3): Conv2d(15, 30, kernel_size=(5, 5), stride=(1, 1))
(4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(5): ReLU(inplace=True)
(6): Conv2d(30, 25, kernel_size=(5, 5), stride=(1, 1))
(7): ReLU(inplace=True)
(8): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(9): Conv2d(25, 20, kernel_size=(5, 5), stride=(1, 1))
(10): ReLU(inplace=True)
(11): Conv2d(20, 15, kernel_size=(5, 5), stride=(1, 1))
(12): ReLU(inplace=True)
)
(classifier): Sequential(
(0): Linear(in_features=2700, out_features=200, bias=True)
(1): ReLU(inplace=True)
(2): Linear(in_features=200, out_features=116, bias=True)
)
)
I think the reason the network performs poorly on some images is much the same as in part 1: even with data augmentation, there are simply not enough images to train on, especially side views.
Here are the filters of the first conv layer of my network. I don't really know what they represent...
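The visualization itself is straightforward: plot each learned kernel of the first conv layer as a small grayscale image. A sketch, with `plot_first_layer_filters` as a hypothetical name, using matplotlib's off-screen backend:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

def plot_first_layer_filters(conv, path="filters.png"):
    """Save a row of grayscale images, one per kernel of `conv`."""
    weights = conv.weight.detach().cpu()  # shape (out_ch, in_ch, kH, kW)
    n = weights.shape[0]
    fig, axes = plt.subplots(1, n, figsize=(1.5 * n, 1.5))
    for i in range(n):
        axes[i].imshow(weights[i, 0], cmap="gray")
        axes[i].axis("off")
    fig.savefig(path)
    plt.close(fig)
```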
In this part, I trained a neural network to detect the 68 face landmarks in the iBUG Face dataset.
The dataset has 6666 images. Each image has 68 facial keypoints as well as a bounding box of the face. I used the same data augmentation as in the previous part. In addition, I used the provided bounding box to crop out the region containing the face before preprocessing each image. The crops are then resized to (224, 224) to fit the pretrained model.
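Cropping with the bounding box also means remapping the keypoints into the cropped-and-resized coordinate frame. A sketch, with `crop_and_resize` as a hypothetical helper:

```python
import numpy as np
from PIL import Image

def crop_and_resize(img, box, keypoints, size=224):
    """Crop the face region given box = (left, top, right, bottom),
    resize to size x size, and map (N, 2) keypoints into the new frame."""
    left, top, right, bottom = box
    face = img.crop(box).resize((size, size))
    sx = size / (right - left)
    sy = size / (bottom - top)
    new_kp = np.stack([(keypoints[:, 0] - left) * sx,
                       (keypoints[:, 1] - top) * sy], axis=1)
    return face, new_kp
```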
I used a ResNet-50 pre-trained on ImageNet and modified the first layer (conv1) so that it accepts a grayscale image, and the last layer (fc) so that it outputs 68 points (136 coordinates). Here's the architecture:
IBugFaceNet(
(model): ResNet(
(conv1): Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
(layer1): Sequential(
(0): Bottleneck(
(conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
)
(layer2): Sequential(
(0): Bottleneck(
(conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(3): Bottleneck(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
)
(layer3): Sequential(
(0): Bottleneck(
(conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(3): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(4): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(5): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
)
(layer4): Sequential(
(0): Bottleneck(
(conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
)
(avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
(fc): Linear(in_features=2048, out_features=136, bias=True)
)
)
I used mean squared error (MSE) as the loss function and Adam as the optimizer. The learning rate is 0.0001 with a weight decay of 0.00001. The training batch size is 64 and the validation batch size is 8. I used 6000 images as the training set and 666 images as the validation set.
I trained the net for 50 epochs. With the above configurations, I was able to achieve a mean absolute error of 8.18925 on the test set. Here's the training loss and validation loss.
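The mean absolute error here is simply the average absolute coordinate difference between predictions and ground truth (assuming both are expressed in the same pixel units); a minimal sketch:

```python
import torch

def mean_absolute_error(preds, targets):
    """Mean absolute difference over all landmark coordinates; both
    tensors are (N, 68, 2) in the same (pixel) units."""
    return (preds - targets).abs().mean().item()
```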
Here are some results from the test set.
Here are some results from images I gathered from previous projects or the Internet.
One thing I noticed is that I have to supply a bounding box of the face for the network to work; otherwise, the predicted landmarks end up all over the place.
As you can see, the network performs fairly well on frontal images. However, for an animated character, it doesn't perform very well: the eyes and eyebrows are off. This is understandable as animated characters have rather different facial features from real people. I was actually surprised that it could estimate the rough positions of the facial features of this character.
To truly automate the face morphing process, we need a face detector (to generate the bounding box) in addition to a face landmark detector. I used the dlib library to automate the process. Here's the result.