CS 194: Computer Vision, Fall 22

Project 5: Face Key Points

Aidan Meyer



Overview

Given face images and their annotated landmarks, use PyTorch to build convolutional neural networks that predict facial keypoints.

Noses

For this part I used the IMM Face Database. First I created a Dataset containing the images and their corresponding landmarks. Combined with a transformation so every image goes through the same data augmentation, this was used to create two data loaders: one for training and one for validation.

dataloader example
dataloader example
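Below is a minimal sketch of how such a Dataset and the two loaders might look; the class name, the path/point lists, and the exact transform are illustrative assumptions rather than the exact submission code.

```python
import torch
from torch.utils.data import Dataset, DataLoader
import torchvision.transforms as T
from PIL import Image

class NoseDataset(Dataset):
    """Pairs each IMM image with its nose landmark (paths/points assumed parsed from the .asf files)."""
    def __init__(self, image_paths, nose_points):
        self.image_paths = image_paths           # list of image file paths
        self.nose_points = nose_points           # list of (x, y) nose coordinates, normalized to [0, 1]
        self.transform = T.Compose([
            T.Grayscale(num_output_channels=1),  # 1-channel black and white input
            T.Resize((60, 80)),                  # downsampled size, assumed for illustration
            T.ToTensor(),
        ])

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        img = self.transform(Image.open(self.image_paths[idx]))
        point = torch.tensor(self.nose_points[idx], dtype=torch.float32)
        return img, point

# The train/val path and point lists are assumed to come from splitting the IMM subjects.
train_loader = DataLoader(NoseDataset(train_paths, train_points), batch_size=4, shuffle=True)
val_loader = DataLoader(NoseDataset(val_paths, val_points), batch_size=4, shuffle=False)
```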

For the CNN I tried two different architectures: one I refer to as a "funnel" because the convolutions go from large to small kernels, and the other as a "fan" because the convolutions go from small to large. In the end the funnel approach won out with better results. For both, the input is a 1-channel black and white image (a PyTorch sketch of the funnel net follows the two tables below).

1st CNN Architecture "funnel"
Type                  | Channels       | Kernel Size
maxpool, relu, conv2d | 12             | 7
maxpool, relu, conv2d | 24             | 5
maxpool, relu, conv2d | 32             | 3
linear, relu          | 150            | N/A
linear                | 2 (nose point) | N/A
correct
incorrect
correct
incorrect
2nd CNN Architecture "fan"
Type                  | Channels       | Kernel Size
maxpool, relu, conv2d | 12             | 3
maxpool, relu, conv2d | 24             | 5
maxpool, relu, conv2d | 32             | 7
linear, relu          | 150            | N/A
linear                | 2 (nose point) | N/A
Incorrect
Incorrect
Incorrect
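For reference, here is a hedged PyTorch sketch of the funnel net from the first table, assuming the listed layers are applied in conv → ReLU → maxpool order and using LazyLinear so the flattened size does not have to be hard-coded; it is an illustration, not the exact submission code.

```python
import torch.nn as nn

class FunnelNoseNet(nn.Module):
    """Funnel architecture: large-to-small kernels, ending in a 2-value nose point."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 12, kernel_size=7), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(12, 24, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(24, 32, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(150), nn.ReLU(),  # LazyLinear infers the flattened size on first use
            nn.Linear(150, 2),              # (x, y) nose point
        )

    def forward(self, x):
        return self.head(self.features(x))
```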

Faces

Next I made a dataloader and CNN for all facial points provided by the face library. This network consisted of 5 convolutional layers; I stuck with the funnel approach from the nose part since it proved more effective. One extra step that helped with training was augmenting the data, which I did with random crops of the images and their points alongside random rotations (a sketch of the rotation piece is shown after the examples below).

dataloader example
dataloader example
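Below is a rough sketch of the rotation half of that augmentation, assuming landmarks normalized to [0, 1] on a square crop; the sign convention for rotating the points may need flipping depending on how the coordinates are stored.

```python
import random
import numpy as np
import torchvision.transforms.functional as TF

def random_rotate(img, points, max_deg=15):
    """Rotate an image and its landmarks together.

    img: PIL image or CHW tensor; points: (N, 2) array of normalized (x, y) landmarks.
    Assumes a square image with y pointing down; check the angle sign for your convention.
    """
    angle = random.uniform(-max_deg, max_deg)
    img = TF.rotate(img, angle)                      # rotate the pixels counter-clockwise
    theta = np.deg2rad(angle)
    rot = np.array([[np.cos(theta),  np.sin(theta)],
                    [-np.sin(theta), np.cos(theta)]])
    center = np.array([0.5, 0.5])                    # rotate points about the image center
    return img, (points - center) @ rot.T + center
```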
CNN Architecture choice
Type                  | Channels            | Kernel Size
maxpool, relu, conv2d | 12                  | 10
maxpool, relu, conv2d | 20                  | 7
maxpool, relu, conv2d | 25                  | 5
maxpool, relu, conv2d | 30                  | 5
maxpool, relu, conv2d | 32                  | 5
linear, relu          | 500                 | N/A
linear                | 58 * 2 (landmarks)  | N/A
incorrect
correct
incorrect
correct

After running the net it is interesting to see the filters that the network learned. Though seemingly nonsensical, and possibly not very deeply learned given the small amount of data in this set, it is still possible to see how some of these features may be detecting edges and corners of a sort.
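A small sketch of how the first-layer filters could be pulled out and displayed with matplotlib; it assumes the layer is a Conv2d with a single input channel, as in the sketches above.

```python
import matplotlib.pyplot as plt

def show_first_layer_filters(conv_layer, cols=6):
    """Plot each output-channel kernel of a 1-input-channel Conv2d as a grayscale image."""
    weights = conv_layer.weight.detach().cpu()       # shape: (out_channels, 1, k, k)
    rows = (weights.shape[0] + cols - 1) // cols
    fig, axes = plt.subplots(rows, cols, figsize=(2 * cols, 2 * rows))
    for ax in axes.flat:
        ax.axis("off")
    for ax, w in zip(axes.flat, weights):
        ax.imshow(w[0], cmap="gray")                 # first (only) input channel of the kernel
    plt.show()
```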

CNN on Large Dataset using pretrained Resnet

Next, using a larger database of about 6000 images, I trained a CNN to find facial keypoints. To speed up the process I used the strategy of starting from a pretrained CNN, as described in class. For this I used ResNet18; the only change I made at the input end was the first convolution, which now takes 1 channel so it accepts the images produced by my dataloader, which crops each image to the provided face bounding box and converts it to black and white. Other augmentation involves normalizing the points and resizing the images to 224 by 224.

Data loader example with ground-truth values

Architecture

The ResNet18 architecture can be seen in the following diagram. The only changes I made were to the input (one channel for a black and white image) and to the output, which is 68 * 2: two coordinate values for each of the 68 face landmarks. I then trained the network for 20 epochs with a learning rate of 1e-3. The training and validation loss came out well, as both approached 0 without the overfitting signature of the validation loss climbing at the end, so I was satisfied with the results after three hours of running on Google Colab.

Resnet18 arch
Training stats
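A sketch of those two ResNet18 changes plus the training setup; the layer names follow torchvision's resnet18, and the MSE loss shown is a standard choice for coordinate regression, assumed here for illustration.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Load a pretrained ResNet18 and adapt it for grayscale keypoint regression.
model = models.resnet18(pretrained=True)

# 1-channel input instead of the original 3-channel RGB convolution.
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)

# 68 landmarks * 2 coordinates as the regression output.
model.fc = nn.Linear(model.fc.in_features, 68 * 2)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # trained for 20 epochs in the writeup
criterion = nn.MSELoss()                                   # assumed loss for coordinate regression
```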

Results on Test set

Results on my own images

In conclusion, my model is clearly not perfect, as noted by examples like the one of me at my harp. It seems that face positioning still plays an important role in the facial recognition done by the network. I learned in my CogSci perception class that humans too view faces that are not upright by parts rather than as a whole, so it makes me wonder whether the net has learned the more baseline facial recognition but not the complexity of individual parts and their relation to one another.

Heat mapping CNN (U-Net)

For the last part we used heatmaps to train a neural network, using the U-Net architecture to solve keypoint detection as a per-pixel classification problem.

Making the Heatmap

To make a heatmap of the faces and their landmarks, I created a function that outputs a 2D Gaussian over a black image of the same size as the input, centered at a specified location. For the Gaussian I chose a sigma of 10 and a kernel of size 60; some examples can be seen below, along with a sketch of the function.
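Here is a sketch of that heatmap function, using sigma 10 and a roughly 60-pixel-wide kernel as described above; the border-clipping details are illustrative assumptions.

```python
import numpy as np

def landmark_heatmap(h, w, cx, cy, sigma=10, ksize=60):
    """Place a 2D Gaussian of std `sigma` on an (h, w) black image, centered at integer pixel (cx, cy)."""
    heat = np.zeros((h, w), dtype=np.float32)
    half = ksize // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    gauss = np.exp(-(xs ** 2 + ys ** 2) / (2 * sigma ** 2))

    # Clip the kernel so it stays inside the image when the landmark is near a border.
    y0, y1 = max(0, cy - half), min(h, cy + half + 1)
    x0, x1 = max(0, cx - half), min(w, cx + half + 1)
    heat[y0:y1, x0:x1] = gauss[(y0 - cy + half):(y1 - cy + half),
                               (x0 - cx + half):(x1 - cx + half)]
    return heat
```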

Architecture

For this I used the pretrained U-Net from the MRI brain-segmentation example. The only change I needed to make was giving it an output of 68 channels, so the net predicts a separate heatmap for each landmark point.
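A hedged sketch of that change, assuming the pretrained brain-segmentation U-Net available through PyTorch Hub; the name of its final 1x1 convolution (model.conv below) is an assumption about that implementation and may differ.

```python
import torch
import torch.nn as nn

# Load the pretrained brain-MRI U-Net from PyTorch Hub.
model = torch.hub.load("mateuszbuda/brain-segmentation-pytorch", "unet",
                       in_channels=3, out_channels=1, init_features=32,
                       pretrained=True)

# Replace the final 1x1 convolution so the net predicts one heatmap per landmark.
# NOTE: the attribute name `conv` and its 32 input features are assumptions about that repo's UNet class.
model.conv = nn.Conv2d(32, 68, kernel_size=1)
```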

What I learned

After a very uphill battle with PyTorch, Google Colab, and multiple midterms, I was unable to get the U-Net to work properly. However, I came into this project with very little ML experience, so I am proud of myself for making it this far. My U-Net implementation can be seen in my code submission.