CS 194-26: FACE KEYPOINT DETECTION

Amol Pant

PART 1: Nose Tip Detection

First, I set up the data. Here are some random samples of the grayscale, downsampled images.
[Figure: random sample images after grayscale conversion and downsampling]
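The preprocessing step can be sketched roughly as follows. This is a minimal illustration, not the code used for this project: the function names `to_grayscale` and `downsample` are hypothetical, the grayscale weights are the standard BT.601 luma coefficients, and block averaging is just one assumed way to downsample. In practice an image library would do both steps.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) pixels to rows of luminance values.

    Assumption: standard BT.601 luma weights; the project may have used
    a library routine with slightly different behavior.
    """
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def downsample(gray, factor):
    """Shrink a grayscale image by averaging factor x factor pixel blocks."""
    height = len(gray) // factor
    width = len(gray[0]) // factor
    out = []
    for i in range(height):
        row = []
        for j in range(width):
            # Collect every pixel in the block that maps to output (i, j).
            block = [gray[i * factor + di][j * factor + dj]
                     for di in range(factor)
                     for dj in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out
```

For example, `downsample(to_grayscale(img), 2)` halves both dimensions of `img`.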

I then set up my neural net, which looks like this:
[Figure: network architecture]
I then search for the best learning rate over the range [0.0001, 0.001] in steps of 0.0001, training for only 10 epochs at each learning rate and measuring the resulting loss. Here is the result. I find that the best learning rate is somewhere between 0.0007 and 0.0009.
[Figure: loss for each learning rate]
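The learning rate search can be sketched as a simple sweep. The sketch below is only an illustration under stated assumptions: a one-weight linear model fit by gradient descent stands in for the real net, and `train_and_measure` is a hypothetical helper, not a function from this project. Only the sweep range, the step, and the 10 epochs per rate come from the description above.

```python
def train_and_measure(lr, epochs=10):
    """Train a toy model y = w * x for a few epochs and return the final MSE.

    Assumption: this toy model replaces the actual network purely to keep
    the example self-contained.
    """
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [2.0 * x for x in xs]
    w = 0.0
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2.0 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Candidate rates 0.0001, 0.0002, ..., 0.001; built from integers and
# rounded to avoid floating point drift in the grid values.
learning_rates = [round(k * 0.0001, 4) for k in range(1, 11)]
losses = {lr: train_and_measure(lr) for lr in learning_rates}
best_lr = min(losses, key=losses.get)
```

On this toy problem the loss simply keeps falling as the rate grows, so the sweep picks the largest rate. A real net usually has a sweet spot inside the range instead.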
Next, I split the data into training and validation sets and train my net for 25 epochs. After each epoch I also evaluate on the validation data. Here is the resulting graph.
[Figure: training and validation loss graph]
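A training loop that tracks both curves might look like the sketch below. Everything apart from the 25 epochs is an assumption made for illustration: the 80/20 split, the fixed seed, the toy one-weight model, and the helper names `split_indices` and `mse` are hypothetical and do not come from this project.

```python
import random


def split_indices(n, val_fraction=0.2, seed=0):
    """Shuffle indices 0..n-1 and split them into (train, validation) lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_val = int(n * val_fraction)
    return indices[n_val:], indices[:n_val]


def mse(w, data):
    """Mean squared error of the toy model y = w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


# Toy dataset (assumption): points on the line y = 3x.
data = [(k / 10.0, 3.0 * k / 10.0) for k in range(1, 51)]
train_idx, val_idx = split_indices(len(data))
train = [data[i] for i in train_idx]
val = [data[i] for i in val_idx]

w, lr = 0.0, 0.01
train_losses, val_losses = [], []
for epoch in range(25):
    # Update the weight using the training set only.
    grad = sum(2.0 * x * (w * x - y) for x, y in train) / len(train)
    w -= lr * grad
    train_losses.append(mse(w, train))
    # Validation data is only evaluated, never used for the update.
    val_losses.append(mse(w, val))
```

Plotting `train_losses` and `val_losses` against the epoch number gives the kind of graph described above. A validation curve that rises while the training curve keeps falling is the usual sign of overfitting.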
After training, here are some good and bad results. The bad results are mostly on faces that are tilted sideways, and are caused by some minor overfitting in my design.
Data augmentation would probably have improved the results further.
[Figure: images with prediction and ground truth]

PART 2

Following the same steps as Part 1, I add some image augmentation functions such as random flipping, rotation, and color jittering. I also rescale the images to 160 x 120 instead of 80 x 60. Here are some of the images after augmentation.
[Figure: sample images after augmentation]
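One detail worth illustrating is that geometric augmentations have to move the keypoint labels together with the pixels. The sketch below shows this for flipping and rotation. It is an assumption about how such functions could be written, not the code used here: the names `hflip`, `random_flip`, and `rotate_keypoints` are hypothetical, keypoints are assumed to be (x, y) pixel coordinates, and rotating the pixels themselves is left to an image library.

```python
import math
import random


def hflip(image, keypoints):
    """Mirror an image (list of pixel rows) and its (x, y) keypoints."""
    width = len(image[0])
    flipped = [row[::-1] for row in image]
    # Column x moves to column width - 1 - x; rows are unchanged.
    moved = [(width - 1 - x, y) for (x, y) in keypoints]
    return flipped, moved


def random_flip(image, keypoints, p=0.5, rng=random):
    """Apply hflip with probability p, otherwise return the inputs as-is."""
    if rng.random() < p:
        return hflip(image, keypoints)
    return image, keypoints


def rotate_keypoints(keypoints, angle_deg, center):
    """Rotate (x, y) keypoints by angle_deg around center.

    Assumption: the image is rotated by the same angle, in the same
    direction, by whatever library handles the pixels.
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx, cy = center
    return [(cx + cos_t * (x - cx) - sin_t * (y - cy),
             cy + sin_t * (x - cx) + cos_t * (y - cy))
            for (x, y) in keypoints]
```

Color jittering only changes pixel intensities, so it needs no change to the keypoints.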

I now set up my net, which looks like this:
[Figure: network architecture]
Following the same training process as in Part 1, I train for 50 epochs instead. Here is the resulting training/validation loss graph:
[Figure: training and validation loss graph]
Here are some of the net's results on faces. They look bad due to massive overfitting, which makes all of the predictions look the same.
[Figure: prediction results on face images]

Here are some of the visualized filters from the conv1 and conv2 layers (the remaining layers are much larger, so they are hard to display).
[Figure: visualized conv1 and conv2 filters]
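Learned filter weights can be negative and have arbitrary scale, so they are usually rescaled before being shown as grayscale images. A minimal sketch of that step is below. The helper name `normalize_filter` is hypothetical and min-max scaling is an assumed choice, not necessarily how the figures here were produced.

```python
def normalize_filter(kernel):
    """Rescale a 2D filter (list of rows of weights) to the range [0, 1].

    A constant filter has no contrast, so it is mapped to mid-gray (0.5)
    to avoid dividing by zero.
    """
    values = [v for row in kernel for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:
        return [[0.5 for _ in row] for row in kernel]
    return [[(v - lo) / (hi - lo) for v in row] for row in kernel]
```

Each normalized filter can then be passed to any image display routine as a small grayscale image.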