CS 194-26: FACE KEYPOINT DETECTION

Amol Pant

PART 1: Nose Tip Detection

First, I set up the data. Here are some random samples of the grayscale, downsampled images:

Image
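The preprocessing above can be sketched roughly as follows. This is only an illustration, not the code used for this project: the helper names `to_grayscale` and `downsample`, the nested-list image representation, the block-averaging approach, and the normalization to [-0.5, 0.5] are all assumptions.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) pixels in 0-255 to floats in [-0.5, 0.5].

    Uses the common luminance weights 0.299, 0.587, 0.114.
    The [-0.5, 0.5] range is an assumed normalization.
    """
    return [
        [(0.299 * r + 0.587 * g + 0.114 * b) / 255.0 - 0.5 for r, g, b in row]
        for row in rgb_image
    ]


def downsample(gray_image, factor):
    """Shrink an image by averaging non-overlapping factor x factor blocks."""
    height = len(gray_image) // factor
    width = len(gray_image[0]) // factor
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            block = [
                gray_image[y * factor + dy][x * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            row.append(sum(block) / len(block))
        out.append(row)
    return out
```

If the keypoints are stored in pixel coordinates, they would need to be divided by the same factor so that they still line up with the smaller image.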

I then set up my neural net, which looks like this:
Image of the network architecture

I then search for the best learning rate over the range [0.0001, 0.001] in steps of 0.0001, training for only 10 epochs at each learning rate and measuring the resulting loss. Here is the result. I find that the best learning rate is somewhere between 0.0007 and 0.0009.
Image of loss versus learning rate
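The learning rate search described above amounts to a small grid sweep. A minimal sketch, assuming a hypothetical `train_and_eval(lr, epochs)` callback that trains a fresh net and returns its final loss (the function names and signature here are illustrative, not taken from the project code):

```python
def sweep_learning_rates(train_and_eval, start=0.0001, stop=0.001,
                         step=0.0001, epochs=10):
    """Try each learning rate on the grid and return (best_lr, losses).

    train_and_eval is a hypothetical callback: it should train a new
    model with the given learning rate for `epochs` epochs and return
    a single loss value.
    """
    n_steps = round((stop - start) / step)
    # Round each grid point to avoid floating-point drift such as 0.00030000000000000003.
    rates = [round(start + i * step, 10) for i in range(n_steps + 1)]
    losses = {lr: train_and_eval(lr, epochs) for lr in rates}
    best_lr = min(losses, key=losses.get)
    return best_lr, losses
```

Re-initializing the net for every learning rate matters here; otherwise later rates would start from weights already trained by earlier ones and the comparison would not be fair.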

Next, I split the data into training and validation sets and train my net for 25 epochs. After each epoch I also record the loss on the validation data. Here is the resulting graph:
Image of training and validation loss per epoch
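The split and the per-epoch bookkeeping described above could look something like the sketch below. The names `split_indices` and `fit`, the 20% validation fraction, and the two callbacks are assumptions made for illustration; the actual split used in this project is not stated.

```python
import random


def split_indices(n_samples, val_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into (train, val) lists.

    val_fraction=0.2 is an assumed value, not the one used in the write-up.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = int(n_samples * val_fraction)
    return indices[n_val:], indices[:n_val]


def fit(train_one_epoch, evaluate, train_idx, val_idx, epochs=25):
    """Run the training loop and record one loss per epoch for each split.

    train_one_epoch(train_idx) and evaluate(val_idx) are hypothetical
    callbacks that each return an average loss.
    """
    history = {"train": [], "val": []}
    for _ in range(epochs):
        history["train"].append(train_one_epoch(train_idx))
        history["val"].append(evaluate(val_idx))
    return history
```

Plotting `history["train"]` against `history["val"]` gives the kind of graph shown here; a validation curve that flattens or rises while the training curve keeps falling is the usual sign of overfitting.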

After training, here are some good and bad results. The bad results are generally on people whose heads are turned or tilted sideways, and they stem from some minor overfitting in my design. Data augmentation would probably have improved the results further.

Image with prediction and ground truth

PART 2

Following the same steps as Part 1, I add image augmentation functions such as random flipping, rotation, and color jittering. I also rescale the images to 160 x 120 instead of 80 x 60. Here are some of the images after augmentation:

Image
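One detail worth illustrating for flipping and rotation is that the keypoints have to be transformed together with the image, otherwise the labels no longer match the pixels. The sketch below shows that idea only; the helper names, the pixel-coordinate convention, and the rotation direction are assumptions, not the functions used in this project.

```python
import math


def flip_keypoints(keypoints, width):
    """Mirror (x, y) pixel keypoints to match a horizontally flipped image."""
    return [(width - 1 - x, y) for x, y in keypoints]


def rotate_keypoints(keypoints, angle_deg, width, height):
    """Rotate (x, y) pixel keypoints about the image center.

    Positive angles turn counter-clockwise as seen on screen (y axis
    pointing down). This must match the direction in which the image
    itself is rotated, which is an assumption here.
    """
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    rotated = []
    for x, y in keypoints:
        dx, dy = x - cx, y - cy
        rotated.append((cx + cos_t * dx + sin_t * dy,
                        cy - sin_t * dx + cos_t * dy))
    return rotated
```

A horizontal flip also swaps the meaning of left and right landmarks (for example, the left eye becomes the right eye), so the keypoint order may need to be permuted after flipping. Color jittering changes only pixel values, so the keypoints stay as they are.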

I now set up my net, which looks like this:
Image of the network architecture

I follow the same training process as in Part 1, but train for 50 epochs instead. Here is the resulting training/validation loss graph:
Image of training and validation loss per epoch

Here are some of the results of the net on faces. They look bad because of massive overfitting, which makes all of the predictions look the same.

Image

Here are some of the visualized filters of the conv1 and conv2 layers (the remaining layers have far more filters, so they are hard to display).
Image of the conv1 and conv2 filters
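To show a learned filter as a grayscale tile, its weights usually need to be rescaled into a displayable range first. A minimal sketch of that step, assuming each filter is available as a 2D list of floats (the helper name `normalize_filter` is hypothetical, and this is not necessarily how the figures here were produced):

```python
def normalize_filter(kernel):
    """Rescale a 2D kernel (rows of floats) to [0, 1] for grayscale display."""
    values = [v for row in kernel for v in row]
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A constant kernel has no contrast; show it as mid-gray.
        return [[0.5 for _ in row] for row in kernel]
    return [[(v - lo) / span for v in row] for row in kernel]
```

Normalizing each filter separately makes its structure visible, at the cost of hiding differences in overall magnitude between filters.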