CS 194-26: FACE KEYPOINT DETECTION
Amol Pant
PART 1: Nose Tip Detection
First, I set up the data. Here are some random samples of the grayscale, downsampled images.
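As a rough illustration of the preprocessing step, here is a minimal sketch in plain Python. The function names, the luma weights, and the block-averaging downsample are all assumptions for illustration; the actual project most likely used an image library's own grayscale and resize routines.

```python
def to_grayscale(rgb_rows):
    """Convert rows of (r, g, b) pixels into rows of single gray values."""
    # ITU-R BT.601 luma weights (an assumption; a plain channel mean also works).
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_rows]


def downsample(gray_rows, factor):
    """Shrink an image by averaging each factor x factor block of pixels."""
    height = len(gray_rows) // factor
    width = len(gray_rows[0]) // factor
    out = []
    for i in range(height):
        row = []
        for j in range(width):
            block = [gray_rows[i * factor + di][j * factor + dj]
                     for di in range(factor)
                     for dj in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out
```

Any keypoint stored in pixel coordinates has to be divided by the same factor, otherwise the labels no longer line up with the smaller image.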
I then set up my neural net, which looks like this:
I then search for the best learning rate over the range [0.0001, 0.001] in steps of 0.0001, training for only 10 epochs at each learning rate and recording the resulting loss.
Here is the result. The best learning rate is somewhere between 0.0007 and 0.0009.
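The sweep described above can be sketched as follows. This is only an outline under stated assumptions: `train_and_evaluate` is a hypothetical callback that trains a fresh model at the given learning rate and returns its final loss, and `toy_train_and_evaluate` is a stand-in (gradient descent on a one-dimensional quadratic), not the actual network.

```python
def learning_rate_grid(start=0.0001, stop=0.001, step=0.0001):
    """Evenly spaced learning rates from start to stop, both included."""
    count = round((stop - start) / step) + 1
    # Rounding removes floating-point drift such as 0.00030000000000000003.
    return [round(start + i * step, 10) for i in range(count)]


def sweep_learning_rates(train_and_evaluate, rates, epochs=10):
    """Train once per learning rate and return (best_rate, all_losses)."""
    losses = {lr: train_and_evaluate(lr, epochs) for lr in rates}
    best = min(losses, key=losses.get)
    return best, losses


def toy_train_and_evaluate(lr, epochs):
    """Stand-in for real training: minimize (w - 3)^2 by gradient descent."""
    w = 0.0
    for _ in range(epochs):
        w -= lr * 2.0 * (w - 3.0)
    return (w - 3.0) ** 2


best_rate, losses = sweep_learning_rates(toy_train_and_evaluate,
                                         learning_rate_grid())
```

With only 10 epochs per setting, the comparison is noisy on a real network, which is consistent with reporting a range rather than a single best value.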
Next, I split the data into training and validation sets and train my net for 25 epochs. After each epoch I also compute the loss on the validation data.
Here is the resulting training/validation loss graph.
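A minimal sketch of this bookkeeping is shown below. The 20% validation fraction, the fixed seed, and the `train_one_epoch` / `evaluate` callbacks are assumptions for illustration; the write-up does not state how the split was made.

```python
import random


def split_indices(n_samples, val_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into (train, validation)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = int(n_samples * val_fraction)
    return indices[n_val:], indices[:n_val]


def run_training(train_one_epoch, evaluate, epochs=25):
    """Record one training loss and one validation loss per epoch."""
    history = {"train": [], "val": []}
    for _ in range(epochs):
        # train_one_epoch() updates the model and returns the mean training loss.
        history["train"].append(train_one_epoch())
        # evaluate() must not update the model; it only measures validation loss.
        history["val"].append(evaluate())
    return history
```

Plotting `history["train"]` and `history["val"]` against the epoch number gives the two curves of the loss graph; a validation curve that flattens or rises while the training curve keeps falling is the usual sign of overfitting.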
After training, here are some good and bad results. The bad results are generally on faces that are tilted sideways, and they result from some minor overfitting in my design.
Data augmentation might have improved the results further.
Image with prediction and ground truth
PART 2
Following the same steps as in PART 1, I add image augmentation functions such as random flipping, rotation, and color jittering. I also rescale the images to 160 x 120 instead of 80 x 60.
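Geometric augmentations only help if the keypoint labels are transformed together with the pixels. The sketch below shows the coordinate side of flipping and rotation. It assumes keypoints are stored as (x, y) pixel positions with y pointing down, and the function names are hypothetical. Color jittering leaves the keypoints unchanged, so it needs no counterpart here.

```python
import math


def flip_keypoints_horizontally(keypoints, width):
    """Mirror (x, y) pixel coordinates across the vertical center line."""
    return [((width - 1) - x, y) for (x, y) in keypoints]


def rotate_keypoints(keypoints, angle_degrees, width, height):
    """Rotate points about the image center.

    A positive angle is counterclockwise as seen on screen (y axis pointing
    down). The sign has to match the convention of the routine that rotates
    the image itself.
    """
    theta = math.radians(angle_degrees)
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    rotated = []
    for x, y in keypoints:
        dx, dy = x - cx, y - cy
        rotated.append((cx + math.cos(theta) * dx + math.sin(theta) * dy,
                        cy - math.sin(theta) * dx + math.cos(theta) * dy))
    return rotated
```

One caveat with flipping: after mirroring, a point labeled as belonging to the left side of the face sits on the right side of the image. Depending on the labeling convention, the left and right landmark indices may need to be swapped as well, which this sketch does not do.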
Here are some of the images after augmentation.
I now set up my net, which looks like this:
I follow the same training process as in PART 1, but train for 50 epochs. Here is the resulting training/validation loss graph:
Here are some of the net's predictions on faces. They look bad: due to massive overfitting, the predicted keypoints look the same for every face.
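One way to put a number on "the predictions all look the same" is to measure how much each predicted coordinate varies across images. The helper below is a hypothetical diagnostic, not part of the original project. It assumes each prediction is a flat list of keypoint coordinates.

```python
import statistics


def prediction_spread(predictions):
    """Mean per-coordinate standard deviation across a batch of predictions."""
    n_coords = len(predictions[0])
    per_coord = [statistics.pstdev([p[k] for p in predictions])
                 for k in range(n_coords)]
    return sum(per_coord) / n_coords
```

Comparing `prediction_spread(predictions)` with the same quantity computed on the ground-truth keypoints gives a rough check: a value near zero for the predictions, next to a clearly larger value for the labels, would mean the net outputs almost the same points regardless of the input face.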
Here are some of the visualized filters from the conv1 and conv2 layers (the remaining layers have far more filters, so they are hard to display).
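Filter weights are small signed numbers, so they need to be rescaled before they can be shown as grayscale images. Below is a minimal sketch of one common choice, per-filter min-max normalization. The function name is hypothetical, and the project may have relied on a plotting library's own scaling instead.

```python
def normalize_filter(kernel):
    """Rescale a 2D filter (list of rows) to [0, 1] for display."""
    flat = [v for row in kernel for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant filter has no contrast; show it as mid-gray.
        return [[0.5 for _ in row] for row in kernel]
    return [[(v - lo) / (hi - lo) for v in row] for row in kernel]
```

Normalizing each filter separately maximizes contrast within a filter, at the cost of hiding differences in overall magnitude between filters.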