Setup

You will need to make a copy of this notebook in your Google Drive before you can edit the homework files. You can do so with File → Save a copy in Drive.


Question 2

Testing transformations

Dataloader testing

Convnet 2:

We see that the keypoint detection performs quite well on training data, though it's not perfect. It does, however, find the general vicinity of the keypoints (jawlines, eyes, noses, etc.), which is impressive for such a limited dataset.

I opted for a high fan-out at the start that then tapers: the first layer expands to 128 channels, followed by layers of 64, 32, and 16 channels, then a fully connected hidden layer of size 1000, and finally the 116-dimensional output vector. Experimenting with strides turned out to hurt results, so I used a max pooling after each convolutional layer instead.
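A minimal sketch of the architecture described above, assuming grayscale input and an input resolution of 120×120 (the resolution and kernel sizes are my assumptions, not stated in the report):

```python
import torch
import torch.nn as nn

class KeypointNet(nn.Module):
    """Sketch of the described net: channel fan-out 128 -> 64 -> 32 -> 16,
    max pooling after every conv, a 1000-unit hidden layer, and a
    116-dimensional output (58 keypoints x 2 coordinates)."""

    def __init__(self, in_size=120):  # in_size is an assumed input resolution
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 128, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(128, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Four rounds of 2x pooling shrink each spatial dimension by 16.
        flat = 16 * (in_size // 16) ** 2
        self.head = nn.Sequential(
            nn.Flatten(), nn.Linear(flat, 1000), nn.ReLU(), nn.Linear(1000, 116)
        )

    def forward(self, x):
        return self.head(self.features(x))
```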

I opted for a very small learning rate (1e-5) and it turned out quite well. Higher rates produced noisy training, I assume because the outputs from the dataloader were not that good. A batch size of 32 worked well for me (I trained this set on Colab; locally I never could).
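The training setup above can be sketched as follows; the model and data here are placeholders (a linear layer on random tensors), but the optimizer, learning rate, batch size, and MSE regression loss match what the report describes:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder model and synthetic data; only the hyperparameters
# (Adam, lr=1e-5, batch size 32) mirror the report.
model = torch.nn.Linear(10, 116)
opt = torch.optim.Adam(model.parameters(), lr=1e-5)
loss_fn = torch.nn.MSELoss()

data = TensorDataset(torch.randn(64, 10), torch.randn(64, 116))
loader = DataLoader(data, batch_size=32, shuffle=True)

for x, y in loader:
    opt.zero_grad()
    loss = loss_fn(model(x), y)  # regress keypoint coordinates
    loss.backward()
    opt.step()
```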

We see the training and validation errors decrease almost in lockstep over time, and the neural net indeed converges!

The two that perform poorly are, unsurprisingly, the same two as before. I take it that the baldness of the leftmost fellow and the highly detailed wild hair of the one on the right throw the net for a loop: it must think the leftmost guy's face extends into the top of his head, and the network detects too much detail at the top of the thick-haired guy's head.

These two also didn't appear in the training data, and on examining the training data nobody really looks like either of them, so it makes sense.

We see that the far-left filter acts as a vertical edge detector and the next one as a horizontal edge detector. We verify this by convolving each with the image.

The third filter is a blurring filter, as we can see. The far-right filter seems to brighten locations where the face has a lot of detail. The other filters I examined seemed to be mixtures and combinations of the above.
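The verification step is just a 2-D convolution of a filter with the image. As a sketch, here a hand-written Sobel kernel stands in for one of the learned first-layer filters (the report's filters were learned, not hand-coded), applied to a synthetic vertical step edge:

```python
import torch
import torch.nn.functional as F

# A classic vertical-edge kernel standing in for a learned filter.
vertical_edge = torch.tensor([[-1., 0., 1.],
                              [-2., 0., 2.],
                              [-1., 0., 1.]]).view(1, 1, 3, 3)

# Synthetic image: dark left half, bright right half (a vertical edge).
image = torch.zeros(1, 1, 8, 8)
image[..., :, 4:] = 1.0

# conv2d slides the kernel over the image; strong responses appear
# along the column where intensity changes.
response = F.conv2d(image, vertical_edge, padding=1)
```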

Problem 3

Dataloader sampling random elements (augmented)

We see that our image augmentation techniques worked quite well (rotation and cropping both preserve the keypoints).
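Preserving keypoints under rotation means applying the same rotation to the coordinates as to the pixels. A minimal sketch of that coordinate transform (the function name and point values are illustrative, not from the original code):

```python
import numpy as np

def rotate_keypoints(kps, theta_deg, center):
    """Rotate (x, y) keypoints by theta_deg about a center point,
    mirroring the rotation applied to the image itself."""
    t = np.deg2rad(theta_deg)
    R = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t),  np.cos(t)]])
    return (kps - center) @ R.T + center

center = np.array([50.0, 50.0])
kps = np.array([[60.0, 50.0]])        # a point 10 px right of center
rotated = rotate_keypoints(kps, 90, center)
# → [[50.0, 60.0]]: the offset (10, 0) rotates to (0, 10)
```

Cropping works the same way: subtract the crop's top-left corner from every keypoint.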


Kaggle

The mean absolute error for my points was 27.1, meaning each keypoint was about 0.5 pixels away from its true value on the testing set, which is pretty good. ResNet-50 is powerful!
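As a sketch, the metric in question is just the mean absolute difference between predicted and ground-truth keypoint coordinates (the function name and values below are illustrative, not the Kaggle scorer):

```python
import numpy as np

def keypoint_mae(pred, true):
    """Mean absolute error over all keypoint coordinates."""
    return np.abs(pred - true).mean()

# Two (x, y) keypoints, each predicted 0.5 px off in both coordinates.
pred = np.array([[10.0, 20.0], [30.0, 40.0]])
true = np.array([[10.5, 19.5], [30.5, 40.5]])
# keypoint_mae(pred, true) → 0.5
```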

I utilized ResNet-50 (trained with a Colab Pro GPU I had for another course) for about 2 hours with learning rate 1e-4, changing only the input channels to 1 and the output dimension to 136. ResNet-50 is the ResNet architecture with 50 layers instead of 18.

For some images it does quite well, but for others it doesn't (I noticed that children do not fare so well, as the network tends to predict longer faces). Also, when the increased size of the bounding box introduces someone else's face, the algorithm struggles. We see that occlusions of the face also cause problems.

It does pretty badly on Guy Fieri's face here because of his strange facial expression and the glasses he's wearing, which throw our algorithm for a loop. With plausible facial expressions (I took the third one from one of my favorite memes), it does quite well!

On Kobe and my friend however it did very well!