This project has two parts: classifying images from the Fashion-MNIST dataset with a convolutional neural network (CNN), and semantic segmentation of building facades.
After training the CNN, I got the following accuracies for each epoch:
Here are the per-class accuracies of the final model:
Class 6 (shirt) has the lowest accuracy, making it the most difficult class, followed by class 4 (coat).
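Per-class accuracy is the fraction of test images of each true class that the model labels correctly. Below is a minimal sketch of that computation, assuming the true labels and predictions are available as plain Python lists of integer class indices (0 to 9). The function name `per_class_accuracy` is hypothetical and not taken from the author's code.

```python
from collections import Counter


def per_class_accuracy(labels, preds, num_classes=10):
    """Return a list whose i-th entry is the fraction of class-i samples predicted correctly."""
    total = Counter(labels)  # number of samples per true class
    correct = Counter(y for y, p in zip(labels, preds) if y == p)  # correct predictions per class
    # Classes with no samples get NaN rather than raising ZeroDivisionError.
    return [correct[c] / total[c] if total[c] else float("nan") for c in range(num_classes)]


# Toy example: class 0 is always right, class 1 is right half the time.
labels = [0, 0, 1, 1]
preds = [0, 0, 1, 0]
print(per_class_accuracy(labels, preds, num_classes=2))  # [1.0, 0.5]
```

Sorting class indices by these values (for example `sorted(range(10), key=accs.__getitem__)`) gives the ranking from hardest to easiest class.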
For each class below, the left image is a correctly classified example and the right image is a misclassified one.
Class 0 (T-shirt/top)
Class 1 (trouser)
Class 2 (pullover)
Class 3 (dress)
Class 4 (coat)
Class 5 (sandal)
Class 6 (shirt)
Class 7 (sneaker)
Class 8 (bag)
Class 9 (ankle boot)
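The write-up does not say how the example pairs were chosen. One simple approach, sketched here as an assumption rather than the author's method, is to take the first correctly classified and the first misclassified test image for each true class. The function name `pick_examples` is hypothetical.

```python
def pick_examples(labels, preds, num_classes=10):
    """For each true class, return (index of first correct, index of first misclassified).

    An entry stays None if no such example exists in the data.
    """
    examples = {c: [None, None] for c in range(num_classes)}
    for i, (y, p) in enumerate(zip(labels, preds)):
        slot = 0 if y == p else 1  # slot 0 = correct, slot 1 = misclassified
        if examples[y][slot] is None:
            examples[y][slot] = i
    return {c: tuple(v) for c, v in examples.items()}


labels = [0, 1, 0, 1]
preds = [0, 0, 1, 1]
print(pick_examples(labels, preds, num_classes=2))  # {0: (0, 2), 1: (3, 1)}
```

The returned indices can then be used to look up and plot the corresponding test images side by side.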
Here is a visualization of the first layer of filters:
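First-layer filter weights can be negative or positive, so to display each filter as a grayscale image it is common to rescale its weights into [0, 1]. The sketch below uses per-filter min-max scaling; this is one common choice and an assumption here, since the author's plotting code is not shown. It takes a single 2-D kernel as nested lists (for example, obtained with `.tolist()` on one slice of a framework's weight tensor), and the function name `normalize_filter` is hypothetical.

```python
def normalize_filter(kernel):
    """Min-max scale a 2-D kernel (list of lists of floats) into [0, 1] for display."""
    flat = [w for row in kernel for w in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant kernel: avoid dividing by zero, show it as mid-gray
        return [[0.5 for _ in row] for row in kernel]
    return [[(w - lo) / (hi - lo) for w in row] for row in kernel]


kernel = [[-1.0, 0.0], [1.0, 3.0]]
print(normalize_filter(kernel))  # [[0.0, 0.25], [0.5, 1.0]]
```

Scaling each filter independently makes every filter's pattern visible, at the cost of hiding differences in overall weight magnitude between filters.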
For the segmentation network, I tried a few different learning rates and loss functions before settling on cross-entropy loss, the Adam optimizer with a learning rate of 0.005, and 6 convolutional layers with 6, 12, 24, 48, 64, and 5 output channels. Training for 10 epochs gave the following loss:
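To make the architecture concrete, the sketch below counts the weights and biases in the six-layer channel stack and computes the per-pixel softmax cross-entropy. Several details are assumptions not stated in the write-up: 3-channel (RGB) input images, 3×3 kernels in every layer, and that the final 5 output channels give one score per segmentation class, so that each pixel's 5 scores feed a cross-entropy loss averaged over pixels. The names `conv_param_counts` and `pixel_cross_entropy` are hypothetical.

```python
import math

# Output channels of the six conv layers, as listed in the write-up.
CHANNELS = [6, 12, 24, 48, 64, 5]


def conv_param_counts(in_channels=3, kernel_size=3, channels=CHANNELS):
    """Weights + biases per conv layer, assuming one square kernel size throughout."""
    counts = []
    for out_channels in channels:
        counts.append(in_channels * out_channels * kernel_size ** 2 + out_channels)
        in_channels = out_channels  # the next layer consumes this layer's output
    return counts


def pixel_cross_entropy(logits, target):
    """Softmax cross-entropy for one pixel; logits holds one score per class."""
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


counts = conv_param_counts()
print(counts, sum(counts))  # [168, 660, 2616, 10416, 27712, 2885] 44457
print(round(pixel_cross_entropy([0.0] * 5, target=2), 4))  # 1.6094 (= ln 5)
```

Under these assumptions the network is small (about 44k parameters). With uniform scores over 5 classes the loss is ln 5 ≈ 1.609, which is roughly where an untrained model's loss should start. In a framework such as PyTorch, the optimizer described above would correspond to something like `torch.optim.Adam(model.parameters(), lr=0.005)`.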
The final model achieved an average AP (average precision) of about 0.452.
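The write-up does not specify how AP was computed. A common approach for segmentation is to treat every pixel's predicted probability for a class as a score, compute AP for that class against the binary ground-truth mask, and then average over classes. The sketch below assumes that setup and uses the non-interpolated definition of AP: the mean of precision at each rank where a true positive appears. When no scores are tied, this matches the step-wise definition used by scikit-learn's `average_precision_score`. The function names are hypothetical.

```python
def average_precision(scores, labels):
    """AP for one class: mean of precision@k over the ranks k where a positive appears.

    scores: predicted confidence that each item (e.g. each pixel) belongs to the class.
    labels: 1 if the item truly belongs to the class, else 0.
    """
    num_pos = sum(labels)
    if num_pos == 0:
        return float("nan")  # AP is undefined for a class with no positives
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precision_sum = 0, 0.0
    for k, (_, is_pos) in enumerate(ranked, start=1):
        if is_pos:
            hits += 1
            precision_sum += hits / k  # precision at this recall step
    return precision_sum / num_pos


def mean_ap(per_class):
    """Average the per-class APs; per_class is a non-empty list of (scores, labels) pairs."""
    aps = [average_precision(s, l) for s, l in per_class]
    return sum(aps) / len(aps)


print(round(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]), 4))  # 0.8333
```

In the toy example, the positives sit at ranks 1 and 3, so AP = (1/1 + 2/3) / 2 = 5/6.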
Here is an example of a segmented image:
The predicted segmentation is noticeably fuzzy: the network is generally poor at drawing sharp boundaries between regions, although it still preserves the overall shape of the building facade.