CS194-26 Project 4: Classification and Segmentation

Overview

This project had two parts: the first classifies images from the Fashion-MNIST dataset, and the second performs semantic segmentation of building facades.

Part 1: Fashion MNIST

After training the CNN, I got the following accuracies for each epoch:

Here are the per-class accuracies of the final model:

Evidently, class 6 (shirt) has the lowest accuracy and is the most difficult class, followed by class 4 (coat).
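Per-class accuracy is the fraction of test images of each true class that the model labels correctly, so the hardest classes are simply the ones with the lowest values. A minimal NumPy sketch, assuming integer arrays of true labels and predicted labels; the helper names `per_class_accuracy` and `hardest_classes` are hypothetical and not taken from the original project code:

```python
import numpy as np

# Fashion-MNIST label names, indexed by class id 0-9.
CLASS_NAMES = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

def per_class_accuracy(y_true, y_pred, num_classes=10):
    """Entry c is the fraction of class-c samples predicted as c (NaN if class c is absent)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    acc = np.full(num_classes, np.nan)
    for c in range(num_classes):
        mask = y_true == c  # samples whose true label is c
        if mask.any():
            acc[c] = np.mean(y_pred[mask] == c)
    return acc

def hardest_classes(acc, k=2):
    """Return the k class ids with the lowest accuracy, hardest first.

    np.argsort places NaN entries (absent classes) at the end.
    """
    order = np.argsort(acc)
    return [int(c) for c in order[:k]]
```

For example, with the toy labels `y_true = [0, 0, 4, 4, 6, 6]` and predictions `y_pred = [0, 0, 4, 6, 4, 2]` (made-up data, not results from this project), `hardest_classes(per_class_accuracy(y_true, y_pred))` returns `[6, 4]`, i.e. shirt then coat.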

For each class below, the left image is an example of a correct classification and the right image is an example of an incorrect classification.

Class 0 (t-shirt/top)

Class 1 (trouser)

Class 2 (pullover)

Class 3 (dress)

Class 4 (coat)

Class 5 (sandal)

Class 6 (shirt)

Class 7 (sneaker)

Class 8 (bag)

Class 9 (ankle boot)

Here is a visualization of the first layer of filters:
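To display first-layer filters as a single image, each filter can be rescaled to [0, 1] on its own and the filters tiled into a grid. The sketch below assumes the weights are available as a NumPy array in PyTorch's `(out_channels, in_channels, kH, kW)` layout with square kernels; `tile_filters` is a hypothetical helper, and the filter count used in the example is illustrative rather than the project's actual configuration.

```python
import numpy as np

def tile_filters(weights, pad=1):
    """Arrange conv filters of shape (out_ch, in_ch, k, k) into one 2-D grid image.

    Filters are averaged over input channels (a no-op for 1-channel grayscale input),
    then min-max rescaled to [0, 1] individually so weak filters remain visible.
    Padding pixels between filters are set to 1.0 (white).
    """
    w = np.asarray(weights, dtype=float).mean(axis=1)  # -> (out_ch, k, k)
    n, k, _ = w.shape
    cols = int(np.ceil(np.sqrt(n)))
    rows = int(np.ceil(n / cols))
    grid = np.ones((rows * (k + pad) - pad, cols * (k + pad) - pad))
    for i in range(n):
        f = w[i]
        span = f.max() - f.min()
        f = (f - f.min()) / span if span > 0 else np.zeros_like(f)
        r, c = divmod(i, cols)  # row-major placement in the grid
        top, left = r * (k + pad), c * (k + pad)
        grid[top:top + k, left:left + k] = f
    return grid
```

The resulting array can be shown with any image viewer, e.g. `matplotlib.pyplot.imshow(grid, cmap="gray")`. Rescaling per filter rather than globally trades away the relative magnitudes between filters in exchange for making each filter's spatial pattern readable.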

Part 2: Semantic Segmentation

For this network, I tried several learning rates and loss functions before settling on cross-entropy loss, the Adam optimizer with a learning rate of 0.005, and six convolutional layers with 6, 12, 24, 48, 64, and 5 output channels, respectively. This gave me the following loss over 10 epochs of training:

I got an average AP (average precision) of 0.4522.
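For reference, one common definition of AP is the non-interpolated one used by scikit-learn's `average_precision_score`: rank all pixels by their predicted score for a class, and average the precision at the rank of every true positive. Averaging that value over classes gives a single number like the one above. The sketch below is an illustration under that assumption, not the evaluation code actually used; it assumes per-pixel softmax probabilities of shape `(N, C, H, W)` and integer labels of shape `(N, H, W)`, and the function names are hypothetical.

```python
import numpy as np

def average_precision(scores, is_positive):
    """Non-interpolated AP for one class over a flat set of pixels."""
    scores = np.asarray(scores, dtype=float).ravel()
    is_positive = np.asarray(is_positive, dtype=bool).ravel()
    n_pos = is_positive.sum()
    if n_pos == 0:
        return float("nan")  # class absent: AP undefined
    order = np.argsort(-scores, kind="stable")  # highest score first
    hits = is_positive[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    # Average the precision at each rank where a true positive occurs.
    return float(precision_at_k[hits].sum() / n_pos)

def mean_average_precision(probs, labels, num_classes):
    """Per-class AP on (N, C, H, W) probabilities vs. (N, H, W) labels, then the mean."""
    aps = [average_precision(probs[:, c], labels == c) for c in range(num_classes)]
    return float(np.nanmean(aps)), aps
```

As a small check, scores `[0.9, 0.8, 0.7, 0.6]` with ground truth `[1, 0, 1, 0]` give precisions of 1 and 2/3 at the two positives, so AP = (1 + 2/3) / 2 = 5/6.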

Here is an example of a segmented image:

The segmentation is clearly fuzzy: it generally preserves the overall shape of the building facade but is poor at drawing sharp boundaries between regions.
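The cross-entropy loss chosen for the segmentation network is applied independently at every pixel: the 5 output channels of the last convolutional layer are treated as per-pixel class scores, and the loss is the negative log-softmax probability of the true class, averaged over all pixels. This is what `torch.nn.CrossEntropyLoss` computes for `(N, C, H, W)` inputs and `(N, H, W)` targets with its default mean reduction. A minimal NumPy sketch of that computation (the function name is hypothetical):

```python
import numpy as np

def pixelwise_cross_entropy(logits, labels):
    """Mean cross-entropy over all pixels.

    logits: (N, C, H, W) raw scores, e.g. C = 5 from the final conv layer.
    labels: (N, H, W) integer class ids in [0, C).
    """
    # Numerically stable log-softmax over the channel axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Select the log-probability of the true class at every pixel.
    picked = np.take_along_axis(log_probs, labels[:, None, :, :], axis=1)
    return -picked.mean()
```

With all-zero logits every class gets probability 1/5, so the loss is log(5), roughly 1.609; a model that puts all its confidence on the correct class drives the loss toward 0.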