CS 194-26 Project 4: Classification and Segmentation

Part 1: Image Classification

We will use the Fashion MNIST dataset, available as torchvision.datasets.FashionMNIST, to train our model. Fashion MNIST has 10 classes, with 60,000 training and validation images and 10,000 test images.

Here are a few images and their corresponding classes:

2 - Pullover | 9 - Ankle Boot | 1 - Trouser | 5 - Sandal | 1 - Trouser | 2 - Ankle Boot | 0 - T-shirt/top | 3 - Dress

Train and validation accuracy during the training process.

As you can see, my final average validation accuracy was 91%. I trained with cross-entropy loss and the Adam optimizer, using a learning rate of 0.0005 and a weight decay of 0.00001. I left the number of channels at 32 and used a LeakyReLU non-linearity.
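To make the loss and non-linearity named above concrete, here is a minimal sketch in plain Python of what each computes for a single value or sample. It is an illustration only, not the training code behind the reported numbers. The function names, the toy logits, and the negative slope of 0.01 are assumptions; the slope I actually used is not recorded here.

```python
import math

def leaky_relu(x, negative_slope=0.01):
    """LeakyReLU: keep positive inputs, scale negative inputs by a small slope.

    The slope of 0.01 is an assumed value for illustration.
    """
    return x if x > 0 else negative_slope * x

def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max before exponentiating for stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# Toy logits for a 3-class problem (made-up numbers).
logits = [2.0, 0.5, -1.0]
loss_if_class_0 = cross_entropy(logits, target=0)  # true class has the top logit
loss_if_class_2 = cross_entropy(logits, target=2)  # true class has the lowest logit
```

The loss is small when the true class has the largest logit and grows as the true class is ranked lower, which is the signal the optimizer minimizes.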

Per class accuracy of my classifier on the validation and test dataset.

The hardest classes to classify were 0, 2, 4, and 6, which correspond to T-shirt/top, Pullover, Coat, and Shirt. This is likely because all four are upper-body garments with similar shapes, so they are easily mistaken for one another.
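Per-class accuracy can be computed by counting, for each true class, how many of its samples were predicted correctly. The sketch below uses plain Python and made-up toy labels; the helper name `per_class_accuracy` and the toy data are assumptions for illustration, not the evaluation code or results from this project.

```python
from collections import Counter

# Fashion MNIST class names, indexed by label.
CLASS_NAMES = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

def per_class_accuracy(labels, predictions, num_classes=10):
    """Return a list where entry c is the accuracy on samples whose true label is c.

    Classes with no samples get None instead of a number.
    """
    total = Counter(labels)
    correct = Counter(y for y, p in zip(labels, predictions) if y == p)
    return [correct[c] / total[c] if total[c] else None
            for c in range(num_classes)]

# Toy example: one T-shirt/top and one Shirt are confused with each other.
labels      = [0, 0, 6, 6, 1, 1]
predictions = [0, 6, 0, 6, 1, 1]
accuracy = per_class_accuracy(labels, predictions)

for name, acc in zip(CLASS_NAMES, accuracy):
    if acc is not None:
        print(f"{name}: {acc:.2f}")
```

In this toy data the two confused classes each score 0.50 while Trouser scores 1.00, the same kind of pattern described above for the similar-looking garments.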

2 images from each class which the network classifies correctly, and 2 more images where it classifies incorrectly.

Visualize the learned filters.

I visualized the first convolutional layer with the help of this tutorial and this colab notebook

Part 2: Semantic Segmentation

Semantic segmentation refers to assigning each pixel in an image its correct object class. We will use the Mini Facade dataset, which contains natural JPG images of buildings and a labeled PNG image corresponding to each building.
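As a small illustration of per-pixel labeling, a segmentation network typically outputs one score per class at every pixel, and the predicted label map takes the highest-scoring class at each location. The sketch below does this with nested lists in plain Python; the function name `label_map` and the 2x2 toy scores are assumptions for illustration, not part of my model.

```python
def label_map(scores):
    """Turn per-class score maps into a per-pixel label map.

    scores[c][y][x] is the score of class c at pixel (y, x).
    Returns labels[y][x], the index of the highest-scoring class.
    """
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [[max(range(num_classes), key=lambda c: scores[c][y][x])
             for x in range(width)]
            for y in range(height)]

# Toy 2-class scores on a 2x2 image (made-up numbers).
scores = [
    [[0.9, 0.2], [0.4, 0.1]],  # class 0
    [[0.1, 0.8], [0.6, 0.9]],  # class 1
]
labels = label_map(scores)
```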

Model Architecture

I trained the model for 30 epochs with cross-entropy loss and the Adam optimizer, using a learning rate of 1e-3 and a weight decay of 2e-5. I split the 'train' data into the index range [0, 800) for training and [800, 906) for validation. The specific layers and parameters are shown below.
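The train/validation split can be sketched as two index ranges. This assumes the 'train' folder holds 906 images and that both ranges are half-open (the start index is included, the end index is not), which is my reading of the numbers above; the constant names are hypothetical.

```python
# Assumed sizes, taken from the split described above.
NUM_TRAIN_FOLDER_IMAGES = 906  # images in the 'train' data
SPLIT_INDEX = 800              # first index that goes to validation

train_indices = list(range(0, SPLIT_INDEX))
val_indices = list(range(SPLIT_INDEX, NUM_TRAIN_FOLDER_IMAGES))

print(len(train_indices), len(val_indices))
```

Under these assumptions the two sets are disjoint and together cover every image in the 'train' data, so no validation image is seen during training.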

Model Architecture

Loss and Average precision on the test set

Plot showing both training and validation loss across iterations.

I ran the trained model on a building from my collection. Which parts of the images does it get right? Which ones does it fail on?

The model misclassifies most of the windows as balconies, probably because each window has so many panes that it looks more like a balcony. There is still some orange around the windows, which are predicted mostly as red, suggesting that the model gives some weight to the window class. It does not correctly classify the facade.

Here is an example from the test set that looks a bit better, perhaps because it is simpler. The windows are correctly classified and so is the facade, with only a bit of noise above in the 'other' section.