CS 194-26 Project 4: Classification and Segmentation

By: Aivant Goyal

In this project, we solve classification of images in the Fashion-MNIST dataset and semantic segmentation of images in the Mini Facade dataset using deep nets! The project was implemented in PyTorch and run on Google Colab.

Part 1: Image Classification

In this part, we use the Fashion-MNIST dataset and create a convolutional neural network (CNN) to classify images of fashion items.

Dataloader

Loading some examples, we can see the original images and the associated labels:

CNN

The structure of our CNN is below:

I trained the network for 20 epochs with a batch size of 50, a learning rate of 0.001, and a weight decay of 0.00001.
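As a rough sketch of this kind of classifier (the channel counts and hidden size here are illustrative assumptions, not the exact architecture shown above), a small Fashion-MNIST CNN with the stated optimizer settings might look like:

```python
import torch
import torch.nn as nn

class FashionCNN(nn.Module):
    # A small CNN for 28x28 grayscale Fashion-MNIST images (10 classes).
    # Layer sizes are illustrative, not the exact model from the diagram.
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),            # 28x28 -> 14x14
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),            # 14x14 -> 7x7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 128),
            nn.ReLU(),
            nn.Linear(128, 10),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = FashionCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.00001)
criterion = nn.CrossEntropyLoss()
```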

Accuracy

Below is the graph for the training accuracy:

At the end of 20 epochs, the overall validation accuracy was 90%.
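The validation accuracy above can be computed by counting argmax matches over the held-out set; a minimal sketch (assuming a model and loader as defined elsewhere):

```python
import torch

def accuracy(model, loader, device="cpu"):
    # Fraction of samples where the argmax logit matches the true label.
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images.to(device)).argmax(dim=1)
            correct += (preds == labels.to(device)).sum().item()
            total += labels.size(0)
    return correct / total
```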

The class-by-class accuracy, with examples, is below. For most classes, the incorrectly classified samples are indeed confusing, and might even be difficult for a human to differentiate.

Class         Accuracy   Correct Samples   Incorrect Samples
T-Shirt/Top   85.6%
Trouser       97.9%
Pullover      86.2%
Dress         89.3%
Coat          85.6%
Sandal        97.8%
Shirt         74.0%
Sneaker       96.7%
Bag           97.8%
Ankle Boot    96.6%
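The per-class figures in the table can be tallied by splitting the correct-prediction counts by ground-truth label; a sketch of that bookkeeping (function name is my own):

```python
import torch

def per_class_accuracy(model, loader, num_classes=10):
    # Count correct predictions separately for each ground-truth class.
    correct = torch.zeros(num_classes)
    total = torch.zeros(num_classes)
    model.eval()
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images).argmax(dim=1)
            for c in range(num_classes):
                mask = labels == c
                correct[c] += (preds[mask] == c).sum()
                total[c] += mask.sum()
    return correct / total.clamp(min=1)  # avoid division by zero for empty classes
```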

Visualizing the Layers

Visualizing the layers, we see the following:
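One way such a visualization can be produced (the grid layout here is an assumption) is to pull the learned kernels straight out of a convolutional layer's weight tensor and plot each one as a small image:

```python
import matplotlib.pyplot as plt

def show_filters(conv_layer, ncols=8):
    # conv_layer.weight has shape (out_channels, in_channels, kH, kW);
    # plot the kernel of each output channel (first input channel only).
    weights = conv_layer.weight.detach().cpu()
    nrows = (weights.shape[0] + ncols - 1) // ncols
    fig, axes = plt.subplots(nrows, ncols, figsize=(ncols, nrows))
    for i, ax in enumerate(axes.flat):
        ax.axis("off")
        if i < weights.shape[0]:
            ax.imshow(weights[i, 0].numpy(), cmap="gray")
    return fig
```

In a Colab notebook, calling this on the first conv layer renders the filter grid inline.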

Part 2: Semantic Segmentation

In this part of the project, we are going to label the pixels of an image of a building into 5 classes: facade, pillar, window, balcony, and other.

There were 906 training images and 115 test images. I did a roughly 90/10 split of the training data to produce a validation set.
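The 90/10 split can be done with `random_split`; a sketch using tiny stand-in tensors in place of the real 906 facade images loaded from disk:

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in for the 906 facade images and their per-pixel label maps
# (tiny 16x16 tensors here purely for illustration).
images = torch.zeros(906, 3, 16, 16)
labels = torch.zeros(906, 16, 16, dtype=torch.long)
dataset = TensorDataset(images, labels)

n_val = len(dataset) // 10                                  # ~10% held out
train_set, val_set = random_split(dataset, [len(dataset) - n_val, n_val])
```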

Model

After some hyperparameter tuning, I settled on the following parameters:

Batch Size: 10
Epochs: 20
Learning Rate: 0.001
Weight Decay: 0.00005
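Wiring these hyperparameters into a training loop might look like the sketch below (the epoch-loop helper is my own naming; the model here is a placeholder 1x1 conv standing in for the real network):

```python
import torch
import torch.nn as nn

def train_epoch(model, loader, optimizer, criterion, device="cpu"):
    # One pass over the training set; returns the mean per-batch loss.
    model.train()
    total_loss = 0.0
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images.to(device)), labels.to(device))
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    return total_loss / len(loader)

# Setup with the hyperparameters listed above.
model = nn.Conv2d(3, 5, kernel_size=1)  # placeholder for the real segmentation model
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.00005)
criterion = nn.CrossEntropyLoss()       # per-pixel cross-entropy over the 5 classes
```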

The structure of the model is shown below. It consisted of 6 convolutional layers, each followed by a ReLU and a batch norm, with max pooling and upsampling alternating between layers.

The sizes of the layers were simply powers of 2 that represented a widening and then narrowing of the network.
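As a sketch of this shape of network (the specific channel widths 64/128/256 are assumed powers of 2 for illustration, not necessarily the exact ones used), a 6-conv segmentation net that widens, then narrows back to 5 per-pixel class scores might look like:

```python
import torch
import torch.nn as nn

def block(cin, cout):
    # Conv -> ReLU -> BatchNorm, as described above.
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(), nn.BatchNorm2d(cout))

class SegNet(nn.Module):
    # Channel widths are illustrative assumptions (powers of 2, widening then narrowing).
    def __init__(self, n_classes=5):
        super().__init__()
        self.net = nn.Sequential(
            block(3, 64),    nn.MaxPool2d(2),
            block(64, 128),  nn.MaxPool2d(2),
            block(128, 256),
            block(256, 128), nn.Upsample(scale_factor=2),
            block(128, 64),  nn.Upsample(scale_factor=2),
            nn.Conv2d(64, n_classes, 1),   # 1x1 conv -> per-pixel class scores
        )

    def forward(self, x):
        return self.net(x)
```

Two max pools followed by two upsamples leave the output at the input resolution, so each pixel gets a vector of 5 class scores.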

Results

After training for 20 epochs, I achieved 78% accuracy on the test set. Below are the graphs of the training loss and validation loss over the epochs.

Training Loss Validation Loss

The loss achieved in training was lower than in testing, but this is expected.

The APs for the 5 classes are below:

with the order being “others, facade, pillar, window, balcony”
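One way per-class AP can be computed (a sketch using scikit-learn, with a helper name of my own) is to treat each class as a binary problem over all pixels:

```python
import numpy as np
from sklearn.metrics import average_precision_score

def per_class_ap(scores, labels, n_classes=5):
    # scores: (N, n_classes, H, W) predicted class scores;
    # labels: (N, H, W) integer ground-truth maps.
    # Each class becomes a binary detection problem over all pixels.
    aps = []
    for c in range(n_classes):
        y_true = (labels == c).ravel()
        y_score = scores[:, c].ravel()
        aps.append(average_precision_score(y_true, y_score))
    return aps
```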

Examples

Below are some examples from the test set. We can see that the model often over-predicted the pillar class, which explains why the AP for that class is so low.

Original Result Correct