Project 4: Classification and Segmentation

Overview

In this project I was tasked with creating two deep networks. In part 1, I created and trained a convolutional neural network (CNN) to classify articles of clothing from the FashionMNIST dataset. In part 2, I created and trained a CNN for semantic segmentation of the mini-Facade dataset.

Part 1: Image Classification

Part 1 can be broken down into 3 components:

Dataloading
CNN Architecture, Training, and Performance
Network Filter Visualization

FashionMNIST Dataset

As stated in the overview, I used the FashionMNIST dataset. Here are the labels of the dataset and their descriptions.

Label	Description
0	T-shirt/Top
1	Trouser
2	Pullover
3	Dress
4	Coat
5	Sandal
6	Shirt
7	Sneaker
8	Bag
9	Ankle Boot

There are a total of 10 classes, and each image in the dataset is 28 x 28 pixels. Here is a view of some of the images in the dataset. Every three rows of the picture below represent one class.

Here is a sample of four images and their respective labels

CNN Architecture and Performance

My CNN architecture was as follows

Conv layer: 32 channels with 2x2 kernel
ReLU
MaxPool: 2x2 window
Conv layer: 32 channels with 5x5 kernel
ReLU
MaxPool: 2x2 window
Fully-Connected Layer: 120 dimension output
Fully-Connected Layer: 10 dimension output
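The layer list above fixes the input size of the first fully-connected layer. Here is a minimal sketch of that arithmetic, assuming stride 1 and no padding for the conv layers and a stride of 2 for the pooling windows (the writeup does not state these, so they are assumptions). The helper names `conv_out` and `trace_classifier` are made up for illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one axis for a conv or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_classifier(size=28, channels=32):
    """Follow a 28 x 28 image through the conv/pool stack listed above.

    Assumes stride 1 and no padding for convs, stride 2 for max pooling.
    Returns the final side length and the flattened feature count.
    """
    size = conv_out(size, kernel=2)            # Conv layer, 2x2 kernel
    size = conv_out(size, kernel=2, stride=2)  # MaxPool, 2x2 window
    size = conv_out(size, kernel=5)            # Conv layer, 5x5 kernel
    size = conv_out(size, kernel=2, stride=2)  # MaxPool, 2x2 window
    return size, channels * size * size


side, flat_features = trace_classifier()
print(f"feature map: 32 x {side} x {side}, flattened: {flat_features}")
```

Under these assumptions, the flattened feature count is what the 120-dimension fully-connected layer would take as input.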

For my loss function I used cross-entropy. I first tried Adam as my optimizer, as suggested in the spec, but it gave much worse results than SGD, so I used SGD with a learning rate of 0.01 and with weight decay and momentum both set to 0. For training, validation, and testing I used a batch size of 4, and I trained for a total of 5 epochs.
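To make the loss concrete, here is a small sketch of cross-entropy computed from raw class scores (logits) in plain Python. It is not the code used in the project; the function names are hypothetical, and averaging over the batch is an assumption.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def batch_loss(batch_logits, targets):
    """Mean cross-entropy over a batch (mean reduction is an assumption)."""
    losses = [cross_entropy(l, t) for l, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)


# A batch of 4 samples whose 10 class scores are all equal.
uniform_batch = [[0.0] * 10 for _ in range(4)]
print(batch_loss(uniform_batch, [0, 3, 6, 9]))
```

A net that assigns equal scores to all 10 classes has a loss of ln(10), about 2.30, which is a useful sanity check for where a loss curve should start before training.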

Here are the training and validation loss curves of my network

Here is the final per-class accuracy of my net

Accuracy of 0 : 87 %
Accuracy of 1 : 97 %
Accuracy of 2 : 86 %
Accuracy of 3 : 91 %
Accuracy of 4 : 78 %
Accuracy of 5 : 98 %
Accuracy of 6 : 74 %
Accuracy of 7 : 97 %
Accuracy of 8 : 97 %
Accuracy of 9 : 95 %
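Per-class accuracies like the ones above can be computed by counting, for each true label, how often the prediction matches. The sketch below shows one way to do that; the labels and predictions are toy values made up for illustration, not results from my net, and `per_class_accuracy` is a hypothetical helper name.

```python
from collections import Counter

CLASS_NAMES = [
    "T-shirt/Top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle Boot",
]


def per_class_accuracy(labels, predictions, num_classes=10):
    """Fraction of correctly predicted samples for each true class."""
    correct, total = Counter(), Counter()
    for y, p in zip(labels, predictions):
        total[y] += 1
        if y == p:
            correct[y] += 1
    # Skip classes that never appear in the labels.
    return {c: correct[c] / total[c] for c in range(num_classes) if total[c]}


# Toy data (made up): a T-shirt mistaken for a shirt and vice versa.
labels = [0, 0, 6, 6, 6, 9]
predictions = [0, 6, 6, 0, 6, 9]
for c, acc in per_class_accuracy(labels, predictions).items():
    print(f"Accuracy of {CLASS_NAMES[c]} : {acc:.0%}")
```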

The overall accuracy of my classifier was 90.44% on the test set. The hardest classes for my net were shirts, coats, pullovers, and T-shirts/tops. To be completely honest, even I had some difficulty telling these articles of clothing apart!

For each class, here are two photos that my classifier labeled correctly and two photos that my classifier labeled incorrectly

Filter Visualization

Here is what the filters from the first convolutional layer look like

Here is what the filters from the second convolutional layer look like

Part 2: Image Segmentation

Task

Create and train a CNN to segment an image of a building into 5 different classes

Class	Color	Pixel Value
Others	Black	0
Facade	Blue	1
Pillar	Green	2
Window	Orange	3
Balcony	Red	4

Here is an example of an image and its segmented form

CNN Architecture

For this part of the project, I designed my CNN using the U-net architecture proposed here: https://arxiv.org/abs/1505.04597

The U-Net is composed of 2 components: a contraction path and an expansion path. In the contraction path, we reduce the spatial dimensions of the image. In the expansion path, we increase them again.

Contraction
- Conv Layer: 32 channel out, 3x3 kernel, padding=1
- ReLU
- Batch Norm
- Conv Layer: 64 channel out, 3x3 kernel, padding=2
- ReLU
- Batch Norm
- MaxPool: 2x2 window
- Conv Layer: 128 channel out, 3x3 kernel
- ReLU
- Batch Norm
Expansion
- Conv Transpose: 64 channel out, 2x2 kernel, stride=2, padding=1
- Conv Layer: 32 channel out, 3x3 kernel, padding=2
- ReLU
- Batch Norm
- Conv Layer: 5 channel out, 3x3 kernel, padding=2
- ReLU
- Batch Norm

In the contraction path, my CNN doubles the number of channels from one convolutional layer to the next to increase the number of features that the net can represent. Across the convolutional layers, the channel count goes $\text{Image Start: }3 \rightarrow 32 \rightarrow 64 \rightarrow 128 \rightarrow 32 \rightarrow \text{Num Classes: } 5$. I use ReLU as my nonlinearity. I apply batch normalization after each conv-ReLU pass to reduce overfitting (it increases the independence between network layers) and to speed up training. I add padding as needed to ensure that the image dimensions align as the image is passed through the net.
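The effect of the padding choices can be checked by tracing the height (or width) of the image through the layers listed above. This is a rough sketch, assuming stride 1 for the conv layers, stride 2 for max pooling, and a 256 x 256 input; the input size is an assumption for illustration, and the helper names are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one axis for a conv or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size, kernel, stride=1, padding=0):
    """Output length along one axis for a transposed convolution
    (no output padding, no dilation)."""
    return (size - 1) * stride - 2 * padding + kernel


def trace_unet(size):
    """Return the side length after each spatial layer listed above."""
    sizes = [size]
    sizes.append(conv_out(sizes[-1], 3, padding=1))   # conv, 32 channels
    sizes.append(conv_out(sizes[-1], 3, padding=2))   # conv, 64 channels
    sizes.append(conv_out(sizes[-1], 2, stride=2))    # max pool
    sizes.append(conv_out(sizes[-1], 3))              # conv, 128 channels
    sizes.append(conv_transpose_out(sizes[-1], 2, stride=2, padding=1))
    sizes.append(conv_out(sizes[-1], 3, padding=2))   # conv, 32 channels
    sizes.append(conv_out(sizes[-1], 3, padding=2))   # conv, 5 channels
    return sizes


print(trace_unet(256))
```

With these assumptions, an even input size comes back out at the same size, so each pixel of the input gets one 5-channel class score.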

CNN Performance

Here are the training and validation loss curves

Here are the AP scores of my CNN

AP = 0.5956991094463625
AP = 0.7540532301608894
AP = 0.1280414026312353
AP = 0.7732066608260174
AP = 0.3552330444870263
Average AP: 0.5212466895103062
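The average AP is the plain mean of the five per-class scores, which can be checked directly. The sketch below also includes one common step-wise definition of average precision for a single class. The writeup does not say how the per-class AP was computed, so `average_precision` is an illustrative assumption rather than the method used here, and its toy inputs are made up.

```python
def average_precision(labels, scores):
    """Step-wise AP: mean of the precision values taken at the rank of
    each positive sample, with samples sorted by descending score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Per-class AP scores reported above.
per_class_ap = [
    0.5956991094463625,
    0.7540532301608894,
    0.1280414026312353,
    0.7732066608260174,
    0.3552330444870263,
]
mean_ap = sum(per_class_ap) / len(per_class_ap)
print(f"Average AP: {mean_ap:.4f}")

# Toy example (made up): positives ranked 1st and 3rd out of 4 samples.
print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6]))
```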