CS194-26: Project 4 - Classification and Segmentation

Leanna Yu (cs194-26-aff)


In this project, we built Convolutional Neural Networks (CNNs) for two image processing tasks: classification and segmentation. For the classification task, we classified articles of clothing in the Fashion-MNIST dataset. For the segmentation task, we semantically segmented images from the mini Facade dataset using deep networks. For the implementation, I used PyTorch and ran my code on Google Colab.



Part 1: Image Classification

Fashion-MNIST

The CNN consisted of two convolutional layers, each with 32 channels and a kernel size of 3. Each convolutional layer was followed by a ReLU and a max-pooling layer of size 2 and stride 2. The CNN ends with two fully connected linear layers and an output layer. The network was trained with cross-entropy loss, using the Adam optimizer with a learning rate of 0.01.
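A minimal PyTorch sketch of a network matching this description might look like the following. The class name FashionCNN, the hidden widths (128 and 64), the ReLUs between the linear layers, and the absence of padding are assumptions made for illustration; the report does not state them.

```python
import torch
import torch.nn as nn


class FashionCNN(nn.Module):
    """Two conv blocks (conv -> ReLU -> max pool) followed by linear layers."""

    def __init__(self, hidden1=128, hidden2=64, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3),        # 28x28 -> 26x26
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 26x26 -> 13x13
            nn.Conv2d(32, 32, kernel_size=3),       # 13x13 -> 11x11
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 11x11 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 5 * 5, hidden1),   # hidden widths are assumed
            nn.ReLU(),
            nn.Linear(hidden1, hidden2),
            nn.ReLU(),
            nn.Linear(hidden2, num_classes),  # raw logits, one per class
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```

The final layer returns raw logits because nn.CrossEntropyLoss applies the log-softmax internally.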

The batch size was set to 100, and the number of epochs was set to 5.
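The training setup described above (cross-entropy loss, Adam with a learning rate of 0.01, batches of 100, 5 epochs) can be sketched roughly as follows. The function name train, the placeholder linear model, and the random stand-in data are hypothetical; in practice the loader would wrap the Fashion-MNIST training set.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train(model, loader, epochs=5, lr=0.01):
    """Train with cross-entropy loss and Adam; return the mean loss per epoch."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    history = []
    model.train()
    for _ in range(epochs):
        total, count = 0.0, 0
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
            # Weight each batch loss by its size so the epoch mean is exact.
            total += loss.item() * labels.size(0)
            count += labels.size(0)
        history.append(total / count)
    return history


if __name__ == "__main__":
    # Random stand-in data shaped like Fashion-MNIST (assumption for the demo).
    images = torch.randn(200, 1, 28, 28)
    labels = torch.randint(0, 10, (200,))
    loader = DataLoader(TensorDataset(images, labels), batch_size=100, shuffle=True)
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # placeholder model
    print(train(model, loader))
```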

Analysis of Different Class Categorizations

Category Train Accuracy Validation Accuracy  
T-shirt/Top 80.00% 75.80%  
Trouser 98.73% 97.90%  
Pullover 89.95% 88.10%  
Dress 94.78% 92.50%  
Coat 78.48% 75.60%  
Sandal 97.70% 97.00%  
Shirt 74.60% 69.50%  
Sneaker 98.22% 97.50%  
Bag 94.22% 93.70%  
Ankle Boot 93.83% 93.70%

The CNN performed worst on shirts, with an accuracy of 74.60% on the training dataset and only 69.50% on the validation dataset. Coats (75.60%) and T-shirts/tops (75.80%) also scored relatively low on the validation dataset. Trousers, on the other hand, were classified best, with 98.73% on the training dataset and 97.90% on the validation dataset.
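Per-class accuracies like the ones in the table can be computed from predicted and true labels along these lines. This is a sketch; the function name per_class_accuracy is hypothetical, and CLASSES follows the standard Fashion-MNIST label order.

```python
import torch

# Standard Fashion-MNIST label order (indices 0 through 9).
CLASSES = ["T-shirt/Top", "Trouser", "Pullover", "Dress", "Coat",
           "Sandal", "Shirt", "Sneaker", "Bag", "Ankle Boot"]


def per_class_accuracy(preds, labels, num_classes=10):
    """Fraction of samples of each true class that were predicted correctly."""
    # Count correct predictions per true class, then divide by class size.
    correct = torch.bincount(labels[preds == labels], minlength=num_classes)
    total = torch.bincount(labels, minlength=num_classes)
    # clamp avoids division by zero for classes absent from the batch.
    return correct.float() / total.clamp(min=1).float()
```

Here preds would be the argmax of the network output over the class dimension, accumulated over the whole dataset.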



Below is a graph showing CNN accuracy over the training epochs for the training and validation datasets:

Accuracy Graph
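One way to obtain the points for such a curve is to evaluate the model on both datasets after every epoch. The helper below is a sketch under that assumption; the name evaluate_accuracy is hypothetical.

```python
import torch


def evaluate_accuracy(model, loader):
    """Return the fraction of correctly classified samples in loader."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for images, labels in loader:
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return correct / total
```

Calling this once per epoch on the training loader and on the validation loader gives two lists of values that can be plotted against the epoch number.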

Below are the 32 learned filters of the first convolutional layer of the CNN:

1/4 Layer 1 Learned Filters

2/4 Layer 1 Learned Filters

3/4 Layer 1 Learned Filters

4/4 Layer 1 Learned Filters

Below are the 32 learned filters of the second convolutional layer of the CNN:

1/4 Layer 2 Learned Filters

2/4 Layer 2 Learned Filters

3/4 Layer 2 Learned Filters

4/4 Layer 2 Learned Filters
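Filter images like these can be produced by tiling the convolution weights into a single grid. The sketch below is one possible approach, not necessarily how the figures above were made; the function name filters_to_grid is hypothetical, and averaging over input channels (relevant for the second layer, whose filters each span 32 input channels) is an assumed display choice.

```python
import torch


def filters_to_grid(weight, ncols=8, pad=1):
    """Tile conv filters of shape (out, in, kh, kw) into one 2D image tensor."""
    # Collapse input channels by averaging (a display choice, assumed here).
    filters = weight.detach().mean(dim=1)
    # Rescale each filter to [0, 1] so contrast is comparable across tiles.
    lo = filters.amin(dim=(1, 2), keepdim=True)
    hi = filters.amax(dim=(1, 2), keepdim=True)
    filters = (filters - lo) / (hi - lo).clamp(min=1e-8)
    n, kh, kw = filters.shape
    nrows = (n + ncols - 1) // ncols
    grid = torch.zeros(nrows * (kh + pad) + pad, ncols * (kw + pad) + pad)
    for i in range(n):
        r, c = divmod(i, ncols)
        top, left = pad + r * (kh + pad), pad + c * (kw + pad)
        grid[top:top + kh, left:left + kw] = filters[i]
    return grid
```

The returned tensor can be passed to matplotlib, for example plt.imshow(grid, cmap="gray"), with the weight taken from a convolutional layer's .weight attribute.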

The final CNN for the Fashion-MNIST dataset had an accuracy of 90%.

Part 2: Semantic Segmentation

My architecture for the semantic segmentation portion of this project is as follows:

Architecture



Pictured is the loss graph before training (on top), and after training (on bottom):

Loss Graph Before

Loss Graph After

The average precision values, ranging from 0 to 1, are shown below by class, in the order [others, facade, pillar, window, balcony]. The average precision after training was 0.3. This number could likely have been improved by training for more epochs.
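For reference, average precision for a single class can be computed from per-pixel scores and binary ground-truth labels roughly as follows. This is a plain-Python sketch of the standard non-interpolated definition that ignores tied scores; it is not the evaluation code used for the numbers above, and both function names are hypothetical.

```python
def average_precision(scores, targets):
    """AP for one class: mean of precision@k over the ranks k holding a positive."""
    ranked = sorted(zip(scores, targets), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            precisions.append(hits / rank)
    # A class with no positive samples is given AP 0 by convention here.
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(per_class_scores, per_class_targets):
    """Return the list of per-class APs and their unweighted mean."""
    aps = [average_precision(s, t)
           for s, t in zip(per_class_scores, per_class_targets)]
    return aps, sum(aps) / len(aps)
```

For segmentation, scores would be the flattened predicted probabilities for one class over all pixels, and targets would mark which pixels truly belong to that class.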

Average Precision

Below is a sample result for a random input image of an office building (a random image was used as a result of shelter in place):

Building

After Training