Project 4: Classification and Segmentation

Chelsea Ye, cs194-26-agb


Part 1: Image Classification

Implementation

This part performs image classification on the Fashion MNIST dataset using a deep network. My convolutional neural network has 2 convolutional layers (torch.nn.Conv2d), each followed by a ReLU (torch.nn.ReLU) and then a max pool (torch.nn.MaxPool2d). These are followed by 2 fully connected layers (torch.nn.Linear).
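The layer order described above can be sketched in PyTorch as follows. This is a minimal sketch, not the exact model used: the 32 filters in the first layer match the filter visualization in this report, but the 3x3 kernels, the padding, the 64 channels in the second layer, the hidden size of 128, and the ReLU between the two linear layers are all illustrative assumptions, and the class name FashionCNN is hypothetical.

```python
import torch
import torch.nn as nn


class FashionCNN(nn.Module):
    """Two blocks of Conv2d -> ReLU -> MaxPool2d, then two Linear layers."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),   # 1x28x28 -> 32x28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 32x14x14
            nn.Conv2d(32, 64, kernel_size=3, padding=1),  # -> 64x14x14 (64 is assumed)
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 64x7x7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 128),   # hidden size 128 is assumed
            nn.ReLU(),                    # assumed; not stated in the report
            nn.Linear(128, num_classes),  # one logit per clothing class
        )

    def forward(self, x):
        return self.classifier(self.features(x))


if __name__ == "__main__":
    model = FashionCNN()
    batch = torch.zeros(4, 1, 28, 28)  # four blank grayscale 28x28 images
    print(model(batch).shape)
```

With padding of 1 on the 3x3 convolutions, only the two pooling layers change the spatial size, so a 28x28 input reaches the first linear layer as a 64x7x7 feature map.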

Accuracy Curve of Train and Validation Dataset
Per Class Accuracy of Validation and Test Dataset
Class Label    Accuracy on Validation Dataset (%)    Accuracy on Test Dataset (%)
T-shirt/top    84.6                                  84.7
Trouser        99.2                                  98.8
Pullover       85.0                                  85.1
Dress          93.3                                  93.6
Coat           84.9                                  82.1
Sandal         98.3                                  98.3
Shirt          80.5                                  77.0
Sneaker        98.4                                  98.0
Bag            97.0                                  97.3
Ankle boot     96.0                                  94.6

Shirt is the hardest class to classify, with an accuracy of 80.5% on the validation set and 77.0% on the test set. The classifier is most accurate on the Trouser class, with 99.2% on the validation set and 98.8% on the test set.
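Per-class accuracies like those in the table can be computed by counting, for each true class, how many samples were predicted correctly. The sketch below uses plain Python lists. The function name per_class_accuracy and the toy inputs are hypothetical and are not taken from the project code.

```python
def per_class_accuracy(predictions, labels, num_classes):
    """Return the accuracy (in %) for each true class index."""
    correct = [0] * num_classes
    total = [0] * num_classes
    for pred, label in zip(predictions, labels):
        total[label] += 1
        if pred == label:
            correct[label] += 1
    # Classes with no samples get NaN rather than a misleading 0%.
    return [100.0 * c / t if t else float("nan") for c, t in zip(correct, total)]


if __name__ == "__main__":
    # Toy example with two classes: class 0 has 2 samples, class 1 has 3.
    labels = [0, 0, 1, 1, 1]
    predictions = [0, 1, 1, 1, 0]
    print(per_class_accuracy(predictions, labels, num_classes=2))
```

In the toy example, class 0 is right on 1 of 2 samples and class 1 on 2 of 3 samples.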

Example Images of Correct/Incorrect Classification
Class Label Correct Classification Incorrect Classification
T-shirt/top    
Classified as Shirt(L/R)
Trouser    
Classified as Dress(L)/Coat(R)
Pullover    
Classified as Shirt(L)/Dress(R)
Dress    
Classified as Shirt(L)/Coat(R)
Coat    
Classified as Shirt(L)/Pullover(R)
Sandal    
Classified as Sneaker(L)/Ankle Boot(R)
Shirt    
Classified as T-Shirt(L)/Pullover(R)
Sneaker    
Classified as Ankle Boot(L/R)
Bag    
Classified as T-Shirt(L/R)
Ankle Boot    
Classified as Sandal(L)/Sneaker(R)
Visualization of the Convolutional Filters

32 filters of the first convolutional layer

Part 2: Semantic Segmentation

Overview

Semantic segmentation assigns each pixel in an image to its correct object class. This part performs semantic segmentation on the Mini Facade dataset by training a network that maps images from different cities and architectural styles to per-pixel labels in 5 classes: balcony, window, pillar, facade, and others.
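To make the per-pixel labeling concrete, the sketch below turns a stack of per-class score maps into a label map by taking the highest-scoring class at every pixel. It assumes NumPy and a (classes, height, width) layout. The index order in CLASS_NAMES is a hypothetical choice for illustration and may differ from the dataset's actual encoding.

```python
import numpy as np

# Hypothetical index order; the dataset's real encoding may differ.
CLASS_NAMES = ["balcony", "window", "pillar", "facade", "others"]


def scores_to_labels(scores):
    """Map (num_classes, H, W) scores to an (H, W) array of class indices."""
    return np.argmax(scores, axis=0)


if __name__ == "__main__":
    scores = np.zeros((len(CLASS_NAMES), 2, 2))
    scores[3, 0, 0] = 1.0  # top-left pixel scores highest for index 3
    scores[1, 1, 1] = 2.0  # bottom-right pixel scores highest for index 1
    labels = scores_to_labels(scores)
    print(labels)
    print(CLASS_NAMES[labels[0, 0]], CLASS_NAMES[labels[1, 1]])
```

Pixels whose scores are tied across classes fall back to the lowest index, which is how np.argmax resolves ties.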

Hyperparameters of my CNN model
Layer              Input Channels    Output Channels    Kernel    Padding    Stride
Conv2d             3                 64                 3         1          -
Conv2d             64                128                3         1          -
MaxPool2d          128               128                2         -          2
Conv2d             128               256                3         1          -
Conv2d             256               512                3         1          -
MaxPool2d          512               512                2         -          2
Conv2d             512               512                1         0          -
Conv2d             512               5                  1         0          -
ConvTranspose2d    5                 5                  8         2          4
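The kernel, padding, and stride values above can be checked with the standard output-size formulas for convolution, pooling, and transposed convolution. The sketch below assumes that a "-" stride means the PyTorch default of 1, that pooling uses no padding, and that the fourth convolution has a 3x3 kernel with padding 1 like the others. The 256-pixel input size is also only an example. Under those assumptions the two pooling layers shrink the image by a factor of 4 and the final transposed convolution restores the original resolution.

```python
def conv2d_out(size, kernel, padding=0, stride=1):
    """Output size of a convolution or max pool along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose2d_out(size, kernel, padding=0, stride=1):
    """Output size of a transposed convolution (no output padding, dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel


# (kind, kernel, padding, stride), following the table row by row.
LAYERS = [
    ("conv", 3, 1, 1),
    ("conv", 3, 1, 1),
    ("pool", 2, 0, 2),
    ("conv", 3, 1, 1),
    ("conv", 3, 1, 1),
    ("pool", 2, 0, 2),
    ("conv", 1, 0, 1),
    ("conv", 1, 0, 1),
    ("deconv", 8, 2, 4),
]


def output_size(size):
    """Trace one spatial dimension through every layer in LAYERS."""
    for kind, kernel, padding, stride in LAYERS:
        if kind == "deconv":
            size = conv_transpose2d_out(size, kernel, padding, stride)
        else:
            size = conv2d_out(size, kernel, padding, stride)
    return size


if __name__ == "__main__":
    print(output_size(256))
```

For the last layer alone, an input of size n gives (n - 1) * 4 - 2 * 2 + 8 = 4n, which exactly undoes the two stride-2 pooling steps.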
Loss curve of Train and Validation Dataset
Average Precision on Test Set
Class Label Average Precision
Balcony 0.626
Window 0.699
Pillar 0.101
Facade 0.794
Others 0.360
Averaged 0.516
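The report does not state how average precision was computed. One common definition, sketched below in plain Python, ranks pixels by their predicted score for a class and averages the precision at each rank where a true positive occurs. The function name average_precision and the toy scores are hypothetical, and ties between scores are not treated specially. The last lines confirm that the "Averaged" row is the unweighted mean of the five per-class values in the table.

```python
def average_precision(scores, targets):
    """Average of precision@k over the ranks k at which a positive appears."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if targets[i]:
            hits += 1
            precision_sum += hits / rank  # precision among the top `rank` items
    return precision_sum / hits if hits else float("nan")


# Per-class values copied from the table above.
PER_CLASS_AP = {
    "balcony": 0.626,
    "window": 0.699,
    "pillar": 0.101,
    "facade": 0.794,
    "others": 0.360,
}
mean_ap = sum(PER_CLASS_AP.values()) / len(PER_CLASS_AP)

if __name__ == "__main__":
    # Toy ranking: positives at ranks 1 and 3, so AP = (1/1 + 2/3) / 2.
    print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
    print(round(mean_ap, 3))
```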
Example Segmentation on My Photo Collection

My Photo taken in Mexico City

Semantic Segmentation Result

The segmentation is precise for windows, facade, and others, but less so for pillars and balconies.