Project 4

Project 4: Classification and Segmentation

Chelsea Ye, cs194-26-agb

Part 1: Image Classification

Implementation

This part solves image classification in the Fasion MNIST dataset using Deep Nets. My Convolutional Neural Network has 2 convolutional layer (torch.nn.Conv2d), where each conv layer will be followed by a ReLU(torch.nn.ReLU) followed by a maxpool(torch.nn.MaxPool2d), then followed by 2 fully connected networks(torch.nn.Linear).

Accuracy Curve of Train and Validation Dataset

Per Class Accuracy of Validation and Test Dataset

Class Label	Accuracy on Validation Dataset(%)	Accuracy on Test Dataset (%)
T-shirt/top	84.6	84.7
Trouser	99.2	98.8
Pullover	85.0	85.1
Dress	93.3	93.6
Coat	84.9	82.1
Sandal	98.3	98.3
Shirt	80.5	77.0
Sneaker	98.4	98.0
Bag	97.0	97.3
Ankle boot	96.0	94.6

Shirt is the hardest to classify, with an accuracy of 80.5% on Validation dataset and 77.0% on Test dataset. My classifier achieves greatest accuracy in Trouser class with a 99.2% accuracy on Validation dataset and 98.8% on Test dataset.

Example Imagesof Correct/Incorrect Classification

Class Label	Correct Classification	Incorrect Classification
T-shirt/top		Classified as Shirt(L/R)
Trouser		Classified as Dress(L)/Coat(R)
Pullover		Classified as Shirt(L)/Dress(R)
Dress		Classified as Shirt(L)/Coat(R)
Coat		Classified as Shirt(L)/Pullover(R)
Sandal		Classified as Sneaker(L)/Ankle Boot(R)
Shirt		Classified as T-Shirt(L)/Pullover(R)
Sneaker		Classified as Ankle Boot(L/R)
Bag		Classified as T-Shirt(L/R)
Ankle Boot		Classified as Sandle(L)/Sneaker(R)

Visualization of the Convolutional Filters

32 filters of the first convolutional layer

Part 2: Semantic Segmentation

Overview

Semantic Segmentation refers to labeling each pixel in the image to its correct object class. This part solves semantic segmentation of the Mini Facade dataset by training a network that converts images of different cities and architectural styles to the labels in 5 classes: balcony, window, pillar, facade, and others.

Hyperparameters of my CNN model

Layer	Input Channel	Output Channel	kernel	padding	stride
Conv2d	3	64	3	1	-
Conv2d	64	128	3	1	-
MaxPool2d	128	128	2	-	2
Conv2d	128	256	3	1	-
Conv2d	3	256	512	1	-
MaxPool2d	512	512	2	-	2
Conv2d	512	512	1	0	-
Conv2d	512	5	1	0	-
ConvTranspose2d	5	5	8	2	4

Loss curve of Train and Validation Dataset

Average Precision on Test Set

Class Label	Average Precision
Balcony	0.626
Window	0.699
Pillar	0.101
Facade	0.794
Others	0.360
Averaged	0.516

Example Segmentation on My Photo Collection

My Photo taken in Mexico City

Semantic Segmentation Result

The result shows great precision in windows, facade and others, but less ideal in pillars and balcony.