Jamarcus Liu

CS 194 Project 4: Classification and Segmentation

Jamarcus Liu, cs194-26-adh

Overview

In this project, we use convolutional neural networks (CNNs) to solve image classification on the Fashion MNIST dataset and semantic segmentation on the mini Facade dataset.

Part I. Image Classification

Data loading

In this part, we aim to classify each image in the Fashion MNIST dataset into one of ten categories. I used 50000 images for training, 30000 for validation, and another 30000 for testing. Here are some example pictures from the training set along with their labels:

Shirt	Ankle boot	Shirt	Coat	Bag

Training and accuracy

I used the Adam optimizer (learning rate = 0.01, weight decay = 0.0001) with ReLU non-linearities and trained on minibatches of 100. Here is the accuracy on the training and testing data after each epoch. I trained for only 5 epochs because the validation accuracy was no longer improving by much.

Category	Validation Accuracy	Test Accuracy
T-shirt/top	83%	85%
Trouser	92%	90%
Pullover	96%	94%
Dress	80%	79%
Coat	92%	92%
Sandal	81%	87%
Shirt	72%	71%
Sneaker	93%	94%
Bag	98%	93%
Ankle boot	97%	92%
Average	88%	87%
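As a minimal sketch of how per-category accuracies like the ones in this table could be computed, the snippet below counts correct predictions separately for each true class. The function name `per_class_accuracy` and the toy predictions are illustrative assumptions, not the code or the results from this project.

```python
def per_class_accuracy(predictions, labels, num_classes):
    """Return a list where entry c is the fraction of class-c samples predicted as c."""
    correct = [0] * num_classes
    total = [0] * num_classes
    for predicted, true in zip(predictions, labels):
        total[true] += 1
        if predicted == true:
            correct[true] += 1
    # A class with no samples has an undefined accuracy, reported here as NaN.
    return [c / t if t else float("nan") for c, t in zip(correct, total)]


# Toy data with 3 classes (hypothetical, not results from the project).
toy_labels = [0, 0, 1, 1, 1, 2]
toy_predictions = [0, 1, 1, 1, 0, 2]
toy_accuracy = per_class_accuracy(toy_predictions, toy_labels, num_classes=3)
for class_index, accuracy in enumerate(toy_accuracy):
    print(f"class {class_index}: {accuracy:.0%}")
```

With integer class indices in the Fashion MNIST label order, the same function would be called with `num_classes=10` on the validation and test predictions.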

Examples

Here are some correctly classified and misclassified examples. I noticed that some classes were easily mixed up with others (e.g. T-shirt, pullover, coat, and shirt), while some other classes had more variance in their shapes (e.g. bag) and were therefore harder to classify.

Category	Misclassified #1	Misclassified #2
T-shirt/top	Classified as coat	Classified as shirt
Trouser	Classified as dress	Classified as dress
Pullover	Classified as shirt	Classified as coat
Dress	Classified as shirt	Classified as coat
Coat	Classified as shirt	Classified as coat
Sandal	Classified as sneaker	Classified as ankle boot
Shirt	Classified as coat	Classified as t-shirt/top
Sneaker	Classified as ankle boot	Classified as bag
Bag	Classified as t-shirt/top	Classified as shirt
Ankle boot	Classified as sandal	Classified as sneaker
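One way to quantify which classes get mixed up, rather than reading it off individual examples, is a confusion matrix. The sketch below is an assumption about how this could be done in plain Python. The helper names `confusion_matrix` and `most_confused_pairs` and the toy data are hypothetical and are not part of the project code.

```python
def confusion_matrix(predictions, labels, num_classes):
    """matrix[t][p] counts samples whose true class is t and predicted class is p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for predicted, true in zip(predictions, labels):
        matrix[true][predicted] += 1
    return matrix


def most_confused_pairs(matrix, top_k=3):
    """Return the top_k (true, predicted, count) off-diagonal entries by count."""
    pairs = [
        (true, predicted, count)
        for true, row in enumerate(matrix)
        for predicted, count in enumerate(row)
        if true != predicted and count > 0
    ]
    return sorted(pairs, key=lambda item: item[2], reverse=True)[:top_k]


# Toy data with 3 classes (hypothetical, not results from the project).
toy_labels = [0, 0, 0, 1, 1, 2, 2, 2]
toy_predictions = [0, 2, 2, 1, 0, 2, 2, 0]
toy_matrix = confusion_matrix(toy_predictions, toy_labels, num_classes=3)
print(most_confused_pairs(toy_matrix, top_k=2))
```

Large off-diagonal counts between two categories would indicate the kind of mix-up seen in the examples, such as between shirt and coat.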

Learned filters

Here are the learned filters in the first convolutional layer.

Part II. Semantic Segmentation

In this part, we aim to classify each pixel of the images in the Facade dataset into one of five categories. I used the first 707 images in the training set for training and set aside the last 200 images in the training set as the evaluation set.

Architecture

After hyperparameter tuning, I settled on the following six-layer architecture, which alternates between max pooling and upsampling in layers 2-5. After experimenting with different numbers of channels, I noticed that increasing the number of channels in a single layer by a factor of 2-4 did not significantly improve the average AP over the current result. For filter sizes, I used larger 5-by-5 filters in the early layers to get a larger receptive field and 3-by-3 filters in the later layers to reduce computational cost.

Layer 1: 3@256*256 -> 64@256*256

nn.Conv2d(3, 64, 5, padding=2)
nn.ReLU(inplace=True)

Layer 2: 64@256*256 -> 128@256*256 -> 128@128*128

nn.Conv2d(64, 128, 5, padding=2)
nn.ReLU(inplace=True)
nn.MaxPool2d(2)

Layer 3: 128@128*128 -> 128@128*128 -> 128@256*256

nn.Conv2d(128, 128, 5, padding=2)
nn.ConvTranspose2d(128, 128, 2, stride=2, padding=0)
nn.ReLU(inplace=True)

Layer 4: 128@256*256 -> 64@256*256 -> 64@128*128

nn.Conv2d(128, 64, 3, padding=1)
nn.ReLU(inplace=True)
nn.MaxPool2d(2)

Layer 5: 64@128*128 -> 32@128*128 -> 32@256*256

nn.Conv2d(64, 32, 3, padding=1)
nn.ConvTranspose2d(32, 32, 2, stride=2, padding=0)
nn.ReLU(inplace=True)

Layer 6: 32@256*256 -> 5@256*256

nn.Conv2d(32, 5, 3, padding=1)
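The shape annotations above can be checked with the standard output-size formulas for convolution, max pooling, and transposed convolution. The following is a small pure-Python sketch of that arithmetic. It assumes dilation 1, no output padding, and a pooling stride equal to the pooling kernel size. The helper names are hypothetical and this is not the model code itself. ReLU is omitted because it does not change shapes.

```python
def conv2d_out(size, kernel, padding=0, stride=1):
    """Output side length of a 2D convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def maxpool2d_out(size, kernel):
    """Output side length of max pooling with stride equal to the kernel size."""
    return (size - kernel) // kernel + 1


def conv_transpose2d_out(size, kernel, stride=1, padding=0):
    """Output side length of a transposed convolution (no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


OPS = {"conv": conv2d_out, "pool": maxpool2d_out, "deconv": conv_transpose2d_out}

# (operation, output channels or None to keep them, arguments), in the listed order.
LAYERS = [
    ("conv", 64, dict(kernel=5, padding=2)),     # Layer 1
    ("conv", 128, dict(kernel=5, padding=2)),    # Layer 2
    ("pool", None, dict(kernel=2)),
    ("conv", 128, dict(kernel=5, padding=2)),    # Layer 3
    ("deconv", 128, dict(kernel=2, stride=2)),
    ("conv", 64, dict(kernel=3, padding=1)),     # Layer 4
    ("pool", None, dict(kernel=2)),
    ("conv", 32, dict(kernel=3, padding=1)),     # Layer 5
    ("deconv", 32, dict(kernel=2, stride=2)),
    ("conv", 5, dict(kernel=3, padding=1)),      # Layer 6
]


def trace_shapes(channels=3, size=256):
    """Return the (channels, side length) pair after every shape-changing operation."""
    shapes = [(channels, size)]
    for op, out_channels, kwargs in LAYERS:
        size = OPS[op](size, **kwargs)
        channels = channels if out_channels is None else out_channels
        shapes.append((channels, size))
    return shapes


for channels, size in trace_shapes():
    print(f"{channels}@{size}*{size}")
```

The trace ends at 5@256*256, one score map per category at the input resolution, which is what per-pixel classification needs.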

Training loss

I used the Adam optimizer (learning rate = 0.001, which is the PyTorch default, and weight decay = 0.00001) and trained for 30 epochs on minibatches of 10.

Accuracy

Here is the average precision (AP) for each of the five classes. Notice that the classifier did a decent job of distinguishing windows, facades, and others, a weaker job on balconies, and a poor job on pillars.

Category	AP
Others	68.22%
Facade	78.43%
Pillar	15.12%
Window	83.08%
Balcony	44.78%
Average	57.92%
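For reference, here is a minimal sketch of one common way to compute AP for a single class: rank all pixels by the predicted score for that class, then average the precision at the rank of each true positive. This assumes a one-vs-rest setup with no tied scores. The exact evaluation routine used for the table is not shown in this report, so the function below is an illustration rather than that routine.

```python
def average_precision(scores, labels):
    """AP for one class: mean of the precision at the rank of each positive sample.

    scores: predicted confidence that each pixel belongs to the class.
    labels: 1 if the pixel truly belongs to the class, else 0.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("AP is undefined when there are no positive samples")
    # Rank pixels from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, index in enumerate(order, start=1):
        if labels[index]:
            true_positives += 1
            precision_sum += true_positives / rank  # precision at this rank
    return precision_sum / total_positives


# Toy example (hypothetical scores, not results from the project).
toy_scores = [0.9, 0.8, 0.7, 0.6]
toy_labels = [1, 0, 1, 0]
print(f"AP = {average_precision(toy_scores, toy_labels):.4f}")
```

In the toy example the positives sit at ranks 1 and 3, so the AP is the mean of 1/1 and 2/3. The average row in the table would then be the mean of the per-class AP values.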

Examples

Here are some labeled images. The first one was not part of the original test set.

Original Image	Labeled Image