CS 194 Project 4: Classification and Segmentation


Jamarcus Liu, cs194-26-adh

Overview

In this project, we will use Convolutional Neural Network (CNN) to solve classication of images in the Fashion MNIST dataset and semantic segmentation of images in mini Facade dataset.

Part I. Image Classification

Data loading

In this part, we aim to classify images in the Fashion MNIST dataset to one of the ten categories. I used 50000 images for training, 30000 for validation, and another 30000 for testing. Here are some example pictures in the training set along with there labels

Shirt Ankle boot Shirt Coat Bag
Training and accuracy
I used Adam optimizer (learning rate = 0.01, weight decay = 0.0001) and ReLU as non-linearities. I trained with minibatches of 100, here is the accurary on training and testing data after each epoch. I trained only 5 epochs as the validation accuracy did not improve by much.
Category Validation Accurary Test Accuracy
T-shirt/top 83% 85%
Trouser 92% 90%
Pulllover 96% 94%
Dress 80% 79%
Coat 92% 92%
Sandal 81% 87%
Shirt 72% 71%
Sneaker 93% 94%
Bag 98% 93%
Ankle Boots 97% 92%
Average 88% 87%
Example filters
Here are some correctly classified and misclassified examples. I noticed that some classes could easily be mixed up with others (e.g.T-shirt, pullover, coat, and shirt) while some other classes had more variance in their shapes (e.g. bag) and therefore harder to classify.
Category Correct #1 Correct #2 Misclassified #1 Midclassified #2
T-shirt/top
Classfied as coat

Classfied as shirt
Trouser
Classfied as dress

Classfied as dress
Pullover
Classfied as shirt

Classfied as coat
Dress
Classfied as shirt

Classfied as coat
Coat
Classfied as shirt

Classfied as coat
Sandal
Classfied as sneaker

Classfied as ankle boots
Shirt
Classfied as coat

Classfied as t-shirt/top
Sneaker
Classfied as ankle boots

Classfied as bag
Bag
Classfied as t-shirt/top

Classfied as shirt
Ankle boot
Classfied as sandal

Classfied as sneaker
Learned filters

Here are the learned filters in the first convolutional layers.


Part II. Semantic Segmantation

In this part, we aim to classify each pixel of images in the Facade Dataset to one of the five categories. I used the first 707 images in the training set as training images and set side the last 200 images in the training set as evaluation data set.

Architecture

After hyperparameter tuning, I settled on the following architecture with 6 layers and alternating between maxpooling and upsampling in layer 2-5. After experimenting with different number channels, I noticed that increasing the number of channels in a certain layer alone by a factor of 2-4 will not significantly increase the average AP than the current result. With filter sizes, I used larger 5-by-5 filters in the begining to have a larger area vision and used 3-by-3 filters in the following layers to reduce computational costs.

Training loss
I used Adam optimizer with default parameters (learning rate = 0.001, weight decay = 0.00001) and trained 30 epochs on minibatches of 10.
Accurary

Here is the average precision of five classes. Notice that the classifier did a decent job on distinguish facade, balcony and others while did poorly on distinguishing pillars.

Category AP
Others 68.22%
Facade 78.43%
Pillar 15.12%
Window 83.08%
Balcony 44.78%
Average 57.92%
Examples

Here are some labeled images where the first one was not part of the original test set.

Original Image Labeled Image