Project 4: Classification and Segmentation

1. Image Classification

We trained a CNN on the Fashion MNIST dataset. The training, validation, and test sets contain 54000, 6000, and 10000 samples, respectively.
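
The 54000/6000 split amounts to holding out 10% of the standard 60000-image Fashion MNIST training set for validation. The report does not say how the split was drawn, so the following is only a minimal sketch that assumes a shuffled 90/10 split over sample indices; `split_indices` and the seed are hypothetical names and values chosen for illustration.

```python
import random

def split_indices(num_samples, val_fraction, seed=0):
    """Shuffle sample indices and split them into train and validation lists."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    num_val = int(round(num_samples * val_fraction))
    # The first num_val shuffled indices form the validation set.
    return indices[num_val:], indices[:num_val]

# Assumed: 60000 training images with 10% held out for validation.
train_idx, val_idx = split_indices(60000, 0.1)
print(len(train_idx), len(val_idx))
```

Fixing the seed keeps the validation set the same across runs, so accuracies from different training runs stay comparable.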

1.1 Sample a few images from the dataset

Here are a few samples from the dataset along with their class labels.


Coat	Sneaker	Shirt	Pullover

1.2 CNN Architecture


The CNN Architecture

We trained for 80 epochs with a batch size of 32, using cross-entropy loss and the Adam optimizer. The learning rate was set to 0.05 and the weight decay to 0.0.
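
For reference, the cross-entropy loss for a single sample is the negative log of the softmax probability assigned to the true class. The sketch below is a plain-Python illustration of that formula, not the training code used in this project; `cross_entropy` is a hypothetical helper name.

```python
import math

def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# A network that scores all 10 classes equally has loss ln(10).
uniform_loss = cross_entropy([0.0] * 10, target=3)
print(round(uniform_loss, 4))
```

A loss near ln(10), about 2.30, therefore indicates a 10-class model that is doing no better than guessing uniformly.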

1.3 Training and Validation Accuracies

Here is the plot of the training and validation accuracies during the training process.


Accuracies during Training

1.3 Per-class Accuracy

1.3.1 Per-class Accuracy of Validation Set

Here is the per-class accuracy from the validation set.

Class	Accuracy
Top	89 %
Trouser	97 %
Pullover	90 %
Dress	89 %
Coat	89 %
Sandal	97 %
Shirt	72 %
Sneaker	94 %
Bag	98 %
Ankle boot	96 %
Average	91 %

From the table, Shirt is the hardest class to classify, at 72 %. The four classes with the lowest accuracy are Shirt, followed by Top, Coat, and Dress at 89 % each.
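
Per-class accuracy is the fraction of samples of each ground-truth class that the network labels correctly. The sketch below shows one way to compute it from integer labels and predictions; `per_class_accuracy` and the toy data are hypothetical. It also checks that the unweighted mean of the validation figures in the table is consistent with the reported average (the report does not state whether the average is taken per class or per sample).

```python
def per_class_accuracy(labels, predictions, num_classes):
    """Return accuracy for each class, or None for classes with no samples."""
    correct = [0] * num_classes
    total = [0] * num_classes
    for y, p in zip(labels, predictions):
        total[y] += 1
        if y == p:
            correct[y] += 1
    return [c / t if t else None for c, t in zip(correct, total)]

# Toy example with 3 classes (hypothetical data, not from the report).
toy_acc = per_class_accuracy([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], num_classes=3)

# Unweighted mean of the validation accuracies reported in the table (in %).
val_acc = [89, 97, 90, 89, 89, 97, 72, 94, 98, 96]
val_mean = sum(val_acc) / len(val_acc)
print(toy_acc, val_mean)
```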

1.3.2 Per-class Accuracy of Test Set

Here is the per-class accuracy from the test set.

Class	Accuracy
Top	86 %
Trouser	98 %
Pullover	87 %
Dress	91 %
Coat	86 %
Sandal	97 %
Shirt	75 %
Sneaker	97 %
Bag	97 %
Ankle boot	96 %
Average	91 %

From the table, Shirt is the hardest class to classify, at 75 %. The four classes with the lowest accuracy are Shirt, followed by Top and Coat at 86 % each and Pullover at 87 %.

1.3.3 Samples from Correct Predictions

Here are two samples from each class that the network classified correctly.

Image	Class Label
	Ankle boot
	Ankle boot
	Bag
	Bag
	Coat
	Coat
	Dress
	Dress
	Pullover
	Pullover
	Sandal
	Sandal
	Shirt
	Shirt
	Sneaker
	Sneaker
	Top
	Top
	Trouser
	Trouser

1.3.4 Samples from Incorrect Predictions

Here are two samples from each class that the network classified incorrectly.

Image	Ground Truth Label	Prediction
	Ankle boot	Sandal
	Ankle boot	Sneaker
	Bag	Pullover
	Bag	Shirt
	Coat	Pullover
	Coat	Shirt
	Dress	Shirt
	Dress	Trouser
	Pullover	Shirt
	Pullover	Shirt
	Sandal	Sneaker
	Sandal	Sneaker
	Shirt	Coat
	Shirt	Dress
	Sneaker	Ankle boot
	Sneaker	Sandal
	Top	Shirt
	Top	Shirt
	Trouser	Dress
	Trouser	Pullover
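
As a rough aid for reading the table, the wrong predictions can be tallied by predicted label. The pairs below are copied from the table above. With only two hand-picked samples per class, this is not a confusion matrix and says nothing reliable about overall error rates.

```python
from collections import Counter

# (ground truth, prediction) pairs copied from the table above.
errors = [
    ("Ankle boot", "Sandal"), ("Ankle boot", "Sneaker"),
    ("Bag", "Pullover"), ("Bag", "Shirt"),
    ("Coat", "Pullover"), ("Coat", "Shirt"),
    ("Dress", "Shirt"), ("Dress", "Trouser"),
    ("Pullover", "Shirt"), ("Pullover", "Shirt"),
    ("Sandal", "Sneaker"), ("Sandal", "Sneaker"),
    ("Shirt", "Coat"), ("Shirt", "Dress"),
    ("Sneaker", "Ankle boot"), ("Sneaker", "Sandal"),
    ("Top", "Shirt"), ("Top", "Shirt"),
    ("Trouser", "Dress"), ("Trouser", "Pullover"),
]

# Count how often each label appears as the (wrong) prediction.
predicted_counts = Counter(pred for _, pred in errors)
for label, count in predicted_counts.most_common():
    print(label, count)
```

In these samples, Shirt is the most frequent wrong prediction (7 of 20).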

1.4 Visualization of the Learned Filters

Here is the visualization of the learned filters from the first layer. The values are normalized to the range of 0 to 1.


Learned Filters
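
Normalizing to the range of 0 to 1 can be done with min-max scaling, so that the smallest weight maps to black and the largest to white. The sketch below assumes each filter is scaled independently; the report does not say whether normalization was per filter or across all filters. `normalize_filter` and the example weights are hypothetical.

```python
def normalize_filter(weights):
    """Min-max scale a 2D list of filter weights to the range [0, 1]."""
    flat = [w for row in weights for w in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant filter: avoid division by zero
        return [[0.0 for _ in row] for row in weights]
    return [[(w - lo) / (hi - lo) for w in row] for row in weights]

# Hypothetical 3x3 filter weights, for illustration only.
kernel = [[-0.5, 0.0, 0.5],
          [-1.0, 0.0, 1.0],
          [-0.5, 0.0, 0.5]]
normalized = normalize_filter(kernel)
for row in normalized:
    print(row)
```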

2. Semantic Segmentation

We trained a CNN on the Mini Facade dataset. The training, validation, and test sets contain 815, 91, and 114 samples, respectively.

2.1 CNN Architecture


The CNN Architecture

We trained for 30 epochs with a batch size of 10, using cross-entropy loss and the Adam optimizer. The learning rate was set to 0.0025 and the weight decay to 0.0001.
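
To put these settings in perspective, the number of optimizer updates follows from the training-set size (815 samples), the batch size, and the epoch count. The sketch below assumes the last, partial batch of each epoch is kept rather than dropped, which the report does not specify; `optimizer_steps` is a hypothetical helper.

```python
import math

def optimizer_steps(num_samples, batch_size, num_epochs):
    """Return (batches per epoch, total steps), keeping the last partial batch."""
    per_epoch = math.ceil(num_samples / batch_size)
    return per_epoch, per_epoch * num_epochs

# 815 training samples, batch size 10, 30 epochs.
per_epoch, total = optimizer_steps(815, 10, 30)
print(per_epoch, total)
```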

2.2 Training and Validation Losses


Losses during Training

2.3 Average Precision on the Test Set

The average precision (AP) on the test set was 0.569.
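
The report does not state how AP was computed. One common definition, sketched below, ranks the pixels by predicted score for a class and averages the precision at the rank of each true positive; the per-class values are then typically averaged. `average_precision` and the example scores are hypothetical, and tied scores are not given any special handling.

```python
def average_precision(labels, scores):
    """Non-interpolated AP: mean of precision at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    num_positive = sum(labels)
    if num_positive == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / num_positive

# Hypothetical per-pixel scores for one class; 1 marks pixels of that class.
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
print(round(ap, 4))
```

Under this definition, an AP of 1.0 means every pixel of the class is scored above every pixel of the other classes.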

2.4 Sample Predictions

Here are a few sample predictions from the Mini Facade dataset.

Image	Ground Truth	Prediction

2.5 My Own Image

This is the result of running the trained model on a photo of a building from my own collection.

Image	Prediction

In general, the model did well on the window and facade classes. It labeled the windows in the image correctly, and most of the building's facade is labeled correctly. It failed on the pillar class: there are no pillars in the image, yet the model labeled some regions as pillar. It also failed on the balcony class: the right side of the image consists entirely of balconies, but the model predicted them as windows.