Classification and Segmentation

Hanlin Chen

Part 1: Image Classification

For this part, I trained a convolutional neural network (CNN) to classify images from the Fashion MNIST dataset into 10 classes. The classes are numbered as follows:

0: T-shirt/top
1: Trouser
2: Pullover
3: Dress
4: Coat
5: Sandal
6: Shirt
7: Sneaker
8: Bag
9: Ankle boot

Here are some examples of images from the dataset, as well as their correct class labels:

6: Shirt
5: Sandal
4: Coat
5: Sandal
3: Dress
7: Sneaker
1: Trouser
0: T-shirt/top
6: Shirt
6: Shirt

The CNN I used had the following structure:

I used Adam as the optimizer, with a learning rate of 0.0102 and a weight decay of 0.00001. Here are the training and validation accuracies achieved during training:
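To make the role of the two hyperparameters concrete, here is a minimal pure-Python sketch of a single Adam update for one scalar parameter. It is not the training code used for this project. The learning rate and weight decay are the values quoted above; `beta1`, `beta2`, and `eps` are the commonly used defaults and are assumptions, as is the choice to apply weight decay as an L2 term added to the gradient. The function name `adam_step` and the toy objective are hypothetical.

```python
import math


def adam_step(param, grad, state, lr=0.0102, weight_decay=1e-5,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter and return the new value.

    lr and weight_decay are the values reported in the text; beta1, beta2
    and eps are assumed defaults, not taken from the write-up.
    """
    # L2-style weight decay (assumption): add weight_decay * param to the gradient.
    grad = grad + weight_decay * param
    state["t"] += 1
    # Exponential moving averages of the gradient and the squared gradient.
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    # Bias correction for the zero-initialized averages.
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


# Toy usage: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w = 0.0
state = {"t": 0, "m": 0.0, "v": 0.0}
for _ in range(2000):
    w = adam_step(w, 2 * (w - 3.0), state)
```

On the very first step the bias-corrected ratio has magnitude close to 1, so the parameter moves by roughly the learning rate regardless of the gradient scale. This is one reason the learning rate is the most sensitive setting here.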

Applying the trained CNN to the test set gives the following accuracies, broken down by class and then computed over the whole test set:
Test Accuracy of 0: T-shirt/top : 80 %
Test Accuracy of 1: Trouser : 92 %
Test Accuracy of 2: Pullover : 76 %
Test Accuracy of 3: Dress : 89 %
Test Accuracy of 4: Coat : 67 %
Test Accuracy of 5: Sandal : 93 %
Test Accuracy of 6: Shirt : 28 %
Test Accuracy of 7: Sneaker : 79 %
Test Accuracy of 8: Bag : 95 %
Test Accuracy of 9: Ankle boot : 98 %
Test Accuracy of the network on the test images: 80 %
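As a sanity check on these numbers, the sketch below shows one way per-class accuracy can be computed from label and prediction lists, and verifies that the mean of the reported per-class figures agrees with the overall figure. The function `per_class_accuracy` is a hypothetical helper, not the evaluation code used here. The comparison assumes a balanced test set (Fashion MNIST has 1,000 test images per class) and treats the printed percentages as rounded or truncated values, so agreement is only approximate.

```python
from collections import Counter


def per_class_accuracy(labels, predictions, num_classes):
    """Return a list with the fraction of correctly classified samples per class."""
    correct = Counter()
    total = Counter()
    for y, p in zip(labels, predictions):
        total[y] += 1
        if y == p:
            correct[y] += 1
    # Classes with no samples get NaN rather than a misleading 0.
    return [correct[c] / total[c] if total[c] else float("nan")
            for c in range(num_classes)]


# Per-class test accuracies (in percent) as reported above, classes 0 to 9.
reported = [80, 92, 76, 89, 67, 93, 28, 79, 95, 98]

# With equally sized classes, overall accuracy equals the mean of the per-class values.
mean_acc = sum(reported) / len(reported)
print(f"Mean of per-class accuracies: {mean_acc:.1f} %")
```

The mean comes out just under 80 %, consistent with the overall accuracy reported above. The low score for class 6 (Shirt) is what pulls the average down the most.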

If we display the filters as images, this is what each channel of the first convolutional layer looks like:
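Filter weights are small signed numbers, so they have to be rescaled before they can be shown as pixels. The sketch below illustrates one common choice, min-max scaling of each kernel to gray levels 0 to 255. Whether this exact scaling was used for the images here is an assumption; the function name `filter_to_grayscale` and the example kernel are hypothetical.

```python
def filter_to_grayscale(kernel):
    """Min-max scale a 2D kernel (list of rows) to integer gray levels in 0..255."""
    flat = [w for row in kernel for w in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # Constant kernel: there is no contrast to show, so use mid-gray.
        return [[128 for _ in row] for row in kernel]
    return [[round(255 * (w - lo) / span) for w in row] for row in kernel]


# Hypothetical 3x3 kernel resembling a horizontal edge detector.
kernel = [
    [-0.2, 0.0, 0.2],
    [-0.4, 0.0, 0.4],
    [-0.2, 0.0, 0.2],
]
image = filter_to_grayscale(kernel)
```

With this scaling, the most negative weight maps to black and the most positive to white, so each filter uses the full gray range independently of the others.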

Part 2: Semantic Segmentation

In this part, I trained a CNN to perform semantic segmentation on the Mini Facade dataset. For the CNN, I used the following architecture:

I used a learning rate of 0.001 and a weight decay of 0.00000495. Here is the average precision (AP) for each class, followed by the average over all classes:
AP = 0.6265281410084671
AP = 0.7267151774343149
AP = 0.07530673915746357
AP = 0.6475259197942225
AP = 0.13129110628309693
Average: 0.441473416735513
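For reference, the sketch below shows one common definition of average precision for a single class: rank all pixels by predicted score and average the precision at each position where a true positive occurs. The evaluation code actually used is not shown in this write-up, so this definition is an assumption, and `average_precision` is a hypothetical helper. The last lines check that the reported average is the plain mean of the five per-class values.

```python
def average_precision(labels, scores):
    """AP for one class from binary labels (1 = belongs to the class) and scores.

    Ties in score are kept in input order; a production implementation
    would handle them more carefully.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    num_positive = sum(labels)
    if num_positive == 0:
        return float("nan")
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            # Precision at this rank: true positives so far / items seen so far.
            precision_sum += hits / rank
    return precision_sum / num_positive


# Toy example: positives are ranked 1st and 3rd, so AP = (1/1 + 2/3) / 2.
toy_ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6])

# Per-class AP values as reported above.
reported_ap = [
    0.6265281410084671,
    0.7267151774343149,
    0.07530673915746357,
    0.6475259197942225,
    0.13129110628309693,
]
mean_ap = sum(reported_ap) / len(reported_ap)
```

The mean of the five values reproduces the reported average, which confirms that it is an unweighted mean over classes and not a pixel-weighted one.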

Here are some examples of test set results. Each of the three examples shows, in order, the input image, the network output, and the ground truth.