Classification and Segmentation
Hanlin Chen
Part 1: Image Classification
For this part, we trained a convolutional neural network (CNN) to classify images from the Fashion MNIST dataset into 10 classes. The classes are numbered as follows:
- 0: T-shirt/top
- 1: Trouser
- 2: Pullover
- 3: Dress
- 4: Coat
- 5: Sandal
- 6: Shirt
- 7: Sneaker
- 8: Bag
- 9: Ankle boot
Here are some examples of images from the dataset, as well as their correct class labels:

[Figure: ten sample images, labeled 6: Shirt, 5: Sandal, 4: Coat, 5: Sandal, 3: Dress, 7: Sneaker, 1: Trouser, 0: T-shirt/top, 6: Shirt, 6: Shirt]
The CNN we used had the following structure:
- Convolutional Layer: 32 channels, 3x3 convolution
- Leaky ReLU Layer
- Max Pool Layer: 2x2
- Convolutional Layer: 32 channels, 3x3 convolution
- Leaky ReLU Layer
- Max Pool Layer: 2x2
- Fully Connected Layer: 800 input features and 120 output features
- Leaky ReLU Layer
- Fully Connected Layer: 120 input features and 10 output features
- Cross Entropy Loss
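The architecture above can be sketched in PyTorch as follows. This is a minimal reconstruction from the layer list, not the original code; note that the sizes are consistent with a 28x28 Fashion MNIST input (28 → 26 → 13 → 11 → 5, so the flattened feature size is 32 * 5 * 5 = 800, matching the first fully connected layer).

```python
import torch
import torch.nn as nn

class FashionCNN(nn.Module):
    """Sketch of the classifier described above (names are my own)."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3),   # 28x28 -> 26x26
            nn.LeakyReLU(),
            nn.MaxPool2d(2),                   # 26x26 -> 13x13
            nn.Conv2d(32, 32, kernel_size=3),  # 13x13 -> 11x11
            nn.LeakyReLU(),
            nn.MaxPool2d(2),                   # 11x11 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Linear(32 * 5 * 5, 120),        # 800 -> 120
            nn.LeakyReLU(),
            nn.Linear(120, 10),                # logits for the 10 classes
        )

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))
```

The cross-entropy loss is applied to the raw logits during training, so no softmax layer is needed inside the network.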
I used Adam as my optimizer, with a learning rate of 0.0102 and a weight decay of 0.00001. Here are the training and validation accuracies achieved during training:

[Figure: training and validation accuracy per epoch]
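A single training epoch with these hyperparameters might look like the sketch below. The function name and the `loader` argument are placeholders, assuming a standard Fashion MNIST `DataLoader`; only the optimizer settings (Adam, lr = 0.0102, weight decay = 1e-5) come from the report.

```python
import torch
import torch.nn as nn

def train_epoch(model, loader, device="cpu"):
    """Run one epoch and return the training accuracy (sketch only)."""
    criterion = nn.CrossEntropyLoss()
    # Hyperparameters quoted above: Adam, lr = 0.0102, weight decay = 1e-5.
    optimizer = torch.optim.Adam(model.parameters(), lr=0.0102, weight_decay=1e-5)
    model.train()
    correct, total = 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        logits = model(images)
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()
        correct += (logits.argmax(1) == labels).sum().item()
        total += labels.size(0)
    return correct / total
```

In a real run the optimizer would be constructed once outside the epoch loop so its moment estimates persist across epochs; it is inlined here only to keep the sketch self-contained.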
Applying the trained CNN to the test set gives the following accuracies, broken down by class and averaged across all classes:
- 0: T-shirt/top: 80%
- 1: Trouser: 92%
- 2: Pullover: 76%
- 3: Dress: 89%
- 4: Coat: 67%
- 5: Sandal: 93%
- 6: Shirt: 28%
- 7: Sneaker: 79%
- 8: Bag: 95%
- 9: Ankle boot: 98%
- Overall test accuracy: 80%
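The per-class figures can be tallied by counting, for each class, how many test images of that class were predicted correctly. A small helper (my own, not from the original code) for that bookkeeping:

```python
import torch

def per_class_accuracy(preds, labels, num_classes=10):
    """Per-class accuracy from 1-D tensors of predicted and true class indices."""
    accs = []
    for c in range(num_classes):
        mask = labels == c                     # test images whose true class is c
        if mask.sum() == 0:
            accs.append(float("nan"))          # class absent from this batch
        else:
            accs.append((preds[mask] == c).float().mean().item())
    return accs
```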
If we display the learned filters as images, this is what each of the 32 channels of the first convolutional layer looks like:

[Figure: the 32 learned 3x3 filters of the first convolutional layer]
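To render the filters as images, each 3x3 weight kernel has to be mapped into a displayable range. One way to do this (a sketch with my own function name, not the original code) is to min-max normalize each filter independently to [0, 1] and show it as a grayscale image:

```python
import torch
import torch.nn as nn

def filters_as_images(conv: nn.Conv2d):
    """Min-max normalize each filter of a conv layer to [0, 1] for display."""
    w = conv.weight.detach().clone()           # (out_channels, in_channels, 3, 3)
    flat = w.view(w.size(0), -1)
    mins = flat.min(dim=1).values.view(-1, 1, 1, 1)
    maxs = flat.max(dim=1).values.view(-1, 1, 1, 1)
    return (w - mins) / (maxs - mins + 1e-8)   # each filter scaled to [0, 1]
```

The returned tensor can be passed filter-by-filter to `matplotlib.pyplot.imshow` with a grayscale colormap.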
Part 2: Semantic Segmentation
In this part, we trained a CNN to perform semantic segmentation on the Mini Facade dataset. For the CNN, we used the following architecture:
- Convolutional Layer: 32 channels, 3x3 convolution, 1 pad
- Leaky ReLU Layer
- Convolutional Layer: 32 channels, 3x3 convolution, 1 pad
- Leaky ReLU Layer
- Convolutional Layer: 32 channels, 3x3 convolution, 1 pad
- Leaky ReLU Layer
- Convolutional Layer: 32 channels, 3x3 convolution, 1 pad
- Leaky ReLU Layer
- Convolutional Layer: 32 channels, 3x3 convolution, 1 pad
- Leaky ReLU Layer
- Convolutional Layer: 32 channels, 3x3 convolution, 1 pad
- Leaky ReLU Layer
- Convolutional Layer: 5 channels, 3x3 convolution, 1 pad
- Cross Entropy Loss
I used a learning rate of 0.001 and a weight decay of 0.00000495. Here is the AP for each class, as well as the average:
- Class 0: AP = 0.6265
- Class 1: AP = 0.7267
- Class 2: AP = 0.0753
- Class 3: AP = 0.6475
- Class 4: AP = 0.1313
- Average AP: 0.4415
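Each per-class AP can be computed one-vs-rest over pixels: rank every pixel by its predicted score for the class and average the precision at each true-positive hit. A NumPy sketch of that standard AP definition (my own helper, not the grading code):

```python
import numpy as np

def average_precision(scores, targets):
    """AP for one class: `targets` is 1 for pixels of the class, 0 otherwise;
    `scores` is the predicted probability for the class at each pixel."""
    order = np.argsort(-scores)                       # rank pixels by score
    hits = targets[order]
    cum_tp = np.cumsum(hits)
    precision = cum_tp / (np.arange(len(hits)) + 1)   # precision at each rank
    return float((precision * hits).sum() / hits.sum())
```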
Here are some examples of test set results:

[Figure: three test examples, each shown as a triplet of input | output | ground truth]