Classification and Segmentation
Hanlin Chen
Part 1: Image Classification
For this part, we trained a convolutional neural network (CNN) to classify images from the Fashion MNIST dataset into 10 classes. The classes are numbered as follows:
- 0: T-shirt/top
- 1: Trouser
- 2: Pullover
- 3: Dress
- 4: Coat
- 5: Sandal
- 6: Shirt
- 7: Sneaker
- 8: Bag
- 9: Ankle boot
Here are some examples of images from the dataset, as well as their correct class labels:

[Figure: ten sample images, labeled 6: Shirt, 5: Sandal, 4: Coat, 5: Sandal, 3: Dress, 7: Sneaker, 1: Trouser, 0: T-shirt/top, 6: Shirt, 6: Shirt]
The CNN we used had the following structure:
- Convolutional Layer: 32 channels, 3x3 convolution
- Leaky ReLU Layer
- Max Pool Layer: 2x2
- Convolutional Layer: 32 channels, 3x3 convolution
- Leaky ReLU Layer
- Max Pool Layer: 2x2
- Fully Connected Layer: 800 input features and 120 output features
- Leaky ReLU Layer
- Fully Connected Layer: 120 input features and 10 output features
- Cross Entropy Loss
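The architecture above can be sketched in PyTorch as follows. This is a minimal reconstruction from the layer list, not the original code; note that the sizes are consistent with a 28x28 Fashion MNIST input (28 → 26 → 13 → 11 → 5, so the flattened feature size is 32 * 5 * 5 = 800, matching the first fully connected layer).

```python
import torch
import torch.nn as nn

class FashionCNN(nn.Module):
    """Sketch of the classifier described above (names are my own)."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3),   # 28x28 -> 26x26
            nn.LeakyReLU(),
            nn.MaxPool2d(2),                   # 26x26 -> 13x13
            nn.Conv2d(32, 32, kernel_size=3),  # 13x13 -> 11x11
            nn.LeakyReLU(),
            nn.MaxPool2d(2),                   # 11x11 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Linear(32 * 5 * 5, 120),        # 800 -> 120
            nn.LeakyReLU(),
            nn.Linear(120, 10),                # logits for the 10 classes
        )

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))
```

The cross-entropy loss is applied to the raw logits during training, so no softmax layer is needed inside the network.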
I used Adam as my optimizer, with a learning rate of 0.0102 and a weight decay of 0.00001. Here are the training and validation accuracies achieved during training:

[Figure: training and validation accuracy per epoch]
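A single training epoch with these hyperparameters might look like the sketch below. The function name and the `loader` argument are placeholders, assuming a standard Fashion MNIST `DataLoader`; only the optimizer settings (Adam, lr = 0.0102, weight decay = 1e-5) come from the report.

```python
import torch
import torch.nn as nn

def train_epoch(model, loader, device="cpu"):
    """Run one epoch and return the training accuracy (sketch only)."""
    criterion = nn.CrossEntropyLoss()
    # Hyperparameters quoted above: Adam, lr = 0.0102, weight decay = 1e-5.
    optimizer = torch.optim.Adam(model.parameters(), lr=0.0102, weight_decay=1e-5)
    model.train()
    correct, total = 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        logits = model(images)
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()
        correct += (logits.argmax(1) == labels).sum().item()
        total += labels.size(0)
    return correct / total
```

In a real run the optimizer would be constructed once outside the epoch loop so its moment estimates persist across epochs; it is inlined here only to keep the sketch self-contained.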
Applying the trained CNN to the test set gives the following accuracies, broken down by class and averaged across all classes:
- 0: T-shirt/top: 80%
- 1: Trouser: 92%
- 2: Pullover: 76%
- 3: Dress: 89%
- 4: Coat: 67%
- 5: Sandal: 93%
- 6: Shirt: 28%
- 7: Sneaker: 79%
- 8: Bag: 95%
- 9: Ankle boot: 98%
- Overall test accuracy: 80%
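The per-class figures can be tallied by counting, for each class, how many test images of that class were predicted correctly. A small helper (my own, not from the original code) for that bookkeeping:

```python
import torch

def per_class_accuracy(preds, labels, num_classes=10):
    """Per-class accuracy from 1-D tensors of predicted and true class indices."""
    accs = []
    for c in range(num_classes):
        mask = labels == c                     # test images whose true class is c
        if mask.sum() == 0:
            accs.append(float("nan"))          # class absent from this batch
        else:
            accs.append((preds[mask] == c).float().mean().item())
    return accs
```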
If we display the learned filters as images, this is what each of the 32 channels of the first convolutional layer looks like:

[Figure: the 32 learned 3x3 filters of the first convolutional layer]
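To render the filters as images, each 3x3 weight kernel has to be mapped into a displayable range. One way to do this (a sketch with my own function name, not the original code) is to min-max normalize each filter independently to [0, 1] and show it as a grayscale image:

```python
import torch
import torch.nn as nn

def filters_as_images(conv: nn.Conv2d):
    """Min-max normalize each filter of a conv layer to [0, 1] for display."""
    w = conv.weight.detach().clone()           # (out_channels, in_channels, 3, 3)
    flat = w.view(w.size(0), -1)
    mins = flat.min(dim=1).values.view(-1, 1, 1, 1)
    maxs = flat.max(dim=1).values.view(-1, 1, 1, 1)
    return (w - mins) / (maxs - mins + 1e-8)   # each filter scaled to [0, 1]
```

The returned tensor can be passed filter-by-filter to `matplotlib.pyplot.imshow` with a grayscale colormap.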
Part 2: Semantic Segmentation
In this part, we trained a CNN to perform semantic segmentation on the Mini Facade dataset. For the CNN, we used the following architecture:
- Convolutional Layer: 32 channels, 3x3 convolution, 1 pad
- Leaky ReLU Layer
- Convolutional Layer: 32 channels, 3x3 convolution, 1 pad
- Leaky ReLU Layer
- Convolutional Layer: 32 channels, 3x3 convolution, 1 pad
- Leaky ReLU Layer
- Convolutional Layer: 32 channels, 3x3 convolution, 1 pad
- Leaky ReLU Layer
- Convolutional Layer: 32 channels, 3x3 convolution, 1 pad
- Leaky ReLU Layer
- Convolutional Layer: 32 channels, 3x3 convolution, 1 pad
- Leaky ReLU Layer
- Convolutional Layer: 5 channels, 3x3 convolution, 1 pad
- Cross Entropy Loss
I used a learning rate of 0.001 and a weight decay of 0.00000495. Here is the AP for each class, as well as the average:
- Class 0: AP = 0.6265
- Class 1: AP = 0.7267
- Class 2: AP = 0.0753
- Class 3: AP = 0.6475
- Class 4: AP = 0.1313
- Average AP: 0.4415
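Each per-class AP can be computed one-vs-rest over pixels: rank every pixel by its predicted score for the class and average the precision at each true-positive hit. A NumPy sketch of that standard AP definition (my own helper, not the grading code):

```python
import numpy as np

def average_precision(scores, targets):
    """AP for one class: `targets` is 1 for pixels of the class, 0 otherwise;
    `scores` is the predicted probability for the class at each pixel."""
    order = np.argsort(-scores)                       # rank pixels by score
    hits = targets[order]
    cum_tp = np.cumsum(hits)
    precision = cum_tp / (np.arange(len(hits)) + 1)   # precision at each rank
    return float((precision * hits).sum() / hits.sum())
```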
Here are some examples of test set results:

[Figure: three test examples, each shown as a triplet of input | output | ground truth]