I used the PyTorch DataLoader to load the Fashion MNIST dataset. I sampled 4 images along with their corresponding classes (the dataset has 10 classes that correspond to clothing categories).
For my convolutional neural network, I used 2 conv layers and 2 fully connected layers. The first conv layer calls nn.Conv2d() with 1 input channel and 32 output channels, then applies nn.BatchNorm2d(32), a ReLU, and a MaxPool. The second conv layer does the same, except that its input also has 32 channels. The first fully connected layer is nn.Linear(512, 200) and the second is nn.Linear(200, 10).
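The architecture described above could look roughly like the sketch below. The kernel size (5, no padding) and the 2x2 pooling are assumptions chosen because they turn a 28x28 input into 32 x 4 x 4 = 512 features, which matches nn.Linear(512, 200). The ReLU between the two fully connected layers and the class name FashionCNN are also assumptions.

```python
import torch
import torch.nn as nn

class FashionCNN(nn.Module):
    """Two conv blocks followed by two fully connected layers."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5),   # 28x28 -> 24x24 (assumed kernel size)
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2),                   # 24x24 -> 12x12 (assumed pool size)
            nn.Conv2d(32, 32, kernel_size=5),  # 12x12 -> 8x8
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2),                   # 8x8 -> 4x4
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                      # 32 channels * 4 * 4 = 512 features
            nn.Linear(512, 200),
            nn.ReLU(),                         # assumed non-linearity between FC layers
            nn.Linear(200, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = FashionCNN()
logits = model(torch.randn(4, 1, 28, 28))  # a batch of 4 grayscale 28x28 images
print(logits.shape)
```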
I trained the network with Adam at a learning rate of 0.0005 and reached an accuracy of 90%.
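A training loop for this setup might be sketched as follows. Only the optimizer (Adam) and the learning rate (0.0005) come from my run; the cross-entropy loss, the function name train_one_epoch, and the tiny stand-in model with random data are assumptions used to keep the example self-contained.

```python
import torch
import torch.nn as nn

def train_one_epoch(model, loader, optimizer, loss_fn):
    """Run one pass over the loader and return the training accuracy."""
    model.train()
    correct, total = 0, 0
    for images, labels in loader:
        optimizer.zero_grad()
        logits = model(images)
        loss = loss_fn(logits, labels)
        loss.backward()
        optimizer.step()
        # Count predictions that match the labels in this batch.
        correct += (logits.argmax(dim=1) == labels).sum().item()
        total += labels.size(0)
    return correct / total

torch.manual_seed(0)
# Stand-in model and random batches (assumptions, not the real network or data).
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=0.0005)
fake_loader = [
    (torch.randn(8, 1, 28, 28), torch.randint(0, 10, (8,))) for _ in range(3)
]
train_acc = train_one_epoch(model, fake_loader, optimizer, nn.CrossEntropyLoss())
```

Running the same function once per epoch and evaluating on a held-out split after each pass would produce accuracy curves like the ones reported here.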
Training accuracy:
Validation accuracy:
Per class accuracy:
The hardest classes to classify are therefore Shirt, Pullover, and T-shirt/Top, because these categories look very similar.
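Per-class accuracy can be computed along these lines. This is a small sketch with made-up predictions, not my actual evaluation code; the function name per_class_accuracy is hypothetical.

```python
import torch

def per_class_accuracy(preds, labels, num_classes=10):
    """Return, for each class, the fraction of its examples predicted correctly."""
    accuracies = []
    for c in range(num_classes):
        mask = labels == c            # examples whose true class is c
        count = mask.sum().item()
        if count == 0:
            accuracies.append(float("nan"))  # class absent from this sample
        else:
            accuracies.append((preds[mask] == c).sum().item() / count)
    return accuracies

# Toy example: classes 0 (T-shirt/Top), 2 (Pullover), and 6 (Shirt) get confused.
labels = torch.tensor([0, 0, 6, 6, 2, 2])
preds = torch.tensor([0, 6, 6, 0, 2, 6])
acc = per_class_accuracy(preds, labels)
```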
Class 0: T-shirt/Top
Correct classification
Correct classification
Classified as Shirt
Classified as Shirt
Class 1: Trouser
Correct classification
Correct classification
Classified as Dress
Classified as Dress
Class 2: Pullover
Correct classification
Correct classification
Classified as Shirt
Classified as Coat
Class 3: Dress
Correct classification
Correct classification
Classified as Coat
Classified as Shirt
Class 4: Coat
Correct classification
Correct classification
Classified as Dress
Classified as Shirt
Class 5: Sandal
Correct classification
Correct classification
Classified as Sneaker
Classified as Sneaker
Class 6: Shirt
Correct classification
Correct classification
Classified as Coat
Classified as T-shirt/Top
Class 7: Sneaker
Correct classification
Correct classification
Classified as Sandal
Classified as Sandal
Class 8: Bag
Correct classification
Correct classification
Classified as Sandal
Classified as Sneaker
Class 9: Ankle Boot
Correct classification
Correct classification
Classified as Sandal
Classified as Sneaker
Learned Filters:
For my convolutional neural network, I used 5 conv layers with 64, 128, 128, 512, and 5 channels, respectively. Right before the last layer, I used nn.MaxPool2d() and nn.ConvTranspose2d().
I used a learning rate of 0.0005 and a weight decay of 1e-5. This gave an AP of 0.54.
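One way such a network could be put together is sketched below. Only the channel counts, the pooling and transposed-convolution step before the last layer, the learning rate, and the weight decay come from the description above. The 3-channel input, the 3x3 kernels with padding 1, the ReLU activations, the 2x upsampling, the Adam optimizer, and the per-pixel cross-entropy loss are all assumptions.

```python
import torch
import torch.nn as nn

def conv_relu(in_ch, out_ch):
    """3x3 conv that keeps the spatial size, followed by ReLU (assumed design)."""
    return [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU()]

seg_net = nn.Sequential(
    *conv_relu(3, 64),      # assumed RGB input
    *conv_relu(64, 128),
    *conv_relu(128, 128),
    *conv_relu(128, 512),
    nn.MaxPool2d(2),                                        # halve H and W
    nn.ConvTranspose2d(512, 512, kernel_size=2, stride=2),  # restore H and W
    nn.Conv2d(512, 5, kernel_size=3, padding=1),            # 5 output channels
)

# Learning rate and weight decay from the text; Adam itself is an assumption.
optimizer = torch.optim.Adam(seg_net.parameters(), lr=0.0005, weight_decay=1e-5)
loss_fn = nn.CrossEntropyLoss()  # assumed per-pixel classification loss

# One training step on random data to show the expected tensor shapes.
images = torch.randn(2, 3, 32, 32)
targets = torch.randint(0, 5, (2, 32, 32))  # one class index per pixel
optimizer.zero_grad()
logits = seg_net(images)
loss = loss_fn(logits, targets)
loss.backward()
optimizer.step()
```

With these choices the output keeps the input resolution, so each pixel gets one score per output channel.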
Training Loss:
Validation Loss:
Running trained model on my own image: