Classification and Segmentation

Part 1: Image Classification

Dataloader:

I used the PyTorch DataLoader to load the Fashion MNIST dataset. I sampled 4 images along with their corresponding classes (the dataset has 10 classes that correspond to clothing categories):

CNN:

For my convolutional neural network, I used 2 conv layers and 2 fully connected layers. The first conv layer calls nn.Conv2d() with 1 input channel and 32 output channels, followed by nn.BatchNorm2d(32), a ReLU and a MaxPool. The second conv layer is the same except that it takes 32 input channels. The first fully connected layer is nn.Linear(512, 200) and the second is nn.Linear(200, 10).
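The architecture described above might look roughly like the sketch below. The kernel size (5, no padding), the 2x2 pooling window and the ReLU between the two fully connected layers are assumptions; the first two were chosen because they turn a 28x28 input into 32 feature maps of size 4x4, which matches the 512 inputs of the first linear layer. The class name `FashionCNN` is hypothetical.

```python
import torch
import torch.nn as nn


class FashionCNN(nn.Module):
    def __init__(self):
        super().__init__()
        # Conv block 1: 1 -> 32 channels, 28x28 -> 24x24 -> 12x12 after pooling.
        self.conv1 = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5),   # kernel size is an assumption
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2),                   # pool size is an assumption
        )
        # Conv block 2: 32 -> 32 channels, 12x12 -> 8x8 -> 4x4 after pooling.
        self.conv2 = nn.Sequential(
            nn.Conv2d(32, 32, kernel_size=5),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.fc1 = nn.Linear(512, 200)         # 32 * 4 * 4 = 512
        self.fc2 = nn.Linear(200, 10)          # one logit per class

    def forward(self, x):
        x = self.conv2(self.conv1(x))
        x = torch.flatten(x, 1)                # keep the batch dimension
        x = torch.relu(self.fc1(x))            # assumed activation
        return self.fc2(x)


model = FashionCNN()
logits = model(torch.zeros(4, 1, 28, 28))      # a dummy batch of 4 images
print(logits.shape)
```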

Loss Function and Optimizer:

I trained the network with the Adam optimizer at a learning rate of 0.0005 and reached an accuracy of 90%.
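One training step with these settings could be sketched as follows. The loss function is not stated in this write-up, so cross-entropy is assumed here as the usual choice for 10-way classification; the small linear model and the random batch are stand-ins so the snippet runs on its own.

```python
import torch
import torch.nn as nn

# Stand-in model so the example is self-contained; the real network is the CNN.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
criterion = nn.CrossEntropyLoss()                        # assumed loss
optimizer = torch.optim.Adam(model.parameters(), lr=0.0005)


def train_step(images, labels):
    """Run one optimization step and return the batch loss as a float."""
    model.train()
    optimizer.zero_grad()              # clear gradients from the previous step
    loss = criterion(model(images), labels)
    loss.backward()                    # backpropagate
    optimizer.step()                   # Adam update
    return loss.item()


# Random data in place of a real Fashion MNIST batch.
images = torch.randn(4, 1, 28, 28)
labels = torch.tensor([0, 1, 2, 3])
batch_loss = train_step(images, labels)
```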

Results:

Training accuracy:

Validation accuracy:

Per class accuracy:

The hardest classes to classify are therefore Shirt, Pullover and T-shirt/Top, because these categories look very similar to each other.
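For reference, per-class accuracy can be computed by counting, for each true class, how many of its samples were predicted correctly. The sketch below uses plain Python lists; the function name and the toy labels are made up for illustration and are not the project's data.

```python
from collections import Counter


def per_class_accuracy(labels, preds, num_classes=10):
    """Return a list where entry c is the accuracy on samples whose true class is c."""
    correct = Counter()
    total = Counter()
    for y, p in zip(labels, preds):
        total[y] += 1
        if y == p:
            correct[y] += 1
    # Classes with no samples get NaN instead of a misleading 0.
    return [
        correct[c] / total[c] if total[c] else float("nan")
        for c in range(num_classes)
    ]


# Toy example: class 0 (T-shirt/Top) and class 6 (Shirt) are confused once each.
labels = [0, 0, 6, 6, 1]
preds = [0, 6, 6, 0, 1]
acc = per_class_accuracy(labels, preds)
```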

Class 0: T-shirt/Top

Correct classification

Correct classification

Classified as Shirt

Classified as Shirt

Class 1: Trouser

Correct classification

Correct classification

Classified as Dress

Classified as Dress

Class 2: Pullover

Correct classification

Correct classification

Classified as Shirt

Classified as Coat

Class 3: Dress

Correct classification

Correct classification

Classified as Coat

Classified as Shirt

Class 4: Coat

Correct classification

Correct classification

Classified as Dress

Classified as Shirt

Class 5: Sandal

Correct classification

Correct classification

Classified as Sneaker

Classified as Sneaker

Class 6: Shirt

Correct classification

Correct classification

Classified as Coat

Classified as T-shirt/Top

Class 7: Sneaker

Correct classification

Correct classification

Classified as Sandal

Classified as Sandal

Class 8: Bag

Correct classification

Correct classification

Classified as Sandal

Classified as Sneaker

Class 9: Ankle Boot

Correct classification

Correct classification

Classified as Sandal

Classified as Sneaker

Learned Filters:

Part 2: Semantic Segmentation

CNN:

For my convolutional neural network, I used 5 conv layers with 64, 128, 128, 512 and 5 output channels respectively. Right before the last layer, I used nn.MaxPool2d() and nn.ConvTranspose2d().
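A rough sketch of a network with this layout is shown below. Only the channel widths and the placement of the pooling and transposed convolution come from the description above; the 3-channel RGB input, the 3x3 kernels with padding 1, the ReLU activations, and the 2x pooling paired with a stride-2 transposed convolution (so the output keeps the input resolution) are all assumptions. The class name `SegNet` is hypothetical.

```python
import torch
import torch.nn as nn


class SegNet(nn.Module):
    def __init__(self, in_channels=3, num_classes=5):
        super().__init__()
        # First four conv layers: 64, 128, 128, 512 output channels.
        layers = []
        prev = in_channels
        for width in [64, 128, 128, 512]:
            layers += [nn.Conv2d(prev, width, kernel_size=3, padding=1), nn.ReLU()]
            prev = width
        self.features = nn.Sequential(*layers)
        # Downsample by 2, then upsample back by 2, right before the last layer.
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(512, 512, kernel_size=2, stride=2)
        # Last conv layer: one output channel per segmentation class.
        self.classifier = nn.Conv2d(512, num_classes, kernel_size=3, padding=1)

    def forward(self, x):
        x = self.features(x)
        x = self.up(self.pool(x))
        return self.classifier(x)      # (N, num_classes, H, W) logits


seg_model = SegNet()
with torch.no_grad():
    seg_logits = seg_model(torch.zeros(1, 3, 32, 32))
print(seg_logits.shape)
```

With these assumed settings the output has one logit per class for every input pixel, provided the input height and width are even.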

Loss Function and Optimizer:

I used a learning rate of 0.0005 and a weight decay of 1e-5. This gave an AP of 0.54.
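These hyperparameters could be wired up as in the sketch below. Neither the optimizer nor the loss is named in this part, so Adam (as in Part 1) and a per-pixel cross-entropy loss are assumptions; the single conv layer and the random tensors are stand-ins so the snippet runs on its own.

```python
import torch
import torch.nn as nn

# Stand-in for the segmentation network: 3 input channels, 5 class logits per pixel.
model = nn.Conv2d(3, 5, kernel_size=3, padding=1)

# Assumed optimizer; weight_decay adds L2 regularization to the update.
optimizer = torch.optim.Adam(model.parameters(), lr=0.0005, weight_decay=1e-5)

# With (N, C, H, W) logits and (N, H, W) integer targets, CrossEntropyLoss
# averages the classification loss over every pixel.
criterion = nn.CrossEntropyLoss()

images = torch.randn(2, 3, 16, 16)             # random stand-in images
targets = torch.randint(0, 5, (2, 16, 16))     # random stand-in label maps

logits = model(images)
loss = criterion(logits, targets)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```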

Training Loss:

Validation Loss:

Running trained model on my own image: