Classification and Segmentation

Part 1: Image Classification

Dataloader:

I used the PyTorch DataLoader to load the Fashion MNIST dataset. I sampled 4 images along with their corresponding classes (the dataset has 10 classes that correspond to clothing categories):

Image

CNN:

For my Convolutional Neural Network, I used 2 conv layers and 2 fully connected layers. The first conv layer is an nn.Conv2d with 1 input channel and 32 output channels, followed by nn.BatchNorm2d(32), a ReLU and a MaxPool. The second conv layer is the same except that it has 32 input channels as well as 32 output channels. The first fully connected layer is nn.Linear(512, 200) and the second is nn.Linear(200, 10).
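One way this architecture could look in code is sketched below. The kernel size (5x5, no padding), the 2x2 pooling window and the ReLU between the two fully connected layers are assumptions: the text does not state them, but 5x5 kernels with 2x2 pooling are one combination that turns a 28x28 input into 32 x 4 x 4 = 512 features, which matches nn.Linear(512, 200). The class name `FashionCNN` is hypothetical.

```python
import torch
from torch import nn


class FashionCNN(nn.Module):
    """Two conv blocks followed by two fully connected layers."""

    def __init__(self, num_classes=10):
        super().__init__()
        # Assumed: 5x5 kernels, no padding, 2x2 max pooling.
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5),   # 28x28 -> 24x24
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2),                   # 24x24 -> 12x12
            nn.Conv2d(32, 32, kernel_size=5),  # 12x12 -> 8x8
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2),                   # 8x8 -> 4x4
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                      # 32 * 4 * 4 = 512 features
            nn.Linear(512, 200),
            nn.ReLU(),                         # assumed activation between FC layers
            nn.Linear(200, num_classes),
        )

    def forward(self, x):
        # x: (N, 1, 28, 28) grayscale images -> (N, num_classes) class scores.
        return self.classifier(self.features(x))
```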

Loss Function and Optimizer:

I trained the network with the Adam optimizer at a learning rate of 0.0005 and reached an accuracy of 90%.
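A training pass with this optimizer might look like the sketch below. The text names Adam and the learning rate but not the loss function, so the `criterion` argument is left to the caller; cross-entropy (`nn.CrossEntropyLoss`) is a common choice for 10-class classification and is only an assumption here. The function names are hypothetical.

```python
import torch
from torch import nn


def make_optimizer(model, lr=0.0005):
    """Adam with the learning rate reported in the text."""
    return torch.optim.Adam(model.parameters(), lr=lr)


def train_one_epoch(model, loader, optimizer, criterion):
    """Run one pass over the loader; return (mean loss, accuracy)."""
    model.train()
    total_loss, correct, seen = 0.0, 0, 0
    for images, labels in loader:
        optimizer.zero_grad()
        logits = model(images)
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()
        # Weight the batch loss by batch size so the mean is per sample.
        total_loss += loss.item() * labels.size(0)
        correct += (logits.argmax(dim=1) == labels).sum().item()
        seen += labels.size(0)
    return total_loss / seen, correct / seen
```

Calling `train_one_epoch(model, train_loader, make_optimizer(model), nn.CrossEntropyLoss())` once per epoch and recording the returned accuracy would produce a training curve like the one in the results.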

Results:

Training accuracy:

Image

Validation accuracy:

Image

Per class accuracy:

Image

The hardest classes to classify are therefore Shirt, Pullover and T-shirt/Top, because these categories look very similar to each other.
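Per-class accuracy, and the class each category is most often confused with, can be computed from the true labels and predictions as in the following sketch. It uses plain Python lists of integer class indices; the function names are hypothetical and this is not necessarily how the numbers above were produced.

```python
from collections import Counter, defaultdict


def per_class_accuracy(labels, predictions, num_classes=10):
    """Return a list where entry c is the fraction of class-c samples predicted as c."""
    correct = [0] * num_classes
    total = [0] * num_classes
    for y, p in zip(labels, predictions):
        total[y] += 1
        correct[y] += int(y == p)
    # Classes with no samples get NaN rather than a misleading 0.
    return [c / t if t else float("nan") for c, t in zip(correct, total)]


def most_common_mistakes(labels, predictions):
    """Map each true class to the wrong class it is most often predicted as."""
    mistakes = defaultdict(Counter)
    for y, p in zip(labels, predictions):
        if y != p:
            mistakes[y][p] += 1
    return {y: counts.most_common(1)[0][0] for y, counts in mistakes.items()}
```

With class 0 as T-shirt/Top and class 6 as Shirt, a result such as `{0: 6, 6: 0}` from `most_common_mistakes` would reflect the kind of confusion shown in the examples below.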

Class 0: T-shirt/Top

Correct classification

Image

Correct classification

Image

Classified as Shirt

Image

Classified as Shirt

Image

Class 1: Trouser

Correct classification

Image

Correct classification

Image

Classified as Dress

Image

Classified as Dress

Image

Class 2: Pullover

Correct classification

Image

Correct classification

Image

Classified as Shirt

Image

Classified as Coat

Image

Class 3: Dress

Correct classification

Image

Correct classification

Image

Classified as Coat

Image

Classified as Shirt

Image

Class 4: Coat

Correct classification

Image

Correct classification

Image

Classified as Dress

Image

Classified as Shirt

Image

Class 5: Sandal

Correct classification

Image

Correct classification

Image

Classified as Sneaker

Image

Classified as Sneaker

Image

Class 6: Shirt

Correct classification

Image

Correct classification

Image

Classified as Coat

Image

Classified as T-shirt/Top

Image

Class 7: Sneaker

Correct classification

Image

Correct classification

Image

Classified as Sandal

Image

Classified as Sandal

Image

Class 8: Bag

Correct classification

Image

Correct classification

Image

Classified as Sandal

Image

Classified as Sneaker

Image

Class 9: Ankle Boot

Correct classification

Image

Correct classification

Image

Classified as Sandal

Image

Classified as Sneaker

Image

Learned Filters:

Image

Part 2: Semantic Segmentation

CNN:

For my Convolutional Neural Network, I used 5 conv layers with 64, 128, 128, 512 and 5 output channels respectively. Right before the last layer, I used MaxPool2d() and ConvTranspose2d().
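A possible layout for this network is sketched below. Only the channel counts and the position of the pooling and transposed-convolution layers come from the text. The 3-channel RGB input, the 3x3 kernels with padding 1, the ReLU activations, and the pooling and upsampling factors of 2 are all assumptions, chosen so that the output has the same height and width as the input. The class name `SegmentationCNN` is hypothetical.

```python
import torch
from torch import nn


class SegmentationCNN(nn.Module):
    """Five conv layers (64, 128, 128, 512, 5 channels) with pool + upsample before the last."""

    def __init__(self, in_channels=3, num_classes=5):
        super().__init__()
        self.net = nn.Sequential(
            # Assumed: 3x3 kernels with padding 1 keep H and W unchanged.
            nn.Conv2d(in_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(128, 512, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                                        # halves H and W
            nn.ConvTranspose2d(512, 512, kernel_size=2, stride=2),  # doubles them back
            nn.Conv2d(512, num_classes, kernel_size=3, padding=1),  # per-pixel class scores
        )

    def forward(self, x):
        # x: (N, in_channels, H, W) with even H and W -> (N, num_classes, H, W).
        return self.net(x)
```

Taking `argmax` over the channel dimension of the output gives a predicted class index for every pixel.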

Loss Function and Optimizer:

I used a learning rate of 0.0005 and a weight decay of 1e-5. This gave an average precision (AP) of 0.54.
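These hyperparameters could be wired up as in the sketch below. The text gives only the learning rate and weight decay, so both the optimizer (Adam, as in Part 1) and the per-pixel cross-entropy loss are assumptions, and the function names are hypothetical.

```python
import torch
from torch import nn


def make_segmentation_optimizer(model, lr=0.0005, weight_decay=1e-5):
    """Optimizer with the reported learning rate and weight decay."""
    # Adam is an assumption; the text does not name the optimizer for Part 2.
    return torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)


def pixelwise_loss(logits, target):
    """Cross-entropy averaged over every pixel (assumed loss).

    logits: (N, C, H, W) float class scores.
    target: (N, H, W) int64 class index per pixel.
    """
    return nn.functional.cross_entropy(logits, target)
```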

Training Loss:

Image

Validation Loss:

Image

Running trained model on my own image:

Image
Image