Classification and Segmentation

Introduction

In this project, I tackle image classification on the Fashion-MNIST dataset and semantic segmentation on the Mini Facade dataset using deep neural networks! So without further ado, let's get right into it :)

Part 1: Image Classification

This part of the project deals with image classification, a popular and heavily researched deep learning task. There are various techniques for accomplishing such a task, but before diving into all that we need to choose a dataset!

The Data

I used the Fashion-MNIST dataset to train my model. It contains 10 classes, with 60,000 training/validation images and 10,000 test images.

Model Architecture

For the classification task, I created and tested two different model architectures. The first one is as follows:

The second model included a few dimensionality changes and the inclusion of a batch normalization layer:
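For readers unfamiliar with batch normalization, the operation that layer performs during training can be sketched in plain Python. This is only an illustration of the idea for a single feature, not the layer from my model; the function name `batch_norm` and the toy activations are made up for the example, and `gamma`, `beta`, and `eps` stand in for the learnable scale, learnable shift, and small stabilizing constant.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a batch, then scale and shift it."""
    n = len(values)
    mean = sum(values) / n
    # Biased variance (divide by n), as used for normalization in training mode.
    var = sum((v - mean) ** 2 for v in values) / n
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

# Hypothetical activations of one feature for a batch of 4 samples.
activations = [2.0, 4.0, 6.0, 8.0]
normalized = batch_norm(activations)
```

After normalization the batch has roughly zero mean and unit variance, which tends to make training less sensitive to the scale of the incoming activations.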

Training Results

After defining the model architectures, I trained the models using different optimizers. The results of the training for the two model architectures are displayed in the following plots.

Model 1: Adam (learning rate = 0.01 | batch size = 4 | epochs = 10):

Model 1: Adam (learning rate = 0.001 | batch size = 4 | epochs = 10):

Model 1: SGD (learning rate = 0.001 | momentum = 0.9 | batch size = 4 | epochs = 10):

Model 2: SGD (learning rate = 0.001 | momentum = 0.9 | batch size = 4 | epochs = 10):

Based on these results, the best performance was achieved by model 2 using the SGD optimizer with learning rate = 0.001, momentum = 0.9, batch size = 4, and epochs = 10. The test accuracy was 90.6% as reported above (the per-class accuracy can also be seen directly above). It seems like the toughest class to classify was the shirt class, as its accuracy is significantly lower than that of the other classes.
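To make the overall and per-class accuracy numbers concrete, here is a minimal sketch of how they can be computed from integer labels and predictions. The helper `per_class_accuracy` and the toy data are assumptions for illustration, not my actual evaluation code or results; the class names follow the standard Fashion-MNIST label order.

```python
from collections import Counter

# Standard Fashion-MNIST label order (index = integer label).
CLASSES = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
           "Sandal", "Shirt", "Sneaker", "Bag", "Ankle Boot"]

def per_class_accuracy(labels, predictions, num_classes):
    """Return (overall accuracy, list of per-class accuracies)."""
    total = Counter(labels)
    correct = Counter(y for y, p in zip(labels, predictions) if y == p)
    # Classes with no samples get NaN instead of a misleading 0.
    per_class = [correct[c] / total[c] if total[c] else float("nan")
                 for c in range(num_classes)]
    overall = sum(correct.values()) / len(labels)
    return overall, per_class

# Toy labels and predictions for illustration only (not results from this project).
labels      = [0, 0, 6, 6, 6, 6, 9, 9]
predictions = [0, 0, 6, 0, 2, 6, 9, 9]
overall, per_class = per_class_accuracy(labels, predictions, len(CLASSES))
```

In this toy data the "Shirt" samples (label 6) are sometimes predicted as "T-shirt/top" or "Pullover", so that class ends up with a lower accuracy than the overall figure.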

Now, here are some examples of correctly and incorrectly classified images:

Ankle Boot
correct
incorrect: Sandal | Sneaker
Bag
correct
incorrect: T-shirt/top | Coat
Coat
correct
incorrect: Shirt | Pullover
Dress
correct
incorrect: Shirt | Coat
Pullover
correct
incorrect: Coat | Shirt
Sandal
correct
incorrect: Sneaker | Sneaker
Shirt
correct
incorrect: T-shirt/top | Pullover
Sneaker
correct
incorrect: Ankle Boot | Ankle Boot
T-shirt/top
correct
incorrect: Shirt | Shirt
Trouser
correct
incorrect: Dress | Coat
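As a quick sanity check on the examples above, the sketch below tallies how often each class name shows up in the "incorrect" entries. The dictionary simply transcribes the list above; it covers only these 20 hand-picked examples, so treat it as a rough illustration rather than a full confusion matrix.

```python
from collections import Counter

# The "incorrect" entries from the list above, keyed by the class they are listed under.
incorrect_examples = {
    "Ankle Boot":  ["Sandal", "Sneaker"],
    "Bag":         ["T-shirt/top", "Coat"],
    "Coat":        ["Shirt", "Pullover"],
    "Dress":       ["Shirt", "Coat"],
    "Pullover":    ["Coat", "Shirt"],
    "Sandal":      ["Sneaker", "Sneaker"],
    "Shirt":       ["T-shirt/top", "Pullover"],
    "Sneaker":     ["Ankle Boot", "Ankle Boot"],
    "T-shirt/top": ["Shirt", "Shirt"],
    "Trouser":     ["Dress", "Coat"],
}

# Count how often each class appears on the "incorrect" side.
involved = Counter(name for others in incorrect_examples.values() for name in others)
most_involved = involved.most_common(3)
```

"Shirt" appears most often among these entries, which lines up with it being the hardest class for the model.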

Lastly, below is a visualization of the learned convolutional filters:

Part 2: Semantic Segmentation

Now, I tackle the problem of semantic segmentation. This refers to assigning each pixel in an image to its correct object class.
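One way to picture this: a segmentation network typically outputs one score per class for every pixel, and the predicted label map takes the highest-scoring class at each pixel. The sketch below shows that step in plain Python; the function `label_map`, the `scores[c][y][x]` layout, and the toy values are assumptions for illustration, not taken from the starter code.

```python
def label_map(scores):
    """Pick the best class per pixel; scores[c][y][x] is class c's score at (y, x)."""
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [[max(range(num_classes), key=lambda c: scores[c][y][x])
             for x in range(width)]
            for y in range(height)]

# Toy scores for 3 classes on a 2x2 image (illustrative values only).
scores = [
    [[0.1, 0.7], [0.2, 0.3]],  # class 0
    [[0.8, 0.2], [0.3, 0.3]],  # class 1
    [[0.1, 0.1], [0.5, 0.4]],  # class 2
]
labels = label_map(scores)
```

The result has the same height and width as the input image, with one class index per pixel.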

The Data

I used the Mini Facade dataset, which was included as part of the starter code for this project. Below is a sample from this dataset, which consists of images of buildings, each of which can contain up to 5 object classes.

Model Architecture

For the segmentation task, I created and tested a model using the following architecture:

Training Results

After defining the model architecture, I trained the model using several different optimizers and hyperparameters. I found the best results when using the Adam optimization algorithm with learning_rate = 1e-3, weight_decay = 1e-5, batch_size = 64, and epochs = 25. This yielded an average AP = 0.55.
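For context on the metric, here is a rough sketch of average precision (AP), assuming it is computed per class from per-pixel scores and then averaged over classes. That is my reading of "average AP"; the starter code's exact implementation may differ (for example in how tied scores are handled). The function `average_precision` and the toy data are made up for the example.

```python
def average_precision(targets, scores):
    """AP for one class: mean precision at each true positive, ranked by score.

    Assumes distinct scores; ties would need extra care.
    """
    ranked = sorted(zip(scores, targets), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Toy per-pixel data for two classes (1 = pixel belongs to the class).
class_a = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
class_b = average_precision([0, 1, 0, 1], [0.6, 0.2, 0.4, 0.3])
mean_ap = (class_a + class_b) / 2
```

A class whose true pixels are ranked near the top gets an AP close to 1, while a class whose true pixels receive low scores drags the average down.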

Lastly, I ran my trained model on the photo of the apartment building across the street from me, and below is the resulting image.

As shown above, my model does a pretty good job of classifying the facades and windows correctly. However, it seems to have misclassified an area of sunlight as a pillar and thinks that some darker parts of the windows are balconies.