Project 4: Classification and Segmentation

Thomas Low

Overview

This project explores machine learning applications in computer vision: image classification and semantic segmentation.

Part 1: Image Classification

This part trains an image classification network on the Fashion MNIST dataset. This dataset contains 10 classes that we will attempt to classify: t-shirt, trouser, pullover, dress, coat, sandal, shirt, sneaker, bag, ankle boot.

Dataloader

sample images

Architecture

My network consists of 2 convolutional layers (16 and 32 channels) with size 3 kernel followed by a ReLU and a maxpool.

Part 1 Architecture

Hyperparameters

Hyperparameter Value
Loss Function Cross Entropy Loss
Optimizer Adam
Learning Rate 0.01
Epochs 2

Results

Train and Validation Accuracies

Accuracies Graph

Class Accuracies

Class Accuracies

T-shirts, shirts, and pullovers are the most difficult to classify because they are look similar and are difficult to distinguish.

results

Learned Filters

filters

Part 2: Semantic Segmentation

This part trains a semantic segmentation labeling network on the Mini Facade dataset. This dataset contains buildings that we will attempt to label by their balconies, windows, pillars, facade, and other.

Architecture

My network consists of 6 convolutional layers (32, 64, 128, 64, 32, and 5 channels) with size 3 kernel followed each by a Batch Normalization and ReLU except for the last layer. Maxpooling and Upsampling is done in the 2nd and 5th layer.

Part 1 Architecture

Hyperparameters

Hyperparameter Value
Loss Function Cross Entropy Loss
Optimizer Adam
Learning Rate 1e-3
Weight Decay 1e-5
Epochs 20
Batch Size 10

Results

Train and Validation Losses

Loss Graph

Average Precision on Test Set

AP

Examples

The model is able to correctly label windows most of the time, but is unable to differentiate between the facade and other objects.

21 21 21
46 46 46
65 65 65
88 88 88

Conclusion

Although this was conceptually difficult due to a limited background in machine learning, it was rewarding to train an image classification and semantic segmentation network.