Project 4: Classification and Segmentation

Thomas Low

Overview

This project explores machine learning applications in computer vision: image classification and semantic segmentation.

Part 1: Image Classification

This part trains an image classification network on the Fashion-MNIST dataset, which contains 10 clothing classes: t-shirt, trouser, pullover, dress, coat, sandal, shirt, sneaker, bag, and ankle boot.

Dataloader

Architecture

My network consists of 2 convolutional layers (16 and 32 channels) with kernel size 3, followed by a ReLU and a max pool.
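As a rough illustration, the architecture above could be sketched in PyTorch as follows. The framework, the class name FashionClassifier, the absence of padding, the placement of a ReLU and max pool after each convolution, and the single linear classifier head are all assumptions; the description above does not specify them.

```python
import torch
import torch.nn as nn


class FashionClassifier(nn.Module):
    """Two 3x3 conv layers (16 and 32 channels), each with ReLU and max pool."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3),   # 1x28x28 -> 16x26x26 (no padding assumed)
            nn.ReLU(),
            nn.MaxPool2d(2),                   # -> 16x13x13
            nn.Conv2d(16, 32, kernel_size=3),  # -> 32x11x11
            nn.ReLU(),
            nn.MaxPool2d(2),                   # -> 32x5x5
        )
        # Assumed classifier head; the write-up does not describe one.
        self.classifier = nn.Linear(32 * 5 * 5, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))


model = FashionClassifier()
logits = model(torch.zeros(4, 1, 28, 28))  # batch of 4 blank 28x28 grayscale images
print(logits.shape)
```

Each image yields one score per class, so the class with the largest score is taken as the prediction.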

Hyperparameters

Hyperparameter	Value
Loss Function	Cross Entropy Loss
Optimizer	Adam
Learning Rate	0.01
Epochs	2
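A minimal sketch of how these settings might be wired together, assuming PyTorch. The one-layer model and the random batch are hypothetical stand-ins so the snippet runs on its own; they are not the network or data used in this project.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in model so the snippet is self-contained.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

criterion = nn.CrossEntropyLoss()                          # loss function from the table
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)  # optimizer and learning rate

# Random tensors stand in for one batch of images and labels.
images = torch.randn(8, 1, 28, 28)
labels = torch.randint(0, 10, (8,))

losses = []
for epoch in range(2):                 # 2 epochs, one batch per epoch here
    optimizer.zero_grad()              # clear gradients from the previous step
    loss = criterion(model(images), labels)
    loss.backward()                    # backpropagate
    optimizer.step()                   # Adam update
    losses.append(loss.item())
```

In a real run the inner step would loop over every batch produced by the dataloader in each epoch.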

Results

Train and Validation Accuracies

Class Accuracies

T-shirts, shirts, and pullovers are the most difficult to classify because they look similar to one another.
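For reference, per-class accuracy can be computed as the fraction of samples of each class that were predicted correctly. The sketch below uses plain Python; the function name per_class_accuracy and the tiny label lists are made up for illustration and are not results from this project.

```python
from collections import Counter

CLASSES = ["t-shirt", "trouser", "pullover", "dress", "coat",
           "sandal", "shirt", "sneaker", "bag", "ankle boot"]


def per_class_accuracy(y_true, y_pred, num_classes):
    """Return, for each class, the fraction of its samples predicted correctly."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    # Classes with no samples get NaN instead of a misleading 0.
    return [correct[c] / total[c] if total[c] else float("nan")
            for c in range(num_classes)]


# Made-up labels: one shirt (index 6) is mistaken for a t-shirt (index 0).
y_true = [0, 0, 6, 6, 1, 1]
y_pred = [0, 0, 0, 6, 1, 1]
acc = per_class_accuracy(y_true, y_pred, len(CLASSES))
for name, a in zip(CLASSES, acc):
    print(f"{name}: {a:.2f}")
```

A confusion between two similar classes shows up as a lower accuracy for the class whose samples were mislabeled, here the shirt class.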

Learned Filters

Part 2: Semantic Segmentation

This part trains a semantic segmentation network on the Mini Facade dataset. The dataset contains images of buildings, and the network attempts to label each pixel as one of 5 classes: balcony, window, pillar, facade, or other.

Architecture

My network consists of 6 convolutional layers (32, 64, 128, 64, 32, and 5 channels) with kernel size 3, each followed by batch normalization and a ReLU except for the last layer. Max pooling is applied at the 2nd layer and upsampling at the 5th layer.
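One possible reading of this architecture, sketched in PyTorch. The framework, the names conv_block and FacadeSegmenter, the RGB input, padding of 1, the pooling and upsampling factor of 2, and the exact position of the pooling and upsampling (after the 2nd and 5th layers) are assumptions rather than details stated above.

```python
import torch
import torch.nn as nn


def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """3x3 convolution followed by batch normalization and ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),  # padding=1 is assumed
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
    )


class FacadeSegmenter(nn.Module):
    def __init__(self, num_classes: int = 5):
        super().__init__()
        self.net = nn.Sequential(
            conv_block(3, 32),                # layer 1
            conv_block(32, 64),               # layer 2
            nn.MaxPool2d(2),                  # halve resolution (assumed factor 2)
            conv_block(64, 128),              # layer 3
            conv_block(128, 64),              # layer 4
            conv_block(64, 32),               # layer 5
            nn.Upsample(scale_factor=2),      # restore resolution (nearest neighbor)
            nn.Conv2d(32, num_classes, kernel_size=3, padding=1),  # layer 6, no BN/ReLU
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


model = FacadeSegmenter().eval()
with torch.no_grad():
    logits = model(torch.zeros(1, 3, 64, 64))  # one blank 64x64 RGB image
print(logits.shape)
```

With these choices the output keeps the input's height and width and has 5 channels, one score per class for every pixel.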

Hyperparameters

Hyperparameter	Value
Loss Function	Cross Entropy Loss
Optimizer	Adam
Learning Rate	1e-3
Weight Decay	1e-5
Epochs	20
Batch Size	10
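A minimal sketch of one training step with these settings, assuming PyTorch. The single convolution and the random tensors are hypothetical stand-ins for the network and the dataset; the 32x32 image size is arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical one-layer stand-in mapping RGB images to 5 class scores per pixel.
model = nn.Conv2d(3, 5, kernel_size=3, padding=1)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)

# Random stand-ins for one batch of 10 images and their label maps.
images = torch.randn(10, 3, 32, 32)
labels = torch.randint(0, 5, (10, 32, 32))  # one class index per pixel

logits = model(images)                      # shape (10, 5, 32, 32)
loss = criterion(logits, labels)            # cross entropy averaged over all pixels
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Cross entropy accepts per-pixel class scores directly: the class dimension comes second in the logits, and the target holds one integer label per pixel.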

Results

Train and Validation Losses

Average Precision on Test Set

Examples

The model labels windows correctly most of the time, but it is unable to differentiate between the facade and other objects.

Conclusion

Although this project was conceptually difficult given my limited background in machine learning, it was rewarding to train both an image classification network and a semantic segmentation network.