Project 4: Classification and Segmentation

Introduction

In part 1, we train a simple classifier for the Fashion-MNIST dataset. In part 2, we train a neural net to perform semantic segmentation of images in a mini Facade dataset. For this project, I ran code on Google Colab using a GPU.

Working with neural nets in PyTorch

Creating and training the neural network for each respective part followed the same essential steps:

  1. Retrieve or create dataset(s), then initialize a loader object for it
  2. Split the data into training and validation sets
  3. Define the network architecture
  4. Train the network on the training set, evaluate it on the validation set, and tune hyperparameters if necessary. Repeat.
  5. Test the network with the test set. Report and analyze results.
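The steps above can be sketched in PyTorch roughly as follows. The toy dataset, model, and sizes here are placeholders for illustration, not the project's actual setup:

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset, random_split

# Step 1: a placeholder dataset of 100 random "images" with 10 fake class labels
data = TensorDataset(torch.randn(100, 8), torch.randint(0, 10, (100,)))

# Step 2: split into training and validation sets, then wrap in loaders
train_set, val_set = random_split(data, [80, 20])
train_loader = DataLoader(train_set, batch_size=16, shuffle=True)
val_loader = DataLoader(val_set, batch_size=16)

# Step 3: a stand-in architecture (the real net is defined per the spec)
net = nn.Linear(8, 10)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

# Step 4: train on the training set, then evaluate on the validation set
for epoch in range(3):
    net.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(net(x), y)
        loss.backward()
        optimizer.step()
    net.eval()
    with torch.no_grad():
        val_loss = sum(criterion(net(x), y).item() for x, y in val_loader)
```

Step 5 is the same evaluation loop run once over the held-out test loader.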

Part 1: Classification

I read through this Colab tutorial, which covers most of the steps above, but with a different dataset.

The articles “Neural Networks” and “Training a Classifier” from PyTorch (linked in the spec) were also very useful for understanding how the code works!

Hyperparameters

optimizer: SGD with momentum = 0.9
learning rate: 0.01
training set: 35000 samples
validation set: 7000 samples
epochs: 12
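These settings correspond to code along the lines of the following (the dataset here is a dummy stand-in; only the split sizes and optimizer settings come from the list above):

```python
import torch
import torch.optim as optim
from torch.utils.data import TensorDataset, random_split

# Dummy stand-in for the 42000 labeled training examples
dataset = TensorDataset(torch.randn(42000, 1),
                        torch.zeros(42000, dtype=torch.long))

# 35000 training / 7000 validation samples, as listed above
train_set, val_set = random_split(dataset, [35000, 7000])

model = torch.nn.Linear(1, 2)  # stand-in for the real net
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
```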

Net

The net architecture was as specified in the spec.
Two changes noticeably improved the classifier: 1) increasing the convolution filter size from a naive 3x3, and 2) increasing the maxpool filter size. I suspect this helps because each value in the next layer then summarizes a larger region of the previous layer, i.e., the receptive field grows faster.
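To illustrate those two changes, here is a hypothetical net with 5x5 conv kernels and 3x3 max pools; the channel counts and layer structure are placeholders, not the spec's actual architecture:

```python
import torch
import torch.nn as nn

# Hypothetical layer sizes; only the kernel-size choices mirror the discussion above.
net = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=5),   # 5x5 instead of a naive 3x3
    nn.ReLU(),
    nn.MaxPool2d(3, stride=2),         # larger maxpool window
    nn.Conv2d(32, 32, kernel_size=5),
    nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Flatten(),
    nn.LazyLinear(10),                 # 10 Fashion-MNIST classes
)

out = net(torch.randn(1, 1, 28, 28))   # one 28x28 grayscale image
```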

Results

Loss:
My training loss converged to ~0.15, and my validation loss to ~0.27.

Plot the train and validation accuracy during the training process.

Compute a per class accuracy of your classifier on the validation and test dataset.

Computing accuracy...
Accuracy of the network on the 60000 train images: 92.20 %
Accuracy of the network on the 7000 validation images: 90.50 %
Accuracy of the network on the 10000 test images: 90.23 %

Class   Label         Accuracy (%)
0       T-shirt/top   82.50
1       Trouser       98.50
2       Pullover      88.00
3       Dress         90.20
4       Coat          79.90
5       Sandal        98.30
6       Shirt         73.30
7       Sneaker       95.60
8       Bag           98.70
9       Ankle boot    97.30
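The per-class numbers above come from tallying correct predictions separately for each label. A minimal version in plain Python (the labels and predictions here are toy data, not the project's):

```python
from collections import defaultdict

def per_class_accuracy(labels, preds):
    """Fraction of correct predictions for each class, as a percentage."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p in zip(labels, preds):
        total[y] += 1
        correct[y] += (p == y)
    return {c: 100.0 * correct[c] / total[c] for c in total}

# Toy example: class 0 gets 1 of 2 right, class 1 gets 2 of 2 right.
acc = per_class_accuracy([0, 0, 1, 1], [0, 1, 1, 1])
```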

Which classes are the hardest to get?

Class 6 (shirt) performed worst by far, at 73.3%. Classes 0 (t-shirt/top) and 4 (coat) also lagged, at 82.5% and 79.9%. All three are upper-body garments with similar silhouettes in low-resolution grayscale, so they are likely being confused with one another.

Show 2 images from each class which the network classifies correctly, and 2 more images where it classifies incorrectly.

(image grid: correct classifications on the left, incorrect on the right)

Visualize the learned filters.


Part 2: Segmentation

Hyperparameters

Net

Results

My network converged around epoch 20, with losses of about 0.79 for both training and validation.

Display a plot showing both training and validation loss across iterations.

Report the average precision on the test set.

Testing on test set: 0.8173239387963948

AP = 0.6452316617590995
AP = 0.771376737037513
AP = 0.12939426037550147
AP = 0.8111214941012976
AP = 0.4557280429990795
Average AP: 0.5625704392544982
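The per-class AP values can be computed with the standard ranked-precision formula: average the precision at each rank where a positive pixel appears. A minimal sketch in plain Python (the scores and labels here are toy data, not the project's evaluation code):

```python
def average_precision(scores, labels):
    """AP = mean of precision@k over the ranks k where a positive appears."""
    ranked = sorted(zip(scores, labels), reverse=True)
    hits, precisions = 0, []
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Positives ranked 1st and 3rd: AP = (1/1 + 2/3) / 2
ap = average_precision([0.9, 0.8, 0.7], [1, 0, 1])
```

The overall score is then the mean of the five per-class APs, as reported above.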

Try running the trained model on the photo of a building from your collection. Which parts of the images does it get right? Which ones does it fail on?

(images: input photo on the left, predicted segmentation on the right)

My model does well on ‘window’ (orange) and ‘others’ (black), but fails on ‘pillar’ (green) here, and misses most of ‘balcony’ (red).