Project 4 - Lindsay Yang

Project overview

The first part of the project was training a neural net to classify a set of images into 10 diffferent categories. This net architecture used convolutional layers followed by fully connected layers. The second part of the project was training a neural net to label parts of a building image into different classes (ie windows, pillars, etc). This net architecture used many convolutional layers (with nonlinearities).

Part 1: Image Classification

CNN architecture

The model consisted of 2 convolutional layers and 2 fully connected layers. Each convolutional layer consisted of a 48 channel 2d convolution with a 3x3 filter, followed by a softmax, then maxpool. Then, the output after the 2 convolutional layers goes through a fully connected layer, a ReLu, and then another fully connected layer. The final output is a 10-vector, and the class with highest activation is the predicted label.

Loss function and optimizer and training

I used torch.optim.Adam as the optimizer, with a learning rate of 0.01 and weight decay of 0.00001. I used nn.CrossEntropyLoss as the loss function. The net was trained for 5 epochs.

Part 1 Results

Accuracies

The train and validation accuracies during the training process:

Accuracy of the network on the validation images: 85%

Accuracy of the network on the test images: 85%

Class accuracies

For both the validation and test set, "Bag" had the highest accuracy and "Shirt" had the lowest.

Class

Validation accuracy

Test accuracy

T-shirt/top

0.7993

0.7820

Trouser

0.9275

0.9110

Pullover

0.7794

0.7700

Dress

0.8969

0.8620

Coat

0.8370

0.8200

Sandals

0.9725

0.9680

Shirt

0.6076

0.5530

Sneaker

0.9351

0.9400

Bag

0.9762

0.9690

Ankle boot

0.9460

0.9510

Validation set examples:

Classified correctly

Classified incorrectly

Test set examples:

Classified correctly

Classified incorrectly

Visualized filters

These are the learned 3x3 filters of the first convolutional layer (before nonlinearities were applied).

Part 2: Semantic Segmentation

I split the provided training data into a training set and validation set (100 images) prior to loading the data.

CNN architecture

The model consists of 6 convolutional layers, as follows:

1. input --> Conv2d (in: 5 channels, out: 32 channels), filter size=3x3, padding=1 --> MaxPool2d, filter size=2x2 --> Upsample, scale=2 -->
2. --> Conv2d (in: 32 channels, out: 64 channels), filter size=3x3, padding=1 --> MaxPool2d, filter size=2x2 --> Upsample, scale=2 -->
3. --> Conv2d (in: 64 channels, out: 128 channels), filter size=3x3, padding=1 --> MaxPool2d, filter size=2x2 --> Upsample, scale=2 -->
4. --> Conv2d (in: 128 channels, out: 64 channels), filter size=3x3, padding=1 --> MaxPool2d, filter size=2x2 --> Upsample, scale=2 -->
5. --> Conv2d (in: 64 channels, out: 32 channels), filter size=3x3, padding=1 --> MaxPool2d, filter size=2x2 --> Upsample, scale=2 -->
6. --> Conv2d (in: 32 channels, out: 5 channels), filter size=3x3, padding=1 --> MaxPool2d, filter size=2x2 --> Upsample, scale=2 --> output

Loss function, optimizer, and training hyperparameters

The input data was processed using batch size = 1. I used optim.Adam as my optimizer with a learning rate = 0.001 and weight decay = 0.00001, nn.CrossEntropyLoss as my loss function, and trained the network for 25 epochs.

Part 2 Results

Loss graph

The training loss and validation loss has been plotted over 25 epochs.

Average Precision

Average precision values fall between 0 and 1. These are the AP scores for the classes [others, facade, pillar, window, balcony], corresponding to [black, blue, green, orange, red]:

AP = 0.6560492458662615
AP = 0.7634421164751314
AP = 0.13545104091410726
AP = 0.7752812762598212
AP = 0.39690699127001716

Average AP (approx.) = 0.544

Sample result

input image

ground truth labelling

output labelling

The trained network classifies windows and facades (orange and blue) relatively well, and it doesn't do so hot on pillars (green)

Project 4 - Classification and Segmentation

Project overview

Part 1: Image Classification

CNN architecture

Loss function and optimizer and training

Part 1 Results

Accuracies

Class accuracies

Validation set examples:

Test set examples:

Visualized filters

Part 2: Semantic Segmentation

CNN architecture

Loss function, optimizer, and training hyperparameters

Part 2 Results

Loss graph

Average Precision

Sample result