CS194-26 Proj4: Classification and Segmentation

Part I: Image Classification

In this part, we classify images from the Fashion-MNIST dataset. Here are some samples from the dataset.

Network and Training Details

The network has 2 convolutional layers. Both layers have 32 channels, with kernel sizes 5 and 3 respectively. Each conv layer is followed by a ReLU and a max-pooling layer with kernel size 2. The flattened output is then passed to 2 fully connected layers of size 128 and 10.
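To make the layer sizes above concrete, here is a small sketch that traces the spatial size of a 28x28 Fashion-MNIST image through the two conv/pool stages. The stride (1) and padding (0) of the conv layers are not stated above, so both are assumptions, and the helper names `conv_out` and `classifier_shapes` are made up for illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def classifier_shapes(input_size=28, padding=0):
    """Trace the feature map size through conv -> pool -> conv -> pool.

    Assumption: stride-1 convolutions with the given padding (0 here);
    the write-up does not state either value.
    """
    s = conv_out(input_size, kernel=5, padding=padding)  # conv1, 32 channels
    s = conv_out(s, kernel=2, stride=2)                  # max-pool, kernel 2
    s = conv_out(s, kernel=3, padding=padding)           # conv2, 32 channels
    s = conv_out(s, kernel=2, stride=2)                  # max-pool, kernel 2
    flat = 32 * s * s                                    # input size of the first FC layer
    return s, flat


final_size, flat_features = classifier_shapes()
print(final_size, flat_features)
```

Under these assumptions the first fully connected layer would map the flattened features to 128 units, and the second would map 128 units to the 10 class scores.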

The network is trained using Adam with batch size 128, learning rate 5e-3, and weight decay 1e-4 for 15 epochs. 10% of the training data is held out as validation data.
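As a rough illustration of the 10% hold-out, the sketch below splits the 60,000 Fashion-MNIST training indices and counts the optimizer steps per epoch at batch size 128. The random shuffle, the seed, and the function name `split_indices` are assumptions; the write-up does not say how the validation samples were chosen.

```python
import math
import random


def split_indices(n, val_fraction=0.1, seed=0):
    """Return (train_indices, val_indices) after a seeded shuffle.

    Assumption: a uniformly random split; the actual selection method
    is not described in the write-up.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_val = int(round(n * val_fraction))
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = split_indices(60000)
steps_per_epoch = math.ceil(len(train_idx) / 128)
print(len(train_idx), len(val_idx), steps_per_epoch)
```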

Analysis on Network Performance

Here are the classification accuracies on the training and validation sets over the course of training.
Here are the confusion matrices for the validation and test datasets. The diagonal entries are the per-class accuracies.
According to the validation results, the hardest class is class 0: T-shirt/top (0.79).
According to the test results, the hardest class is class 6: Shirt (0.73).
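The per-class accuracies and the "hardest class" can be read off a confusion matrix as sketched below. This assumes rows index the true class and columns the predicted class, so that dividing each diagonal entry by its row sum gives the per-class accuracy. The toy labels and the function names are illustrative only and are not the data behind the numbers above.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count matrix with rows = true class, columns = predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_accuracy(matrix):
    """Diagonal entry divided by the row sum, for each class."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(matrix)]


def hardest_class(matrix):
    """Index of the class with the lowest per-class accuracy."""
    acc = per_class_accuracy(matrix)
    return min(range(len(acc)), key=acc.__getitem__)


# Toy example with 3 classes (made-up labels).
y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 0, 0, 2]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(cm, per_class_accuracy(cm), hardest_class(cm))
```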
Each row shows 4 samples from one class. The left 2 columns show samples that the network classifies correctly, and the right 2 columns show samples that it classifies incorrectly.
Here are the learned filters of the first conv layer.

Part II: Semantic Segmentation

In this part, we perform semantic segmentation of images in the mini Facade dataset.

Network and Training Details

The model has 4 conv layers with 32, 64, 128, and 256 channels respectively. Each conv layer has kernel size 3, stride 2, and padding 1, so each one halves the spatial resolution. These are followed by 2 transposed conv layers with 128 and 64 channels respectively. Each transposed conv layer has kernel size 6, stride 4, and padding 1, so each one upsamples by a factor of 4. Each conv/transposed conv layer is followed by a batch normalization layer and a ReLU layer. At the end, there is a conv layer with 5 output channels, one per class.
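The kernel, stride, and padding values above can be checked with the standard output-size formulas for convolutions and transposed convolutions. The sketch below assumes a 256x256 input, no output padding, and no dilation, none of which is stated in the write-up; the function names are hypothetical.

```python
def conv_out(size, kernel, stride, padding):
    """Output size of a strided convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel, stride, padding):
    """Output size of a transposed convolution.

    Assumption: no output padding and dilation 1.
    """
    return (size - 1) * stride - 2 * padding + kernel


def segmentation_shapes(size=256):
    """Spatial size after each of the 4 conv and 2 transposed conv layers."""
    trace = [size]
    for _ in range(4):  # conv layers: kernel 3, stride 2, padding 1
        size = conv_out(size, kernel=3, stride=2, padding=1)
        trace.append(size)
    for _ in range(2):  # transposed conv layers: kernel 6, stride 4, padding 1
        size = deconv_out(size, kernel=6, stride=4, padding=1)
        trace.append(size)
    return trace


print(segmentation_shapes())
```

The encoder reduces the resolution by 16x in total and the two 4x upsampling layers restore it, so the final conv layer can produce one score per class at every input pixel, provided it preserves the spatial size.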

The network is trained using Adam with batch size 32, learning rate 1e-3, and weight decay 1e-4 for 15 epochs. 10% of the training data is held out as validation data.

Analysis on Network Performance

Here are the training and validation losses over the course of training.
Here is the average precision on the test set.
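For reference, one common way to compute average precision for a single class is to rank pixels by their predicted score for that class and average the precision at each rank where a true positive occurs. The sketch below follows that definition; whether the reported numbers use exactly this variant is an assumption, and the labels and scores are made up.

```python
def average_precision(labels, scores):
    """Average precision for one class.

    labels: 1 if the pixel truly belongs to the class, else 0.
    scores: predicted score for that class, higher means more confident.
    Assumption: precision is averaged over the ranks of the true positives.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0


# Toy example: positives ranked 1st and 3rd out of 4.
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
print(ap)
```

Averaging this value over the classes would give a single summary number for the test set.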
Here is one segmentation result from running the trained model on a photo of a building from my collection. The model gets the balcony and window classes right but fails on pillar and others.