CS294-26 Project 4

Fuyi Yang

Overview

In the first part of the project we classify the Fashion-MNIST dataset into ten classes. First, the dataset is loaded and converted to tensors, and several sample images are displayed. Then we build a CNN consisting of two convolutional layers, two fully connected linear layers, and one output layer; each convolutional layer is followed by a ReLU and a max-pooling layer, and each fully connected layer is followed by a ReLU. Cross-entropy loss and the Adam optimizer are used for training.
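A minimal PyTorch sketch of the network described above. The first layer's 32 filters match the learned-filter figure in Part 1; the second conv width, kernel sizes, and hidden fully connected widths are my assumptions, not taken from the report:

```python
import torch
import torch.nn as nn

class FashionCNN(nn.Module):
    """Sketch of the Part 1 classifier: 2 conv + 2 FC + output layer."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1),   # 28x28 -> 28x28, 32 filters
            nn.ReLU(),
            nn.MaxPool2d(2),                  # -> 14x14
            nn.Conv2d(32, 64, 3, padding=1),  # 64 filters (assumed)
            nn.ReLU(),
            nn.MaxPool2d(2),                  # -> 7x7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 256),       # hidden widths assumed
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),      # raw logits for CrossEntropyLoss
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```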

The second part of the project aims at semantic segmentation of the Mini Facade dataset, i.e. labeling each pixel in an image with its correct object class. There are five classes, each encoded by a different color. A CNN made of five convolutional layers and one transposed convolutional layer is used for training. Cross-entropy loss and the Adam optimizer are again used, achieving an Average Precision (AP) above 0.45.


Part 1

The batch size is 32, and 20% of the training dataset is held out as the validation set.
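The 80/20 split with batch size 32 can be sketched with torch.utils.data.random_split; the helper name and fixed seed are mine, and the torchvision usage in the comment is an assumption about how the dataset was loaded:

```python
import torch
from torch.utils.data import DataLoader, random_split

def make_loaders(dataset, batch_size=32, val_frac=0.2, seed=0):
    """Split a dataset into train/validation loaders (80/20 by default)."""
    n_val = int(len(dataset) * val_frac)
    n_train = len(dataset) - n_val
    train_set, val_set = random_split(
        dataset, [n_train, n_val],
        generator=torch.Generator().manual_seed(seed))
    train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_set, batch_size=batch_size)
    return train_loader, val_loader

# With torchvision (assumed, not stated in the report):
# from torchvision import datasets, transforms
# data = datasets.FashionMNIST("data", train=True, download=True,
#                              transform=transforms.ToTensor())
# train_loader, val_loader = make_loaders(data)
```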

Sample images and their labels:
 
Train and validation accuracy during the training process:
 

Accuracy per class on the validation and test dataset:

Class         Validation   Test
T-shirt/top      87%        86%
Trouser          95%        97%
Pullover         64%        69%
Dress            88%        86%
Coat             87%        85%
Sandal           96%        96%
Shirt            38%        39%
Sneaker          96%        96%
Bag              94%        95%
Ankle boot       91%        88%
 
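The per-class accuracies above can be computed by counting correct predictions separately for each label; this is a sketch with a helper name of my choosing:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def per_class_accuracy(model, loader, num_classes=10):
    """Fraction of correctly classified samples for each class."""
    correct = torch.zeros(num_classes)
    total = torch.zeros(num_classes)
    model.eval()
    for images, labels in loader:
        preds = model(images).argmax(dim=1)
        for c in range(num_classes):
            mask = labels == c
            correct[c] += (preds[mask] == c).sum()
            total[c] += mask.sum()
    return correct / total.clamp(min=1)  # avoid division by zero
```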

Correct and incorrect classifications for each class:

Correct: Incorrect:
 
T-shirt/top
 
Trouser
 
Pullover
 
Dress
 
Coat
 
Sandal
 
Shirt
 
Sneaker
 
Bag
 
Ankle boot
 

Learned filters (32):

 

Part 2

Architecture of the CNN and hyperparameters chosen for the training:

The network consists of five convolutional layers, two max-pooling layers, and one transposed convolutional layer for upsampling:

Conv2d(3,128,3,1,1) -> ReLU -> Conv2d(128,256,3,1,1) -> ReLU -> Maxpool(2,2) -> Conv2d(256,128,3,1,1) -> ReLU -> Conv2d(128,128,3,1,1) -> ReLU -> Maxpool(2,2) -> ConvTranspose2d(128,64,6,4,1) -> Conv2d(64,5,3,1,1)
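Reading the arguments above as the standard PyTorch (channels_in, channels_out, kernel, stride, padding) signatures, the architecture can be written directly as:

```python
import torch
import torch.nn as nn

# Two MaxPool2d layers downsample by 4x; ConvTranspose2d(kernel=6, stride=4,
# padding=1) upsamples by exactly 4x: out = (in - 1) * 4 - 2 * 1 + 6 = 4 * in.
seg_net = nn.Sequential(
    nn.Conv2d(3, 128, 3, 1, 1), nn.ReLU(),
    nn.Conv2d(128, 256, 3, 1, 1), nn.ReLU(),
    nn.MaxPool2d(2, 2),
    nn.Conv2d(256, 128, 3, 1, 1), nn.ReLU(),
    nn.Conv2d(128, 128, 3, 1, 1), nn.ReLU(),
    nn.MaxPool2d(2, 2),
    nn.ConvTranspose2d(128, 64, 6, 4, 1),
    nn.Conv2d(64, 5, 3, 1, 1),        # 5 output channels, one per class
)
```

For input sizes divisible by 4, the output has the same spatial size as the input, so each pixel gets a 5-way class score.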

20% of the training dataset is held out as the validation set. Cross-entropy loss is used as the prediction loss, and the Adam optimizer with learning rate 1e-3 and weight decay 1e-5 is used for training.
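A minimal training-loop sketch with the stated hyperparameters (the helper name is mine; PyTorch's CrossEntropyLoss accepts per-pixel logits of shape (N, C, H, W) with integer targets of shape (N, H, W)):

```python
import torch
import torch.nn as nn

def train_epoch(model, loader, optimizer, criterion):
    """Run one training epoch; return the mean per-sample loss."""
    model.train()
    total = 0.0
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        total += loss.item() * x.size(0)
    return total / len(loader.dataset)

# Settings from the report:
# criterion = nn.CrossEntropyLoss()
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
```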


Train and validation loss during the training process:
 
Class     AP
Others    0.6619
Facade    0.7851
Pillar    0.1336
Window    0.8087
Balcony   0.3766
Average   0.5532
 
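The per-class AP above treats each class as a binary detection problem over all pixels: rank pixels by the class score and average the precision at each positive. This is a common AP definition sketched in NumPy; the course's official evaluation script may use a slightly different (e.g. interpolated) variant, and the function names are mine:

```python
import numpy as np

def average_precision(scores, positives):
    """AP: mean precision at the rank of each positive (descending scores)."""
    order = np.argsort(-scores)
    hits = positives[order].astype(float)
    precision = np.cumsum(hits) / (np.arange(len(hits)) + 1)
    return precision[hits > 0].mean() if hits.any() else 0.0

def per_class_ap(probs, labels):
    """probs: (N, C) per-pixel class scores; labels: (N,) ground-truth ids."""
    return [average_precision(probs[:, c], labels == c)
            for c in range(probs.shape[1])]
```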

Examples from the test set:

Input Image Ground truth Predicted label
 
 
 

Flinders Street Railway Station from my collection:

Input Image Predicted label
 

As can be seen from the examples above, Window (orange) and Facade (blue) are predicted well, while Pillar (green) is difficult to identify. Furthermore, because the training dataset images always stay within the boundaries of a building, the model fails to predict the surroundings outside the building, as in the Railway Station example above.