CS 194-26

Project 4: Classification and Segmentation

Zhimin Cai

Part 1: Image Classification

The training loss decreases as the number of epochs increases, but the validation loss shows the model starting to overfit around epoch 10.

In this part, we want to train a network to classify each pixel of an image from the Mini Facade dataset into its correct class.


Class      Color     Pixel value
others     black     0
facade     blue      1
pillar     green     2
window     orange    3
balcony    red       4


As in Part 1, we randomly split the training data (905 images in total) into a training set and a validation set (750 and 155 images). After training the network, we test it on the 114 test images.
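The random split described above can be sketched with the standard library alone. The function name `split_indices` and the fixed seed are illustrative assumptions, not the code actually used for this project.

```python
import random

def split_indices(n_total, n_train, seed=0):
    """Shuffle the indices 0..n_total-1 and cut them into train / validation lists."""
    indices = list(range(n_total))
    # The seed is an arbitrary choice that makes the split reproducible.
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]

# 905 labelled images -> 750 for training, 155 for validation
train_idx, val_idx = split_indices(905, 750)
print(len(train_idx), len(val_idx))
```

The two index lists can then be used to pick images out of the full training set, so that no image ends up in both subsets.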


Network Architecture: 6 convolution layers (32, 128, 512, 128, 32, 5) with 5*5 kernels, padding, and ReLU. This way, the output has the same spatial size as the label image.


Loss and parameters: Adam optimizer, learning rate of 1e-3, weight decay of 1e-5
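To see why padding keeps the output the same size as the label, here is a small sketch of the standard convolution output-size formula. It assumes stride 1 and a padding of 2 pixels per side (neither is stated above), and the 256-pixel input size is a hypothetical value.

```python
def conv_out_size(size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

size = 256  # hypothetical input height/width
for _ in range(6):
    # Assumed: 5*5 kernel, stride 1, padding 2 in each of the 6 layers
    size = conv_out_size(size, kernel=5, padding=2)

print(size)                                       # size after all 6 layers
print(conv_out_size(256, kernel=5, padding=0))    # without padding the map shrinks
```

With a 5*5 kernel, each unpadded layer removes 4 pixels per dimension, so a padding of 2 on each side is what cancels the shrinkage.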

batch size: 10

epoch: 50



In this part, we use the FashionMNIST dataset to train a network with two convolution layers to classify an image into one of ten classes.


Classes: "T-shirt/Top", "Trouser", "Pullover", "Dress", "Coat",  "Sandal",  "Shirt",  "Sneaker", "Bag", "Ankle Boot".


First, we split the 60000 training images into training and validation sets (55000 and 5000 images).


Then, we build a model with two convolution layers.


Architecture:

Conv1:  32 output channels, 3*3 kernel, with padding

ReLU + Maxpool 2*2 kernel, stride 2

Conv2:  32 output channels, 3*3 kernel, with padding

ReLU + Maxpool 2*2 kernel, stride 2

Flatten the feature maps, then 2 fully connected layers with ReLU: outputs a 10*1 vector as the result


Loss Function and Optimizer:

Adam Optimizer, 0.001 learning rate

batch size: 50

epoch: 10
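The architecture and optimizer settings above can be sketched in PyTorch as follows. Several details are assumptions because they are not stated here: a padding of 1, a hidden size of 128 for the first fully connected layer, cross-entropy as the loss, and the class name `FashionCNN`. The batch is random data used only to exercise one training step.

```python
import torch
import torch.nn as nn

class FashionCNN(nn.Module):
    """Two conv layers + two fully connected layers (hidden size is an assumption)."""
    def __init__(self, hidden=128, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),   # 28x28 -> 28x28
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),        # 28x28 -> 14x14
            nn.Conv2d(32, 32, kernel_size=3, padding=1),  # 14x14 -> 14x14
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),        # 14x14 -> 7x7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                                 # 32 * 7 * 7 features
            nn.Linear(32 * 7 * 7, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),               # one score per class
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = FashionCNN()
criterion = nn.CrossEntropyLoss()                          # assumed loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # learning rate from the text

# One training step on a random batch of 50 images (placeholder data)
images = torch.randn(50, 1, 28, 28)
labels = torch.randint(0, 10, (50,))
logits = model(images)
loss = criterion(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(logits.shape)
```

In a real run, the random batch would be replaced by batches drawn from the FashionMNIST training split, repeated for 10 epochs.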





torchvision FashionMNIST dataset

Part 2: Semantic Segmentation

Visualizing the learned CONV1 filters

Accuracy of each class

Train and Validation accuracy during the training process

As we can observe from the plot, the training accuracy rises as we train the model for more epochs, which is expected since the model fits the training set increasingly well. The validation accuracy fluctuates up and down over time, which suggests that the model is overfitting the training set: it learns patterns that do not hold on the validation set.

Class          Test Accuracy (%)    Validation Accuracy (%)
T-shirt/Top    91.70                91.74
Trouser        97.80                97.18
Pullover       86.60                89.08
Dress          95.10                94.92
Coat           81.10                83.33
Sandal         98.50                100.00
Shirt          65.60                58.33
Sneaker        97.00                95.52
Bag            98.50                98.25
Ankle Boot     95.60                93.80
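Per-class accuracies like those in the table can be computed from true and predicted labels. The sketch below uses a tiny made-up label list, not the actual predictions, and the helper name `per_class_accuracy` is hypothetical.

```python
from collections import Counter

def per_class_accuracy(y_true, y_pred):
    """Percentage of correctly classified samples for each true class."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: 100.0 * correct[c] / total[c] for c in total}

# Toy labels for illustration only
y_true = ["Shirt", "Shirt", "Coat", "Coat", "Bag"]
y_pred = ["T-shirt/Top", "Shirt", "Pullover", "Coat", "Bag"]
acc = per_class_accuracy(y_true, y_pred)
for name, value in acc.items():
    print(f"{name:12s} {value:6.2f}")
```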

We can observe that the Shirt class has the lowest test and validation accuracy, which means it is the hardest to identify. Meanwhile, the Pullover and Coat classes also have relatively low accuracy, since they can be misclassified as Shirt or a similar class.

As shown in the following plot, a Shirt is often misclassified as a T-shirt, and a Coat is often misclassified as a Pullover.


Accuracy of the network on the 55000 train images: 93%

Accuracy of the network on the 5000 validation images: 91%

Accuracy of the network on the 10000 test images: 91%

From the classification results above, we can observe that Ankle Boots are often misclassified as Sneakers. Other shoes can also be misclassified as Sandals.
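Confusions of this kind can be found by counting, for each true class, which wrong label is predicted most often. This is a minimal sketch on made-up labels; `most_common_confusions` is a hypothetical helper, not part of the project code.

```python
from collections import Counter, defaultdict

def most_common_confusions(y_true, y_pred):
    """For each true class, return the most frequent wrong prediction and its count."""
    wrong = defaultdict(Counter)
    for t, p in zip(y_true, y_pred):
        if t != p:
            wrong[t][p] += 1
    return {t: counts.most_common(1)[0] for t, counts in wrong.items()}

# Toy labels for illustration only
y_true = ["Ankle Boot"] * 4 + ["Sneaker"] * 3
y_pred = ["Sneaker", "Sneaker", "Ankle Boot", "Sandal",
          "Sneaker", "Sandal", "Sneaker"]
confusions = most_common_confusions(y_true, y_pred)
print(confusions)
```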

The Average Precision of each class:


other:     AP = 0.664358063951615

facade:   AP = 0.788303643462531

pillar:      AP = 0.14984528126526037

window: AP = 0.8126611465659187

balcony: AP = 0.4201849552579555


Overall AP = 0.57
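For reference, here is a sketch of one common (non-interpolated) definition of average precision: the mean of the precision values at the rank of each positive sample. This is not necessarily the evaluation code used for the numbers above. Treating the overall AP as the unweighted mean of the per-class APs is also an assumption, although the reported values are consistent with it.

```python
def average_precision(labels, scores):
    """Non-interpolated AP: mean precision at the rank of each positive sample."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

# Toy example: positives are ranked 1st and 3rd
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
print(ap)

# Per-class APs as reported above
per_class = {
    "other": 0.664358063951615,
    "facade": 0.788303643462531,
    "pillar": 0.14984528126526037,
    "window": 0.8126611465659187,
    "balcony": 0.4201849552579555,
}
overall = sum(per_class.values()) / len(per_class)
print(round(overall, 2))
```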

According to the APs above, the model is good at distinguishing facade and window, and does decently on other. It is relatively bad at recognizing pillar and balcony. My guess is that the training set does not contain enough pillar and balcony examples for the model to learn from, whereas facade and window appear in every image.

Results on test set

The predicted label map correctly recognizes the window in the right foreground, but it incorrectly identifies the distant windows in the middle as balcony and most of the background as facade. I think this is because the photo contains not only the building but also other objects that confuse the model.



City Hall, Salt Lake City

Most windows are identified correctly, apart from some extra noise. The model also correctly detects the balcony in the middle right part.

From this project, I learned more about neural networks and how to train them for classification and segmentation through hands-on experiments. It was also interesting to apply the model to my own images and see what it gets right or wrong.