CS 194-26
Project 4: Classification and Segmentation
Zhimin Cai
Part 1: Image Classification
The training loss goes down as the number of epochs increases, but the validation loss starts rising around epoch 10, a sign of overfitting.
In this part, we want to train a network to classify each pixel of an image from the Mini Facade dataset into its correct class.
Class | Color | Pixel value
others | black | 0
facade | blue | 1
pillar | green | 2
window | orange | 3
balcony | red | 4
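Each label image stores one integer per pixel, so decoding it is a lookup in the table above. The sketch below is a minimal illustration, assuming label maps are loaded as 2-D lists of ints 0-4; the names `LABELS` and `class_pixel_counts` are hypothetical, not from the project code. Counting pixels per class is a quick way to check how balanced the classes are.

```python
# Pixel value -> (class name, display color), copied from the table above.
LABELS = {
    0: ("others", "black"),
    1: ("facade", "blue"),
    2: ("pillar", "green"),
    3: ("window", "orange"),
    4: ("balcony", "red"),
}


def class_pixel_counts(label_map):
    """Count how many pixels of each class appear in a 2-D label map.

    label_map is assumed (hypothetically) to be a list of rows of ints in 0..4.
    """
    counts = {name: 0 for name, _ in LABELS.values()}
    for row in label_map:
        for value in row:
            name, _ = LABELS[value]
            counts[name] += 1
    return counts


# Tiny made-up 3x4 label map, only to show the call.
toy = [
    [1, 1, 3, 1],
    [1, 4, 3, 1],
    [0, 2, 0, 0],
]
print(class_pixel_counts(toy))
# {'others': 3, 'facade': 5, 'pillar': 1, 'window': 2, 'balcony': 1}
```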
As in Part 1, we randomly split the training data (905 images in total) into training and validation sets of 750 and 155 images. After training the network, we test it on the 114 test images.
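A random split like this amounts to shuffling the image indices and cutting the list at 750. The helper `random_split_indices` below is a hypothetical sketch using only the standard library, with a fixed seed so the split is reproducible; the project's own splitting code is not shown.

```python
import random


def random_split_indices(n_total, n_train, seed=0):
    """Shuffle indices 0..n_total-1 and cut them into train / validation lists."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    return indices[:n_train], indices[n_train:]


train_idx, val_idx = random_split_indices(905, 750)
print(len(train_idx), len(val_idx))  # 750 155
```

In PyTorch, `torch.utils.data.random_split(dataset, [750, 155])` performs the same kind of split directly on a `Dataset`, and the same helper would also produce the 55,000/5,000 FashionMNIST split from Part 1.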
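Network architecture: six convolutional layers with 32, 128, 512, 128, 32, and 5 output channels, each using a 5×5 kernel with padding, with ReLU activations. Because the padding preserves the spatial size, the network outputs an image of the same size as the label, with one channel per class.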
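The "same size" claim follows from the convolution output-size formula: with stride 1, a 5×5 kernel needs a padding of 2 on each side. The sketch below checks this and counts the layer parameters, assuming RGB input (3 channels) and a hypothetical 256×256 image; both are assumptions, as are the helper names. ReLU has no parameters, so it does not appear.

```python
def conv_output_size(size, kernel, padding, stride=1):
    """Spatial output size of a convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_params(in_ch, out_ch, kernel):
    """Weights (in_ch * out_ch * k * k) plus one bias per output channel."""
    return in_ch * out_ch * kernel * kernel + out_ch


# Output channels from the text; 3 input channels (RGB) is an assumption.
channels = [3, 32, 128, 512, 128, 32, 5]
KERNEL, PADDING = 5, 2  # padding 2 keeps a 5x5, stride-1 conv "same" size

size = 256  # hypothetical input height/width
for _ in range(len(channels) - 1):
    size = conv_output_size(size, KERNEL, PADDING)
print(size)  # 256: spatial size unchanged after all six layers

total = sum(conv_params(c_in, c_out, KERNEL)
            for c_in, c_out in zip(channels[:-1], channels[1:]))
print(total)  # 3488837
```

Most of the roughly 3.5 million parameters sit in the two layers that touch the 512-channel bottleneck.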
Optimizer and parameters: Adam optimizer, learning rate 1e-3, weight decay 1e-5
batch size: 10
epochs: 50
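To make the optimizer settings concrete, here is a minimal single-scalar sketch of one Adam update with L2 weight decay folded into the gradient, which is how `torch.optim.Adam` applies its `weight_decay` argument (decoupled decay is `AdamW`, a different optimizer). The function name `adam_step` and the dict-based state are illustrative only.

```python
import math


def adam_step(param, grad, state, lr=1e-3, weight_decay=1e-5,
              betas=(0.9, 0.999), eps=1e-8):
    """One Adam update for a single scalar parameter.

    Weight decay is added to the gradient (classic L2 style), as in
    torch.optim.Adam's weight_decay argument.
    """
    beta1, beta2 = betas
    grad = grad + weight_decay * param                            # L2 penalty gradient
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad          # 1st moment
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad   # 2nd moment
    m_hat = state["m"] / (1 - beta1 ** state["t"])                # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


state = {"t": 0, "m": 0.0, "v": 0.0}
p = adam_step(1.0, 0.5, state)
print(p)  # approximately 0.999: the first step moves by about lr
```

With 750 training images and a batch size of 10, each epoch is 75 updates, so 50 epochs amount to 3,750 Adam steps.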
In this part, we use the FashionMNIST dataset to train a network with two convolutional layers to classify an image into one of ten classes.
Classes: "T-shirt/Top", "Trouser", "Pullover", "Dress", "Coat", "Sandal", "Shirt", "Sneaker", "Bag", "Ankle Boot".
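The class list follows the dataset's label order (label 0 is T-shirt/Top, label 9 is Ankle Boot), so the network's 10-element output can be decoded by taking the index of the largest score. `CLASSES` and `predict_label` are illustrative names, and the scores below are made up.

```python
# Class order matches the label indices 0-9 of the FashionMNIST dataset.
CLASSES = ["T-shirt/Top", "Trouser", "Pullover", "Dress", "Coat",
           "Sandal", "Shirt", "Sneaker", "Bag", "Ankle Boot"]


def predict_label(scores):
    """Map a 10-element score vector to the class with the largest score."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return CLASSES[best]


scores = [0.1, 0.0, 0.3, 0.2, 1.5, -0.4, 2.1, 0.0, 0.2, -1.0]  # made-up values
print(predict_label(scores))  # Shirt
```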
First, we split the 60,000 training images into training and validation sets of 55,000 and 5,000 images.
Then, we build a model with two convolutional layers.
Architecture:
Conv1: 32 output channels, 3×3 kernel, with padding
ReLU + max pool (2×2 kernel, stride 2)
Conv2: 32 output channels, 3×3 kernel, with padding
ReLU + max pool (2×2 kernel, stride 2)
Flatten the feature maps, then two fully connected layers with a ReLU between them, producing a 10×1 vector of class scores
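Tracing the tensor shapes shows how many features reach the fully connected layers: each padded 3×3 convolution keeps the size, and each 2×2 pool with stride 2 halves it, so 28×28 becomes 7×7. The sketch assumes a padding of 1 (what "with padding" needs to preserve size for a 3×3 kernel) and a hypothetical hidden width of 128 for the first fully connected layer, which the text does not give.

```python
def conv_out(size, kernel=3, padding=1, stride=1):
    """Output size of a convolution (3x3, padding 1, stride 1 by default)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Output size of a max pool (2x2, stride 2 by default)."""
    return (size - kernel) // stride + 1


size = 28        # FashionMNIST images are 1 x 28 x 28
channels = 1
for out_channels in (32, 32):   # Conv1 and Conv2 each have 32 output channels
    size = conv_out(size)       # padded 3x3 conv keeps the size
    size = pool_out(size)       # 2x2 max pool with stride 2 halves it
    channels = out_channels
flat = channels * size * size
print(flat)  # 1568 features enter the first fully connected layer

HIDDEN = 128  # hypothetical width of the first FC layer (not given in the text)
params = {
    "conv1": 1 * 32 * 3 * 3 + 32,
    "conv2": 32 * 32 * 3 * 3 + 32,
    "fc1": flat * HIDDEN + HIDDEN,
    "fc2": HIDDEN * 10 + 10,
}
print(sum(params.values()))  # 211690 with the assumed HIDDEN = 128
```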
Loss Function and Optimizer:
Adam optimizer, learning rate 0.001
batch size: 50
epochs: 10
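The loss function is not named above. For 10-way classification the usual choice is softmax cross-entropy (in PyTorch, `nn.CrossEntropyLoss`, which takes raw scores), so treat the following per-example sketch as an assumption about what was used rather than the project's code.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one example: -log softmax(logits)[target].

    Uses the log-sum-exp trick (subtract the max) for numerical stability.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


print(cross_entropy([0.0] * 10, 3))  # log(10), about 2.3026, for uniform scores
```

A confident, correct prediction drives the loss toward 0, while uniform scores over ten classes give log(10).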
torchvision FashionMNIST dataset
Part 2: Semantic Segmentation
Visualizing the learned CONV1 filters
Accuracy of each class
Train and Validation accuracy during the training process
As we can observe from the plot, the training accuracy goes up as we train the model for more epochs, which is expected since the model fits the training data better and better. The validation accuracy, however, fluctuates instead of steadily improving, which suggests the model is starting to overfit the training set: it learns patterns that do not hold on the validation set.
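When validation accuracy plateaus and fluctuates like this, a common practice is early stopping: keep the checkpoint from the epoch with the best validation accuracy. This is a general technique, not something the report says it used; `best_epoch` is a hypothetical helper and the accuracy history is made up.

```python
def best_epoch(val_accuracies):
    """Return (epoch, accuracy) of the highest validation accuracy (1-based epochs).

    Ties keep the earliest epoch, i.e. the checkpoint trained for the fewest epochs.
    """
    best = max(range(len(val_accuracies)), key=lambda i: (val_accuracies[i], -i))
    return best + 1, val_accuracies[best]


# Made-up accuracies that rise and then fluctuate, only to show the call.
history = [0.85, 0.88, 0.90, 0.89, 0.91, 0.90, 0.91, 0.90]
print(best_epoch(history))  # (5, 0.91)
```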
Class | Test accuracy (%) | Validation accuracy (%)
T-shirt/Top | 91.70 | 91.74
Trouser | 97.80 | 97.18
Pullover | 86.60 | 89.08
Dress | 95.10 | 94.92
Coat | 81.10 | 83.33
Sandal | 98.50 | 100.00
Shirt | 65.60 | 58.33
Sneaker | 97.00 | 95.52
Bag | 98.50 | 98.25
Ankle Boot | 95.60 | 93.80
We can observe that the Shirt class has the lowest test and validation accuracy, which means it is the hardest to identify. Meanwhile, Pullover and Coat also have relatively low accuracy, likely because they are easily confused with shirts and with each other.
As shown in the following plot, shirts are often misclassified as T-shirts, and coats are often misclassified as pullovers.
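Misclassification patterns like these are usually read off a confusion matrix, whose diagonal also yields the per-class accuracies in the table above. A minimal sketch with made-up labels follows; `confusion_matrix` and `per_class_accuracy` are hypothetical helpers written in plain Python, not the project's code.

```python
def confusion_matrix(true_labels, pred_labels, num_classes):
    """matrix[t][p] counts examples of true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, pred_labels):
        matrix[t][p] += 1
    return matrix


def per_class_accuracy(matrix):
    """Diagonal over row sums: the share of each true class predicted correctly."""
    accuracies = []
    for i, row in enumerate(matrix):
        total = sum(row)
        accuracies.append(row[i] / total if total else 0.0)  # 0.0 if class absent
    return accuracies


# Toy labels using the FashionMNIST indices 0 = T-shirt/Top and 6 = Shirt.
true = [0, 0, 0, 6, 6, 6, 6]
pred = [0, 0, 6, 6, 0, 0, 6]
cm = confusion_matrix(true, pred, 10)
print(cm[6][0])                   # 2 shirts predicted as T-shirt/Top
print(per_class_accuracy(cm)[6])  # 0.5
```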
Accuracy of the network on the 55,000 training images: 93%
Accuracy of the network on the 5,000 validation images: 91%
Accuracy of the network on the 10,000 test images: 91%
From the classification results above, we can also observe that ankle boots are often misclassified as sneakers, and other shoes are sometimes misclassified as sandals.
The Average Precision of each class:
other: AP = 0.664358063951615
facade: AP = 0.788303643462531
pillar: AP = 0.14984528126526037
window: AP = 0.8126611465659187
balcony: AP = 0.4201849552579555
Overall AP = 0.57
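The report does not show how AP was computed. The sketch below uses one standard definition, the mean of the precision at the rank of each positive pixel (this matches scikit-learn's `average_precision_score` when scores have no ties); `average_precision` is a hypothetical helper. The last lines confirm that the overall AP of 0.57 is the unweighted mean of the five per-class values listed above.

```python
def average_precision(scores, labels):
    """AP for one class: mean of the precision at the rank of each positive.

    scores are predicted probabilities for the class; labels are 1 for pixels
    that truly belong to it, 0 otherwise. Ties keep their input order.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))  # (1 + 2/3) / 2, about 0.8333

# Overall AP as the unweighted mean of the five per-class values reported above.
per_class = [0.664358063951615, 0.788303643462531, 0.14984528126526037,
             0.8126611465659187, 0.4201849552579555]
print(round(sum(per_class) / len(per_class), 2))  # 0.57
```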
According to the APs above, the model is good at recognizing facades and windows and does reasonably well on the "others" class. It is relatively bad at recognizing pillars and balconies. My guess is that the training set does not contain enough pillars and balconies for the model to learn them, whereas facades and windows appear in every image.
Results on test set
The predicted labels correctly identify the window in the right foreground, but the model mistakes the distant windows in the middle for balconies and labels most of the background as facade. I suspect this is because the photo contains not only the building but also other objects that confuse the model.
City Hall, Salt Lake City: most windows are identified correctly, apart from some noise. The model also correctly detects the balcony in the middle right part of the image.
From this project, I learned more about neural networks and how to train them for classification and segmentation through hands-on experiments. It was also interesting to apply the model to my own images and see what it gets right or wrong.