For this project, I will first solve an image classification problem on the Fashion-MNIST dataset in part 1, and then a semantic segmentation problem on the Mini Facade dataset in part 2.
Data: The data we used is the Fashion-MNIST dataset, which is already included in PyTorch. Each image is a 28x28 grayscale image (a single channel).
CNN: We used the following CNN:
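The report's actual architecture is shown in the figure. As a point of reference, a minimal PyTorch CNN of the kind commonly used on Fashion-MNIST might look like the sketch below; the layer widths and kernel sizes are illustrative assumptions, not the report's exact choices:

```python
import torch
import torch.nn as nn

class FashionCNN(nn.Module):
    """A small CNN for 28x28 grayscale Fashion-MNIST images (10 classes).
    Layer sizes here are illustrative, not the report's exact architecture."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),  # 28x28 -> 28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                             # -> 14x14
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # -> 7x7
        )
        self.classifier = nn.Linear(64 * 7 * 7, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

# One forward pass on a dummy batch of 4 images
logits = FashionCNN()(torch.randn(4, 1, 28, 28))
print(logits.shape)  # torch.Size([4, 10])
```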
Hyperparameters:
We have also kept track of our model's training and validation accuracy throughout training.
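Tracking both accuracies per epoch can be done with a small helper like the one below; this is a generic sketch, not the report's exact training loop, and the hyperparameters in the comments are assumptions:

```python
import torch
import torch.nn as nn

def accuracy(model, loader, device="cpu"):
    """Fraction of correctly classified examples over a data loader."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            pred = model(x.to(device)).argmax(dim=1)
            correct += (pred == y.to(device)).sum().item()
            total += y.size(0)
    return correct / total

# Sketch of the tracking loop (optimizer and epoch count are illustrative):
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# history = {"train": [], "val": []}
# for epoch in range(num_epochs):
#     train_one_epoch(model, train_loader, optimizer)
#     history["train"].append(accuracy(model, train_loader))
#     history["val"].append(accuracy(model, val_loader))

# Quick check: an identity "model" on logits whose argmax matches the labels
xs = torch.eye(3)
ys = torch.tensor([0, 1, 2])
print(accuracy(nn.Identity(), [(xs, ys)]))  # 1.0
```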
Among all classes, "shirt" appears to be the hardest to classify correctly, while "bag" is the easiest. (image here)
| Class | Correctly Classified | Incorrectly Classified |
| --- | --- | --- |
| tshirt | | |
| trouser | | |
| pullover | | |
| dress | | |
| coat | | |
| sandal | | |
| shirt | | |
| sneaker | | |
| bag | | |
| ankle boot | | |
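Grouping test examples into correctly and incorrectly classified buckets per class can be sketched as follows; the function and class list are illustrative, not the report's code:

```python
import torch

CLASSES = ["tshirt", "trouser", "pullover", "dress", "coat",
           "sandal", "shirt", "sneaker", "bag", "ankle boot"]

def split_by_correctness(preds, labels):
    """Group example indices per true class into correct / incorrect buckets."""
    table = {c: {"correct": [], "incorrect": []} for c in CLASSES}
    for i, (p, y) in enumerate(zip(preds.tolist(), labels.tolist())):
        key = "correct" if p == y else "incorrect"
        table[CLASSES[y]][key].append(i)
    return table

preds = torch.tensor([0, 1, 6, 6])
labels = torch.tensor([0, 1, 6, 2])  # last one: a pullover predicted as a shirt
table = split_by_correctness(preds, labels)
print(table["pullover"])  # {'correct': [], 'incorrect': [3]}
```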
We visualize some of the filters that our model learned.
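First-layer filters can be pulled out of the convolution's weight tensor and normalized to [0, 1] so each kernel displays as a small grayscale image. A minimal sketch (the 32-filter layer is an assumption mirroring the earlier example):

```python
import torch
import torch.nn as nn

def filters_as_images(conv):
    """Normalize each first-layer kernel to [0, 1] for display as an image."""
    w = conv.weight.detach().cpu()                     # (out_ch, in_ch, kH, kW)
    w = w - w.amin(dim=(1, 2, 3), keepdim=True)
    w = w / w.amax(dim=(1, 2, 3), keepdim=True).clamp(min=1e-8)
    return w  # pass each w[i, 0] to e.g. plt.imshow(..., cmap="gray")

conv = nn.Conv2d(1, 32, kernel_size=3, padding=1)
imgs = filters_as_images(conv)
print(imgs.shape)  # torch.Size([32, 1, 3, 3])
```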
For this semantic segmentation task, I used the following CNN:
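The report's segmentation network is shown in the figure. For orientation, a vanilla fully-convolutional model of this kind downsamples, then upsamples back to per-pixel logits over the 5 facade classes; the sketch below is an illustrative stand-in, not the report's exact architecture:

```python
import torch
import torch.nn as nn

class SegCNN(nn.Module):
    """A minimal fully-convolutional net: downsample, then upsample back to
    per-pixel class logits. Widths/depth are illustrative assumptions."""
    def __init__(self, num_classes=5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(64, num_classes, 1),  # 1x1 conv -> per-pixel logits
        )

    def forward(self, x):
        return self.net(x)

out = SegCNN()(torch.randn(1, 3, 256, 256))
print(out.shape)  # torch.Size([1, 5, 256, 256])
```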
I did a 70-20-10 split for the training, validation, and test sets. As I will show below, although this vanilla CNN model is extremely simple, to my surprise it actually achieves acceptable results.
We also kept track of the loss throughout training. After about 40 epochs, the validation loss begins to fluctuate and no longer decreases, indicating that the model may be starting to overfit, so I trained the model for only 50 epochs.
Here is how our model performs on each category:
| Class | AP Score |
| --- | --- |
| others | 0.6401 |
| facade | 0.7638 |
| pillar | 0.1101 |
| window | 0.7899 |
| balcony | 0.3204 |
| Average AP | 0.5249 |
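Per-class AP can be computed by treating each class as a one-vs-rest binary problem over all pixels, e.g. with scikit-learn's `average_precision_score`. A sketch, assuming softmax scores of shape (N, C, H, W) and integer label maps of shape (N, H, W) (this mirrors the table's class order but is not necessarily the report's code):

```python
import numpy as np
from sklearn.metrics import average_precision_score

CLASSES = ["others", "facade", "pillar", "window", "balcony"]

def per_class_ap(probs, labels):
    """AP per class. probs: (N, C, H, W) class scores; labels: (N, H, W) ints."""
    aps = {}
    for c, name in enumerate(CLASSES):
        y_true = (labels == c).ravel().astype(int)   # this class vs. rest
        y_score = probs[:, c].ravel()                # pixel scores for class c
        aps[name] = average_precision_score(y_true, y_score)
    return aps

# Sanity check: perfect one-hot predictions give AP = 1.0 for every class
labels = np.tile(np.arange(5), 5).reshape(1, 5, 5)
probs = np.eye(5)[labels].transpose(0, 3, 1, 2).astype(float)
print(per_class_ap(probs, labels))
```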
Here are some sample test images, including an image of my own.
Consistent with the per-class AP scores, the model does best when classifying pixels that are part of a window, and it did poorly on pillars.