Image Classification and Segmentation

CS 194: Computational Photography, Spring 2020

Project 4

Abby Cohn


This project has two parts. The first is image classification on the Fashion-MNIST dataset using PyTorch in Google Colab. The second is semantic segmentation on the Facades dataset. I raised average precision by tuning parameters such as the number of epochs and the convolutional layer characteristics.

Image Classification

The model I created has two convolutional layers with 3x3 kernels, along with ReLU and max-pooling layers. For training, I paired the cross-entropy loss with the Adam optimizer at a learning rate of 0.001. With these settings, the model reached 91% overall accuracy. The validation accuracy stayed slightly below the training accuracy over the 5 epochs.
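As a rough illustration of the setup described above, here is a minimal PyTorch sketch. Only the two 3x3 convolutional layers, the ReLU and max-pooling layers, the cross-entropy loss, and Adam with a learning rate of 0.001 come from the write-up. The class name `FashionNet`, the 32 channels in the second convolution, the single fully connected layer, and the random batch are assumptions made for the example, not the actual project code.

```python
import torch
import torch.nn as nn


class FashionNet(nn.Module):
    """Hypothetical two-conv-layer classifier for 28x28 grayscale images."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3),   # 28x28 -> 26x26, 32 filters
            nn.ReLU(),
            nn.MaxPool2d(2),                   # 26x26 -> 13x13
            nn.Conv2d(32, 32, kernel_size=3),  # 13x13 -> 11x11 (channel count assumed)
            nn.ReLU(),
            nn.MaxPool2d(2),                   # 11x11 -> 5x5
        )
        # Assumed head: a single linear layer on the flattened features.
        self.classifier = nn.Linear(32 * 5 * 5, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))


model = FashionNet()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# One training step on a random stand-in batch (not real Fashion-MNIST data).
images = torch.randn(8, 1, 28, 28)
labels = torch.randint(0, 10, (8,))
optimizer.zero_grad()
logits = model(images)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
```

In a real run, the random batch would be replaced by a loop over a Fashion-MNIST `DataLoader`, repeated for each of the 5 epochs.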

Validation (blue) and training (orange) accuracy over epochs
Overall accuracy: 91%

As we can see, shirts had the worst accuracy of the 10 classes at 74%. My guess as to why it's a bit lower is that shirts vary much more than other clothing items and can easily be mistaken for other pieces. Trousers and bags had the best accuracy, both reaching 99%, probably because of their very distinct shapes.
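Per-class accuracy figures like these can be computed by counting, for each true class, how many of its samples were predicted correctly. The sketch below uses only the standard library. The helper name `per_class_accuracy` and the toy labels are made up for illustration, and the class list follows the usual Fashion-MNIST label order.

```python
from collections import Counter

# Standard Fashion-MNIST label order (index -> class name).
CLASSES = ["t-shirt", "trouser", "pullover", "dress", "coat",
           "sandal", "shirt", "sneaker", "bag", "ankle boot"]


def per_class_accuracy(labels, preds, num_classes):
    """Return a list where entry c is the fraction of class-c samples predicted as c."""
    correct = Counter()
    total = Counter()
    for y, p in zip(labels, preds):
        total[y] += 1
        if y == p:
            correct[y] += 1
    # Classes with no samples get NaN instead of a misleading 0.
    return [correct[c] / total[c] if total[c] else float("nan")
            for c in range(num_classes)]


# Toy data: two t-shirts (0), four shirts (6), two trousers (1).
labels = [0, 0, 6, 6, 6, 6, 1, 1]
preds = [0, 0, 6, 0, 2, 6, 1, 1]
acc = per_class_accuracy(labels, preds, len(CLASSES))

for name, a in zip(CLASSES, acc):
    if a == a:  # skip NaN entries (classes absent from the toy data)
        print(f"{name}: {a:.0%}")
```

In this toy data, half of the shirts are misclassified as a t-shirt or a pullover, which mirrors the kind of confusion seen in the real results.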

2 correctly predicted t-shirts; incorrectly predicted shirt & dress
2 correctly predicted trousers; incorrectly predicted dress & dress
2 correctly predicted pullovers; incorrectly predicted shirt & shirt
2 correctly predicted dresses; incorrectly predicted shirt & shirt
2 correctly predicted coats; incorrectly predicted shirt & shirt
2 correctly predicted sandals; incorrectly predicted ankle boot & sneaker
2 correctly predicted shirts; incorrectly predicted pullover & t-shirt
2 correctly predicted sneakers; incorrectly predicted ankle boot & ankle boot
2 correctly predicted bags; incorrectly predicted sneaker & shirt
2 correctly predicted ankle boots; incorrectly predicted sneaker & shirt
Here, we have the 32 3x3 filters of the first convolution layer.

Semantic Segmentation

For this part of the project, I played around with parameters until I found the best combination. I finished with a network of 5 convolutional layers with 64, 128, 256, 512, and 5 output channels. I attempted to add max pooling with a transposed convolution, but it did not affect AP very much, so I left it out. I use the cross-entropy loss function and the Adam optimizer with a learning rate of 1e-3 and a weight decay of 2e-5. I ended up training for 15 epochs.
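The network described above can be sketched in PyTorch roughly as follows. The channel widths (64, 128, 256, 512, 5), the cross-entropy loss, and the Adam settings come from the write-up. The class name `FacadeNet`, the 3x3 kernels with padding 1, the ReLU between layers, and the tiny random batch are assumptions for the example, not the actual project code.

```python
import torch
import torch.nn as nn


class FacadeNet(nn.Module):
    """Hypothetical 5-layer fully convolutional net producing per-pixel class scores."""

    def __init__(self, n_class=5):
        super().__init__()
        channels = [3, 64, 128, 256, 512]
        layers = []
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            # padding=1 keeps the spatial size unchanged (assumed choice).
            layers += [nn.Conv2d(c_in, c_out, kernel_size=3, padding=1), nn.ReLU()]
        # Final layer maps to one score per class, with no activation.
        layers.append(nn.Conv2d(channels[-1], n_class, kernel_size=3, padding=1))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)


model = FacadeNet()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=2e-5)

# One training step on a random stand-in batch (not real Facades data).
images = torch.randn(2, 3, 16, 16)           # (batch, RGB, height, width)
targets = torch.randint(0, 5, (2, 16, 16))   # one class index per pixel
optimizer.zero_grad()
logits = model(images)                       # (batch, 5, height, width)
loss = criterion(logits, targets)
loss.backward()
optimizer.step()
```

Because no layer changes the spatial resolution in this sketch, the output has one 5-way score vector per input pixel, so `nn.CrossEntropyLoss` can compare it directly with the per-pixel label map.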

Training (orange) and validation (blue) loss over epochs
Average of 0.49 AP across classes

I was able to achieve an overall AP (average precision) of 0.49. I then tested the model on a couple of images of my choosing: one of Berkeley’s South Hall and one of a government building in Vietnam. In both, windows and facades are classified pretty well, and the model makes out balconies somewhat. It does not perform very well at classifying pillars, but it did all right overall.
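For readers unfamiliar with the metric, here is a small sketch of one common way to compute average precision for a single class and then average it across classes. It ranks pixels by predicted score and averages the precision at each true positive. The project's actual evaluation code is not shown here, so the function name `average_precision`, this particular (non-interpolated) definition, and the toy scores are all assumptions.

```python
def average_precision(scores, is_positive):
    """Non-interpolated AP: mean of precision@k over the ranks k of true positives."""
    # Rank sample indices from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if is_positive[i]:
            hits += 1
            precision_sum += hits / rank  # precision among the top `rank` samples
    return precision_sum / hits if hits else 0.0


# Toy example: five pixels with predicted scores for two hypothetical classes.
window_scores = [0.9, 0.8, 0.6, 0.4, 0.2]
window_truth = [True, False, True, False, False]
pillar_scores = [0.9, 0.8, 0.6, 0.4, 0.2]
pillar_truth = [False, True, False, False, True]

ap_window = average_precision(window_scores, window_truth)
ap_pillar = average_precision(pillar_scores, pillar_truth)
mean_ap = (ap_window + ap_pillar) / 2
print(f"window AP={ap_window:.3f}, pillar AP={ap_pillar:.3f}, mean AP={mean_ap:.3f}")
```

A class whose true pixels are ranked near the top (like the toy "window" class) gets a high AP, while one whose true pixels are scattered lower in the ranking (like the toy "pillar" class) pulls the average down.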