CS194-26: Image Manipulation and Computational Photography Spring 2020
Here I trained a simple convolutional neural network to recognize different items of clothing. I used the torchvision.datasets.FashionMNIST dataset.
The human-readable class names are not included in PyTorch, so I used the names given in the official Fashion-MNIST GitHub repository. I also inverted the images so that they display in the original grayscale.
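As a rough sketch of the two steps above, the label-to-name lookup and the inversion could look like the following. The class names are listed in label order as in the Fashion-MNIST README; the helper names `label_to_name` and `invert` are hypothetical, and the inversion assumes 8-bit pixel values in the range 0 to 255 (pass `max_value=1.0` for images scaled to [0, 1]).

```python
# Class names in label order (0..9), as listed in the Fashion-MNIST README.
CLASS_NAMES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]


def label_to_name(label):
    """Map an integer class label to its human-readable name."""
    return CLASS_NAMES[label]


def invert(image, max_value=255):
    """Invert a grayscale image given as a list of rows of pixel intensities.

    Assumption: intensities lie in [0, max_value].
    """
    return [[max_value - pixel for pixel in row] for row in image]
```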
Here are a few sample images and their classes:
I started with the recommended CNN architecture given in the spec. However, I found that increasing the number of channels from 32 to 64 raised the validation accuracy by 1-2%. ReLU turned out to be the best non-linearity, keeping all else constant. Lowering the learning rate from 0.01 to 0.001 also boosted performance.
I trained for a total of 100 epochs. Here are the validation and training accuracies, evaluated at the end of each epoch:
At the end of training, I ran my model on both the validation and test sets and found the following per-class accuracies:
Class | Val. Acc. | Test Acc. |
---|---|---|
Shirt | 71.20% | 68.00% |
Pullover | 86.38% | 85.40% |
T-Shirt | 90.25% | 89.60% |
Coat | 89.36% | 88.70% |
Dress | 94.88% | 94.00% |
Sneaker | 95.33% | 96.40% |
Ankle boot | 97.46% | 97.30% |
Trouser | 98.21% | 97.70% |
Sandal | 97.71% | 97.70% |
Bag | 98.32% | 97.70% |
— | — | — |
TOTAL | 91.95% | 91.25% |
As we can see, the most challenging class is “shirt”, which achieves only 68% accuracy on the test data. The next hardest is “pullover”, which achieves only 85% accuracy.
Looking at the examples below, it is easy to see why. The two classes are quite visually similar, and so the CNN can often confuse the two.
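For reference, per-class accuracies like those in the table can be computed from lists of true and predicted labels roughly as follows. This is a minimal sketch, not the code used for the report; `per_class_accuracy` is a hypothetical helper. Because the Fashion-MNIST test set contains 1,000 images of each class, the overall test accuracy is simply the unweighted mean of the per-class test accuracies, which the last lines check against the table.

```python
from collections import Counter


def per_class_accuracy(labels, predictions, num_classes):
    """Return the fraction of correctly classified samples for each class."""
    total = Counter(labels)
    correct = Counter(l for l, p in zip(labels, predictions) if l == p)
    # Classes with no samples get NaN rather than a division error.
    return [
        correct[c] / total[c] if total[c] else float("nan")
        for c in range(num_classes)
    ]


# Per-class test accuracies (in %) copied from the table above.
TEST_ACC = [68.00, 85.40, 89.60, 88.70, 94.00, 96.40, 97.30, 97.70, 97.70, 97.70]

# With equally sized classes, the overall accuracy is the plain mean.
overall_test_acc = sum(TEST_ACC) / len(TEST_ACC)
```

The mean comes out to 91.25%, matching the TOTAL row for the test set.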
Additionally, here are 2 examples per class for the test data that were correctly classified by my model:
Here are 2 examples per class for the test data that were incorrectly classified by my model:
Both the correct label and the incorrect classification are shown in the title.
First layer. Darker values are negative and lighter values are positive:
Second layer. Since the second layer has 64 channels, only the first 3 are shown as RGB:
This model uses 6 convolutional layers:
Layer | Input Channels | Output Channels | Kernel Size |
---|---|---|---|
1 | 3 | 16 | 17x17 |
2 | 16 | 32 | 17x17 |
3 | 32 | 64 | 17x17 |
4 | 64 | 128 | 17x17 |
5 | 128 | 128 | 17x17 |
6 | 128 | 5 | 17x17 |
To speed up training, I apply the “ResNet” technique described in lecture. That is, each time we apply a layer to \(x\), we re-add \(x\) back into the output:
\[x' = \text{layer}(\text{ReLU}(x), w) + x\]
(I do not add \(x\) back to the output for the final layer. Instead I just apply ReLU again at the end. See the code on bCourses for the implementation.)
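The residual update above can be sketched in PyTorch as follows. This is an illustration only, not the code on bCourses: the class name `ResidualConv` is hypothetical, and it assumes a layer whose input and output channel counts match (such as layer 5) with "same" padding, so that the output can be added to \(x\). How the layers that change the channel count handle the addition is not stated here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualConv(nn.Module):
    """Computes x' = conv(ReLU(x)) + x for a shape-preserving convolution."""

    def __init__(self, channels, kernel_size=17):
        super().__init__()
        # Assumption: padding of kernel_size // 2 keeps the spatial size
        # unchanged for odd kernel sizes, so conv(x) and x can be summed.
        self.conv = nn.Conv2d(
            channels, channels, kernel_size, padding=kernel_size // 2
        )

    def forward(self, x):
        # Apply the non-linearity first, then the layer, then re-add the input.
        return self.conv(F.relu(x)) + x
```

With all weights and biases set to zero, the block reduces to the identity, which is one intuition for why skip connections make deep stacks easier to train.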
For optimization, I use the provided Adam optimizer, except with a learning rate of 3e-5 instead of the default 1e-3. Weight decay is left at 1e-5.
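In PyTorch, these settings correspond to something like the snippet below. The `model` here is a hypothetical single-layer stand-in so that the example is self-contained; it is not the six-layer network described above.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the segmentation network (3 input channels,
# 5 output classes); only used to give the optimizer some parameters.
model = nn.Conv2d(3, 5, kernel_size=17, padding=8)

# Adam with the learning rate lowered from 1e-3 to 3e-5 and weight decay 1e-5.
optimizer = torch.optim.Adam(model.parameters(), lr=3e-5, weight_decay=1e-5)
```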
For learning, I use a minibatch size of \(1\) and train for \(2^4=16\) epochs.
Here is a graph showing the training and validation loss during training:
Here are the resulting per-class AP scores:
Class | AP Score |
---|---|
others | 0.5189 |
facade | 0.6278 |
pillar | 0.1057 |
window | 0.7563 |
balcony | 0.4417 |
— | — |
Average | 0.4901 |
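As a reminder of what these numbers measure, here is a minimal pure-Python sketch of average precision for one class, assuming each pixel contributes a predicted score and a binary label (1 if the pixel belongs to the class). The function name `average_precision` is hypothetical, tied scores are not treated specially, and the project's provided evaluation code may differ in such details. The last lines check that the "Average" row is the plain mean of the five per-class scores.

```python
def average_precision(scores, labels):
    """AP as the mean of the precision values at each positive sample,
    with samples ranked by descending score."""
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            true_pos += 1
            # Precision at this rank; recall rises by 1 / total_pos here.
            precision_sum += true_pos / rank
    return precision_sum / total_pos


# Per-class AP scores copied from the table above.
AP_SCORES = {
    "others": 0.5189, "facade": 0.6278, "pillar": 0.1057,
    "window": 0.7563, "balcony": 0.4417,
}
mean_ap = sum(AP_SCORES.values()) / len(AP_SCORES)
```

For example, scores `[0.9, 0.8, 0.7, 0.6]` with labels `[1, 0, 1, 0]` give an AP of (1/1 + 2/3) / 2 = 5/6.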
Here, I run the algorithm on the familiar facade.jpg image from project 2. I applied a perspective transform beforehand to align the image like in the test data. To visualize the resulting output, I took the argmax of the output tensor and applied the provided colors for each class.
As we can see, the large balcony is correctly classified, and all the windows have some amount of recognition. The six smaller balconies are harder for the model to see, likely due to the blue “cloud” effect. The model also has difficulty handling the sharp shadows next to the windows, and ends up classifying those shadows as “others” near the top of the image. There is also a spurious pillar classified near the bottom left.
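The argmax visualization described above can be sketched in plain Python as follows. The output is assumed to be indexed as `scores[class][row][column]` with the five classes in the order used in the AP table, and the colors in `PALETTE` are made-up placeholders, not the colors provided with the project.

```python
# Hypothetical RGB colors, one per class, in the order:
# others, facade, pillar, window, balcony.
PALETTE = [
    (0, 0, 0), (0, 0, 255), (0, 255, 0), (255, 128, 0), (255, 0, 0),
]


def colorize(scores):
    """Turn per-class score maps into an RGB image.

    scores[c][y][x] is the score of class c at pixel (y, x); each pixel is
    painted with the color of its highest-scoring class (the argmax).
    """
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            best = max(range(num_classes), key=lambda c: scores[c][y][x])
            row.append(PALETTE[best])
        image.append(row)
    return image
```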
For comparison, here are some results taken from the test set. Notice how the sharp shadows are absent in the photograph, which might help the model recognize windows.
HTML theme for pandoc found here: https://gist.github.com/killercup/5917178