CS194-26 Assignment 4

Classification and Segmentation

Part 1: Classification

Training and Validation

The suggested CNN architecture was trained with a learning rate and weight decay of 0.001, and after 20 epochs it reached roughly 88-90% accuracy on the test set. The training and validation loss curves are plotted below, followed by the accuracy for each class with example images. Distinguishing between shirts and tshirts was the hardest: tshirts had the lowest accuracy at 71%, which is expected, since these are the most similar articles of clothing in the dataset.
The layer 1 filters also show signs of oriented gradients and Gaussian filters, matching the results expected from a network nearing optimal behavior.
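The per-class accuracies and the "Misclassified as ..." examples reported in this part can be derived from a list of true labels and a list of predicted labels. The following is a minimal sketch of that bookkeeping, not the evaluation code used for this assignment; the function names and the toy labels are hypothetical.

```python
from collections import Counter


def per_class_accuracy(true_labels, predicted_labels):
    """Return {class: fraction of that class's examples predicted correctly}."""
    total = Counter(true_labels)
    correct = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )
    return {cls: correct[cls] / total[cls] for cls in total}


def confusions(true_labels, predicted_labels):
    """Count (true, predicted) pairs for misclassified examples only."""
    return Counter(
        (t, p) for t, p in zip(true_labels, predicted_labels) if t != p
    )


# Toy data (hypothetical, not taken from the actual test set).
true = ["Shirt", "Shirt", "Tshirt", "Tshirt", "Bag", "Bag"]
pred = ["Shirt", "Tshirt", "Tshirt", "Shirt", "Bag", "Bag"]

accuracy = per_class_accuracy(true, pred)
mistakes = confusions(true, pred)
```

In the toy data, the shirt and tshirt classes are confused with each other while bags are always correct, which mirrors the pattern described above on a much smaller scale.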

picture

Training and Validation Loss Plots

Subtle overfitting is visible around epochs 10-20, after which the curves stabilize.
picture

Filters for layer 1, concatenated horizontally. Filters 6 and 7 are characteristic oriented gradients, while a few filters are unused (zeroed out).
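One way to build such a horizontal strip of filters is sketched below with plain nested lists. This is an illustration only: the per-filter rescaling to [0, 1] is an assumption about how the filters might be prepared for display, and the helper names and toy filters are hypothetical.

```python
def normalize(filt):
    """Rescale one 2-D filter (a list of rows) to the range [0, 1]."""
    flat = [v for row in filt for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant filter (for example one that is zeroed out) has no
        # contrast, so it is shown as all zeros.
        return [[0.0 for _ in row] for row in filt]
    return [[(v - lo) / (hi - lo) for v in row] for row in filt]


def concat_horizontal(filters):
    """Join equally sized 2-D filters side by side into one wide image."""
    normed = [normalize(f) for f in filters]
    height = len(normed[0])
    return [sum((f[r] for f in normed), []) for r in range(height)]


# Two toy 2x2 filters: a simple gradient and a zeroed-out filter.
toy_filters = [
    [[0, 1], [2, 3]],
    [[0, 0], [0, 0]],
]
strip = concat_horizontal(toy_filters)
```

The resulting `strip` has the same height as one filter and a width equal to the sum of the filter widths, so it can be passed to any image display routine as a single grayscale image.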

picture

Ankle Boots: 89% Accuracy.

Matched 1
picture

Matched 2
picture

Misclassified as Bag
picture

Misclassified as Sandal
picture

Bag: 97% Accuracy.

Matched 1
picture

Matched 2
picture

Misclassified as Coat
picture

Misclassified as Dress
picture

Coat: 82% Accuracy.

Matched 1
picture

Matched 2
picture

Misclassified as Bag
picture

Misclassified as Dress
picture

Dress: 89% Accuracy.

Matched 1
picture

Matched 2
picture

Misclassified as Bag
picture

Misclassified as Coat
picture

Pullover: 83% Accuracy.

Matched 1
picture

Matched 2
picture

Misclassified as Bag
picture

Misclassified as Coat
picture

Sandal: 96% Accuracy.

Matched 1
picture

Matched 2
picture

Misclassified as Ankle Boot
picture

Misclassified as Bag
picture

Shirt: 73% Accuracy.

Matched 1
picture

Matched 2
picture

Misclassified as Bag
picture

Misclassified as Coat
picture

Sneaker: 97% Accuracy.

Matched 1
picture

Matched 2
picture

Misclassified as Ankle Boot
picture

Misclassified as Bag
picture

Trouser: 97% Accuracy.

Matched 1
picture

Matched 2
picture

Misclassified as Bag
picture

Misclassified as Coat
picture

Tshirt: 71% Accuracy.

Matched 1
picture

Matched 2
picture

Misclassified as Bag
picture

Misclassified as Coat

Part 2: Segmentation

CNN Architecture

Multiple architectures were attempted. Many initially failed to bring the loss below 1.0, and other designs had exploding gradients. The resulting design has 6 layers: 3x3 Conv, 32 channels; max pool; 3x3 Conv, 64 channels; BatchNorm to control gradients; 3x3 Conv, 128 channels; 3x3 ConvTranspose, 64 channels, stride 2; 3x3 Conv, 32 channels; 3x3 ConvTranspose, 32 channels, stride 2; 3x3 Conv, 5 channels. All layers are padded to maintain size, with ReLU between all layers except the final output.

Initially, padding was not used and UpSampling layers compensated for the size changes, but these likely caused large instabilities in gradient descent and were therefore abandoned. Using multiple BatchNorm layers was also tested, but did not change the results significantly. Learning rates below 1e-5 are required, though weight decay can vary between 1e-6 and 1e-5 and still produce similar results. Average test set AP was 0.578 after 100 epochs using an 800-image training set.

The network does reasonably well on facades, windows, and some balconies, but very poorly on pillars, and its boundaries are not well enough established to discern the "other" category with great fidelity.
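The effect of padding on layer output sizes can be checked with the standard output-size formulas for convolution, pooling, and transposed convolution (dilation 1). The sketch below applies them to individual layer types rather than to the full network. The 256-pixel input size, the 2x2 pooling window, and the `output_padding=1` setting are assumptions made for illustration and are not taken from the description above.

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Output length of an unpadded max pool along one spatial dimension."""
    return (size - kernel) // stride + 1


def conv_transpose_out(size, kernel=3, stride=2, padding=1, output_padding=1):
    """Output length of a transposed convolution along one dimension."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


size = 256                                # hypothetical input height/width

unpadded = conv_out(size, padding=0)      # 3x3 conv without padding shrinks
padded = conv_out(size, padding=1)        # 3x3 conv with padding 1 keeps size
pooled = pool_out(padded)                 # 2x2 max pool halves the size
restored = conv_transpose_out(pooled)     # stride-2 transposed conv doubles it
```

With these settings an unpadded 3x3 convolution loses two pixels per dimension, while the padded version preserves the size, so only the pooling and stride-2 layers change the resolution. That makes it easier to guarantee that the output map matches the label map without separate upsampling layers.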

picture

Validation and Training losses over time. AP = 0.578
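For reference, average precision for one class can be computed from per-pixel scores and binary labels as the sum of precision weighted by the increase in recall at each score threshold. The sketch below uses that common definition and averages over classes; it is an assumption that the reported AP follows the same definition, and the function names and toy values are hypothetical.

```python
def average_precision(labels, scores):
    """AP = sum over thresholds of (recall increase) * precision.

    labels: 0/1 ground truth for one class; scores: predicted confidences.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Visit examples from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(order):
        # Consume all examples tied at this score as a single threshold.
        j = i
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            tp += labels[order[j]]
            j += 1
        precision = tp / j
        recall = tp / total_pos
        ap += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return ap


def mean_ap(per_class_aps):
    """Average the per-class AP values into one number."""
    return sum(per_class_aps) / len(per_class_aps)


# Toy example: one negative is ranked above the second positive.
example_ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```

In the toy example the first positive is retrieved at precision 1 and the second at precision 2/3, each contributing half of the recall, so `example_ap` is 5/6.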

picture

Example facade (stock photo).

picture

Example facade output.