Project 4: Classification and Segmentation

Jonathan Tan, cs-194-abl

Part 1: Image Classification

Dataloader

Several sampled images from the FashionMNIST dataset.

In left to right order: pullover, pullover, pullover, sandal
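The sampling above comes from a standard PyTorch dataloader. As a minimal sketch, random tensors with FashionMNIST's shape (1x28x28 grayscale, labels 0–9) stand in for the real `torchvision.datasets.FashionMNIST` download so the snippet is self-contained; the batch size of 32 is an illustrative assumption.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data with FashionMNIST's shapes; in the real project this
# wraps torchvision.datasets.FashionMNIST instead of random tensors.
images = torch.rand(256, 1, 28, 28)
labels = torch.randint(0, 10, (256,))
loader = DataLoader(TensorDataset(images, labels), batch_size=32, shuffle=True)

# the ten FashionMNIST classes, indexed by label
classes = ["t-shirt/top", "trouser", "pullover", "dress", "coat",
           "sandal", "shirt", "sneaker", "bag", "ankle boot"]

batch_images, batch_labels = next(iter(loader))  # one minibatch for display
```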

Results

Below is the training/validation accuracy of my model, logged every 4000 minibatches. The model converged quickly, so I only trained for two epochs.
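The logging loop can be sketched as follows. The tiny random dataset, the stub linear model, and `LOG_EVERY = 2` are assumptions so the sketch runs end to end; the actual run logs every 4000 minibatches on the real classifier.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
# stand-in data and model; the real run uses FashionMNIST and the CNN
train_loader = DataLoader(
    TensorDataset(torch.rand(64, 1, 28, 28), torch.randint(0, 10, (64,))),
    batch_size=16, shuffle=True)
val_loader = DataLoader(
    TensorDataset(torch.rand(32, 1, 28, 28), torch.randint(0, 10, (32,))),
    batch_size=16)

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())

def accuracy(loader):
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total

LOG_EVERY = 2  # 4000 in the actual run
history, step = [], 0
for epoch in range(2):
    for x, y in train_loader:
        optimizer.zero_grad()
        criterion(model(x), y).backward()
        optimizer.step()
        step += 1
        if step % LOG_EVERY == 0:
            history.append((step, accuracy(train_loader), accuracy(val_loader)))
```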

I got reasonably good accuracies with my model. Coats and shirts seem to be the hardest classes for it, probably because they're extremely easy to conflate with other upper-body garments.

class             accuracy
top               81%
trouser           97%
pullover          81%
dress             91%
coat              71%
sandal            94%
shirt             67%
sneaker           97%
bag               97%
ankle boot        94%

network accuracy  87%
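The per-class numbers above come from bucketing test predictions by their true label. A minimal sketch of that computation, with tiny stand-in `preds` and `targets` tensors in place of the real test-set outputs:

```python
import torch

classes = ["top", "trouser", "pullover", "dress", "coat",
           "sandal", "shirt", "sneaker", "bag", "ankle boot"]

def per_class_accuracy(preds, targets, n_classes=10):
    # for each class, accuracy over the examples whose true label is that class
    accs = {}
    for c in range(n_classes):
        mask = targets == c
        if mask.any():
            accs[classes[c]] = (preds[mask] == c).float().mean().item()
    return accs

# toy example: one "top" mislabeled as a "trouser"
preds = torch.tensor([0, 0, 1, 1, 2])
targets = torch.tensor([0, 1, 1, 1, 2])
accs = per_class_accuracy(preds, targets)  # trouser: 2/3, top and pullover: 1.0
```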

Below are examples of correctly and incorrectly classified images. Each incorrect example is labeled with the class my neural net predicted for it.

correct incorrect

I also visualized the learned filters from the first convolutional layer. They're not easy to interpret, but you can roughly make out what they respond to: shapes typical of an article of clothing, like garment edges and the curves of shoes.
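Pulling the filters out for display amounts to normalizing each kernel of the first conv layer into the [0, 1] range. A sketch, where `conv1` is a stand-in for the classifier's first layer (its channel counts here are an assumption):

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(1, 32, kernel_size=3)  # stand-in first layer

with torch.no_grad():
    w = conv1.weight  # shape: (out_channels, in_channels, kH, kW)
    # normalize each filter independently to [0, 1] so it displays as an image
    w_min = w.amin(dim=(1, 2, 3), keepdim=True)
    w_max = w.amax(dim=(1, 2, 3), keepdim=True)
    filters = (w - w_min) / (w_max - w_min + 1e-8)

# each filters[i, 0] is now a small grayscale image,
# suitable for e.g. plt.imshow(filters[i, 0], cmap="gray")
```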

Part 2: Semantic Segmentation

Model Architecture

Where self.n_class = 5, the neural network I used is as follows.

        self.layers = nn.Sequential(
            # two 3x3 convs at 64 channels, then 2x2 max pooling
            nn.Conv2d(3, 64, 3, padding=2),
            nn.ReLU(inplace=True),

            nn.Conv2d(64, 64, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),

            # upsample back up, widening to 128 channels
            nn.ConvTranspose2d(64, 64, 2, stride=2, padding=0),
            nn.Conv2d(64, 128, 3, padding=1),
            nn.ReLU(inplace=True),

            nn.Conv2d(128, 128, 3, padding=1),
            nn.MaxPool2d(2),

            # upsample again and project down to one channel per class
            nn.ConvTranspose2d(128, 128, 2, stride=2, padding=0),
            nn.Conv2d(128, self.n_class, 3, padding=0),
        )

Five convolutional layers with 64, 64, 128, 128, and self.n_class output channels respectively; every conv except the last two is followed by a ReLU, with max pooling after the second and fourth. Transposed convolutions upsample wherever necessary to keep the output dimensions matched to the input.

Hyperparameters: default Adam optimizer parameters, with learning rate = 1e-3 and weight_decay = 1e-5.
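In code, that optimizer setup is just the stock Adam constructor; the single-layer `model` below is a stand-in for the full segmentation network.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 64, 3, padding=1)  # stand-in for the full network
# Adam with default betas, lr=1e-3, weight_decay=1e-5 as described above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
```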

Average Precision

Per class and average precision on the test set.

AP = 0.6623760691442281
AP = 0.7473245670520275
AP = 0.119582528128497
AP = 0.7268927323006635
AP = 0.2614720091246093
Average AP = 0.5035295811500051
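The per-class AP treats each class's pixelwise score as a detection score against that class's binary ground-truth mask, then averages across classes. A sketch using scikit-learn's `average_precision_score`; the tiny random arrays are illustrative stand-ins for real test-set predictions:

```python
import numpy as np
from sklearn.metrics import average_precision_score

n_classes = 5
rng = np.random.default_rng(0)
# stand-in pixel scores: one score per class per pixel
scores = rng.random((n_classes, 100))
# stand-in ground truth: one class label per pixel (every class present)
labels = rng.permutation(np.arange(100) % n_classes)

# per-class AP: binary mask for class c vs. that class's scores
aps = [average_precision_score((labels == c).astype(int), scores[c])
       for c in range(n_classes)]
mean_ap = float(np.mean(aps))
```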

Collection Output

The model does well on facades and windows, but struggles pretty much everywhere else: balconies, pillars, and the miscellaneous class. Not too surprising given the difficulty of training and what appears to be an overall imbalance in how well each class is represented in the data.

input output