Several sampled images from the FashionMNIST dataset.
From left to right: pullover, pullover, pullover, sandal.
Below is the training/validation accuracy of my model, logged every 4000 minibatches. It seemed to converge relatively fast, so I only trained for two epochs.
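The logging scheme above can be sketched as a standard PyTorch training loop (this is an assumed reconstruction, not my exact code; the model and data here are stand-ins):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

LOG_EVERY = 4000  # minibatches between accuracy logs, as in the plot above

def train(model, loader, epochs=2, lr=1e-3, weight_decay=1e-5):
    opt = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    loss_fn = nn.CrossEntropyLoss()
    correct, seen = 0, 0
    for epoch in range(epochs):
        for i, (x, y) in enumerate(loader):
            opt.zero_grad()
            logits = model(x)
            loss = loss_fn(logits, y)
            loss.backward()
            opt.step()
            # accumulate running training accuracy between log points
            correct += (logits.argmax(1) == y).sum().item()
            seen += y.numel()
            if (i + 1) % LOG_EVERY == 0:
                print(f"epoch {epoch} iter {i + 1}: running acc {correct / seen:.3f}")
                correct, seen = 0, 0
    return model

# Smoke test on random data shaped like FashionMNIST (1x28x28, 10 classes).
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
data = TensorDataset(torch.randn(64, 1, 28, 28), torch.randint(0, 10, (64,)))
train(model, DataLoader(data, batch_size=16), epochs=1)
```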
I got pretty okay accuracies with my model. It seems that coats and shirts are the hardest classes for my model, probably because they're extremely easy to confuse with each other and with other upper-body garments.
class | accuracy |
---|---|
top | 81% |
trouser | 97% |
pullover | 81% |
dress | 91% |
coat | 71% |
sandal | 94% |
shirt | 67% |
sneaker | 97% |
bag | 97% |
ankle boot | 94% |
network accuracy | 87% |
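The per-class numbers above can be computed from the predicted and true labels along these lines (a sketch; the function name and layout are illustrative, not my original code):

```python
import numpy as np

CLASSES = ["top", "trouser", "pullover", "dress", "coat",
           "sandal", "shirt", "sneaker", "bag", "ankle boot"]

def per_class_accuracy(y_true, y_pred):
    """Accuracy restricted to each true class, plus overall network accuracy."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    accs = {}
    for c, name in enumerate(CLASSES):
        mask = y_true == c
        # guard against classes absent from the evaluation split
        accs[name] = float((y_pred[mask] == c).mean()) if mask.any() else float("nan")
    accs["network"] = float((y_pred == y_true).mean())
    return accs
```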
Below are examples of correctly and incorrectly classified images. The incorrect examples are labeled with the class my neural net predicted.
correct | incorrect |
---|---|
I also visualized the learned filters from the first convolutional layer. It's not the easiest thing to make out, but you can kinda sorta tell how they're used to filter clothing: lots of activation for the typical shapes you'd expect in an article of clothing, like garment edges and the curves of shoes.
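The visualization can be produced roughly like this (a sketch of the assumed approach: take the first `Conv2d`'s weights, normalize them, and tile them into a grid; the filter count and grid width are illustrative):

```python
import matplotlib
matplotlib.use("Agg")  # render to file, no display needed
import matplotlib.pyplot as plt
import torch
import torch.nn as nn

def show_filters(conv, path="filters.png", cols=8):
    w = conv.weight.detach()                          # (out_ch, in_ch, k, k)
    w = (w - w.min()) / (w.max() - w.min() + 1e-8)    # normalize to [0, 1]
    n = w.shape[0]
    rows = (n + cols - 1) // cols
    fig, axes = plt.subplots(rows, cols, figsize=(cols, rows))
    for i, ax in enumerate(axes.flat):
        ax.axis("off")
        if i < n:
            ax.imshow(w[i, 0].numpy(), cmap="gray")   # show first input channel
    fig.savefig(path)
    plt.close(fig)
    return path

show_filters(nn.Conv2d(1, 64, 3))
```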
With `self.n_class = 5`, the neural network I used is as follows.
```python
nn.Conv2d(3, 64, 3, padding=2),
nn.ReLU(inplace=True),
nn.Conv2d(64, 64, 3, padding=1),
nn.ReLU(inplace=True),
nn.MaxPool2d(2),
nn.ConvTranspose2d(64, 64, 2, stride=2, padding=0),
nn.Conv2d(64, 128, 3, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(128, 128, 3, padding=1),
nn.MaxPool2d(2),
nn.ConvTranspose2d(128, 128, 2, stride=2, padding=0),
nn.Conv2d(128, self.n_class, 3, padding=0),
```
Five convolutional layers with 64, 64, 128, 128, and 5 output channels respectively; each except the last two is followed by a ReLU, and the second and fourth are followed by max pooling. Transposed convolutions upscale wherever necessary to ensure dimensional matching.
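As a sanity check on the dimensional matching, a dummy forward pass through the same stack (wrapped in `nn.Sequential`; the 3×256×256 input size is an assumption, not stated above) shows the output spatial size matches the input:

```python
import torch
import torch.nn as nn

n_class = 5
net = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=2),      # 256 -> 258 (padding=2 grows the map)
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, 3, padding=1),     # 258 -> 258
    nn.ReLU(inplace=True),
    nn.MaxPool2d(2),                     # 258 -> 129
    nn.ConvTranspose2d(64, 64, 2, stride=2, padding=0),   # 129 -> 258
    nn.Conv2d(64, 128, 3, padding=1),    # 258 -> 258
    nn.ReLU(inplace=True),
    nn.Conv2d(128, 128, 3, padding=1),   # 258 -> 258
    nn.MaxPool2d(2),                     # 258 -> 129
    nn.ConvTranspose2d(128, 128, 2, stride=2, padding=0), # 129 -> 258
    nn.Conv2d(128, n_class, 3, padding=0),                # 258 -> 256
)
out = net(torch.zeros(1, 3, 256, 256))
print(out.shape)  # one score map per class, same spatial size as the input
```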
Hyperparameters: default Adam optimizer parameters, with learning rate 1e-3 and weight_decay 1e-5.
Per-class and average precision (AP) on the test set:
AP = 0.6623760691442281
AP = 0.7473245670520275
AP = 0.119582528128497
AP = 0.7268927323006635
AP = 0.2614720091246093
Average AP = 0.5035295811500051
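For reference, per-class AP can be computed along these lines (a hedged sketch: scores for one class against binary ground-truth labels, AP taken as the mean precision at each true positive; this is an assumed reconstruction, not my exact evaluation code):

```python
import numpy as np

def average_precision(scores, labels):
    """AP for one class: mean of precision at each positive,
    with predictions sorted by descending score."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    labels = np.asarray(labels)[order]
    tp = np.cumsum(labels)                            # true positives so far
    precision = tp / np.arange(1, len(labels) + 1)    # precision at each rank
    return float(precision[labels == 1].mean())
```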
The model does well on facades and windows, but kinda sucks everywhere else: balconies, pillars, and the rest. Not too surprising, given the difficulty of training and what appears to be an overall class imbalance in the data.
input | output |
---|---|