In this part, we train a model to classify images in the FashionMNIST dataset into 10 classes. Here are some sample images and their respective classes:
[Figure: sample FashionMNIST images with their class labels]
I trained a convolutional neural network consisting of two conv layers, each followed by a 2 x 2 max pool and a ReLU, and two fully connected linear layers. Printing the net object gives the following structure:
[Printed structure of the network]
As shown above, the first conv layer takes 1 input channel and outputs 32 channels, and the second conv layer takes those 32 channels and outputs 64. Both layers use 5 x 5 filters. For the fully connected layers, the first takes the flattened 64 x 4 x 4 feature map (1024 features) and outputs 1024 features, and the last layer maps those to scores for the 10 classes.
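The architecture described above can be sketched in PyTorch as follows; the class and attribute names are my own choices, but the layer shapes follow the description (28 x 28 input, 5 x 5 convs with no padding, 2 x 2 pooling):

```python
import torch
import torch.nn as nn

# Sketch of the described CNN; names are illustrative, shapes match the text.
class FashionCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5),   # 28x28 -> 24x24
            nn.MaxPool2d(2),                   # 24x24 -> 12x12
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5),  # 12x12 -> 8x8
            nn.MaxPool2d(2),                   # 8x8 -> 4x4
            nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.Linear(64 * 4 * 4, 1024),       # flattened 64 x 4 x 4 features
            nn.ReLU(),
            nn.Linear(1024, 10),               # scores for the 10 classes
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)

net = FashionCNN()
out = net(torch.randn(2, 1, 28, 28))
print(out.shape)  # torch.Size([2, 10])
```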
I used cross-entropy loss as the prediction loss and the Adam optimizer with a learning rate of 0.001.
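A minimal training-step sketch with the stated loss and optimizer settings; the tiny linear model and dummy batch here are stand-ins so the snippet is self-contained:

```python
import torch
import torch.nn as nn

# Stand-in model; in the report this would be the CNN described above.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

images = torch.randn(4, 1, 28, 28)      # dummy batch
labels = torch.randint(0, 10, (4,))

optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```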
For this problem, I used a training set of size 50000 and a validation set of size 10000. The validation set was used to tune hyperparameters such as the step size. Here the respective accuracies are plotted against the number of epochs:
[Figures: training and validation accuracy vs. number of epochs]
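One way to produce the 50000/10000 train/validation split is `torch.utils.data.random_split`; random tensors stand in for FashionMNIST below so the sketch runs without downloading the dataset, and the seed is an assumption:

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in for the 60000-sample FashionMNIST training data.
full_train = TensorDataset(torch.randn(60000, 1, 28, 28),
                           torch.randint(0, 10, (60000,)))

# 50000 training samples, 10000 validation samples.
train_set, val_set = random_split(
    full_train, [50000, 10000],
    generator=torch.Generator().manual_seed(0),
)
print(len(train_set), len(val_set))  # 50000 10000
```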
After training the CNN, I tested the network against the test set and got the following results:
|
In addition, here are the per-class accuracies on the validation and test sets. We can see that the shirt class is the hardest, and the coat class is the second hardest:
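Per-class accuracy can be computed by masking predictions on each label; the random predictions and labels below are stand-ins for the actual validation or test outputs:

```python
import torch

num_classes = 10
preds = torch.randint(0, num_classes, (10000,))   # stand-in predictions
labels = torch.randint(0, num_classes, (10000,))  # stand-in ground truth

correct = torch.zeros(num_classes)
total = torch.zeros(num_classes)
for c in range(num_classes):
    mask = labels == c
    total[c] = mask.sum()
    correct[c] = (preds[mask] == c).sum()

# clamp avoids division by zero for classes absent from the set
per_class_acc = correct / total.clamp(min=1)
```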
For each class, two correctly classified and two misclassified images are shown below:
[Figure: two correctly classified and two misclassified examples per class]
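Selecting those examples amounts to, for each class, taking the first few indices where the prediction matches (or misses) the label; again, random tensors stand in for real model outputs:

```python
import torch

preds = torch.randint(0, 10, (1000,))   # stand-in predictions
labels = torch.randint(0, 10, (1000,))  # stand-in ground truth

examples = {}
for c in range(10):
    idx = torch.nonzero(labels == c).flatten()   # samples of class c
    hit = idx[preds[idx] == c][:2]               # two correct predictions
    miss = idx[preds[idx] != c][:2]              # two misclassifications
    examples[c] = (hit.tolist(), miss.tolist())
```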
Before training the model, I split the provided training data into training and validation sets, using the first 700 samples for training and the remaining 206 samples for validation.
Below is the structure of the neural net:
As shown above, there are six convolution layers, each with filters of varying size (from 3x3 to 7x7) and each followed by a ReLU, except the final layer. The channel progression is 3 -> 32 -> 64 -> 128 -> 64 -> 32 -> 5. All conv layers use padding = 2 and stride = 1.
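A hedged sketch of that six-conv net: the report only says the kernel sizes range from 3x3 to 7x7, so the particular ordering below is my assumption (chosen so that, with padding = 2 and stride = 1, the output keeps the input's spatial size, as a per-pixel segmentation head needs):

```python
import torch
import torch.nn as nn

channels = [3, 32, 64, 128, 64, 32, 5]
kernels = [7, 5, 3, 3, 5, 7]  # assumed ordering; only the 3x3-7x7 range is stated

layers = []
for i, k in enumerate(kernels):
    layers.append(nn.Conv2d(channels[i], channels[i + 1],
                            kernel_size=k, stride=1, padding=2))
    if i < len(kernels) - 1:      # ReLU after every conv except the last
        layers.append(nn.ReLU())
net = nn.Sequential(*layers)

out = net(torch.randn(1, 3, 64, 64))
print(out.shape)  # per-pixel scores for the 5 classes: (1, 5, 64, 64)
```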
In addition, I tuned the following hyperparameters using the validation set: number of epochs = 20 and batch size = 4. For the Adam optimizer, I ended up using the suggested parameters (a learning rate of 1e-3 and weight decay of 1e-5).
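The stated optimizer configuration in code; the small conv module is a stand-in for the segmentation net:

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 5, kernel_size=3, padding=1)  # stand-in module
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
```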
The loss across iterations is plotted below:
[Figures: training loss across iterations, followed by qualitative segmentation results on test samples]
In this picture, we can see that the model gets windows and facades right most often and fails most on balconies. In other test samples, it also fails on pillars. This is consistent with the per-class differences in AP.