spring 2020
The neural network has the following structure.
The model is trained on the Fashion MNIST dataset, which has 10 classes, 60000 training/validation images, and 10000 test images. Of the 60000 images, 50000 are used for training and 10000 for validation. The batch size is 100. Five example images are plotted below, with labels 8, 7, 5, 4, 8 respectively.
To train the neural network, 6-fold cross validation is used. The network is trained for 60 epochs (10 full rounds of cross validation). With $n$ denoting the epoch number, the validation set consists of the batches with indices in $[(n \% 6) * 100, (n \% 6 + 1) * 100 - 1]$. The loss function is cross entropy, and the optimizer is Adam with learning rate 0.002. The training and validation accuracies are plotted below. The validation accuracy is almost saturated after 25 epochs.
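The fold rotation described above can be sketched as follows. This is a minimal illustration, not the original training code: it assumes the indices refer to batches (60000 images at batch size 100 give 600 batches, so each of the 6 folds holds 100 batches), and the function names `validation_batch_range` and `split_batches` are hypothetical.

```python
def validation_batch_range(epoch, num_folds=6, batches_per_fold=100):
    """Return the inclusive (first, last) batch indices held out in this epoch."""
    fold = epoch % num_folds                      # fold rotates every epoch
    first = fold * batches_per_fold
    last = (fold + 1) * batches_per_fold - 1
    return first, last


def split_batches(epoch, num_batches=600, num_folds=6):
    """Split batch indices into training and validation lists for one epoch.

    Assumption: 600 batches of 100 images, i.e. the 60000 train/validation images.
    """
    batches_per_fold = num_batches // num_folds
    first, last = validation_batch_range(epoch, num_folds, batches_per_fold)
    val = list(range(first, last + 1))
    train = [b for b in range(num_batches) if b < first or b > last]
    return train, val


if __name__ == "__main__":
    for epoch in range(6):
        print(epoch, validation_batch_range(epoch))
```

With these assumptions, every epoch trains on 500 batches (50000 images) and validates on 100 batches (10000 images), which matches the split stated above.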
The per-class test accuracy (in %) is $[87.4, 98.3, 84.4, 88.6, 86.6, 98.8, 71.5, 96.9, 96.3, 96.5]$ for classes 0-9 respectively.
The per-class validation accuracy (in %) is $[99.9, 99.7, 97.9, 99.3, 98.9, 99.9, 96.8, 99.1, 99.3, 99.6]$ for classes 0-9 respectively.
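For reference, per-class accuracies like those listed above can be computed by counting, for each true class, the fraction of its images that were predicted correctly. The sketch below is one plausible way to do this in plain Python; the function name `per_class_accuracy` and the toy labels are illustrative assumptions, not the code or data used for the reported numbers.

```python
def per_class_accuracy(labels, predictions, num_classes=10):
    """Return accuracy in percent for each true class.

    labels, predictions: sequences of integer class ids of equal length.
    Classes with no samples get NaN.
    """
    correct = [0] * num_classes
    total = [0] * num_classes
    for y, p in zip(labels, predictions):
        total[y] += 1
        if y == p:
            correct[y] += 1
    return [100.0 * c / t if t else float("nan") for c, t in zip(correct, total)]


if __name__ == "__main__":
    # Toy data (made up for illustration only).
    toy_labels = [0, 0, 1, 1, 1, 1]
    toy_preds = [0, 1, 1, 1, 1, 0]
    print(per_class_accuracy(toy_labels, toy_preds, num_classes=2))
```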
The hardest class to predict is class 6. Even though the validation accuracy is almost perfect, the test accuracy for class 6 is low. One reason might be overfitting. Two examples of correct predictions and two examples of incorrect predictions are shown below. By inspecting the incorrect predictions, it is clear that some of the images in the test set were mislabelled (e.g., class 5).
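The claim about class 6 can be checked directly from the reported per-class numbers. The short sketch below only reuses the accuracies stated in this report; the variable and function names are hypothetical.

```python
# Per-class accuracies (in %) as reported in this document, classes 0-9.
TEST_ACC = [87.4, 98.3, 84.4, 88.6, 86.6, 98.8, 71.5, 96.9, 96.3, 96.5]
VAL_ACC = [99.9, 99.7, 97.9, 99.3, 98.9, 99.9, 96.8, 99.1, 99.3, 99.6]


def hardest_class(test_acc):
    """Index of the class with the lowest test accuracy."""
    return min(range(len(test_acc)), key=lambda c: test_acc[c])


def generalization_gaps(val_acc, test_acc):
    """Validation minus test accuracy per class, rounded to one decimal."""
    return [round(v - t, 1) for v, t in zip(val_acc, test_acc)]


if __name__ == "__main__":
    gaps = generalization_gaps(VAL_ACC, TEST_ACC)
    print("hardest class:", hardest_class(TEST_ACC))
    print("validation - test gap per class:", gaps)
```

Class 6 has both the lowest test accuracy and the largest gap between validation and test accuracy (about 25 percentage points), which is consistent with the overfitting explanation.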
The learned kernels of the first convolution layer are also plotted below. Though they capture some characteristics of the original images, it is hard to interpret the meaning of each kernel.
The neural network has the following structure:
To train the network, the loss is the cross entropy loss, and the optimizer is Adam with a learning rate of 1e-3 and a weight decay of 1e-5. The batch size for both the training set and the validation set is 6 (so there are 151 batches in total, of which 30 are used as the validation set). The network is trained for 20 epochs. The training and validation losses during training are plotted below.
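As a reminder of what the cross entropy loss measures, the sketch below computes it for a single prediction and averages it over a batch. It is a plain-Python illustration under the usual definition (negative log of the softmax probability assigned to the true class); the function names are hypothetical and this is not the training code behind the plotted losses.

```python
import math


def cross_entropy(logits, target):
    """Cross entropy for one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def mean_cross_entropy(batch_logits, batch_targets):
    """Average cross entropy over a batch of samples (or pixels)."""
    losses = [cross_entropy(l, t) for l, t in zip(batch_logits, batch_targets)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Uniform logits over two classes give a loss of log(2).
    print(cross_entropy([0.0, 0.0], 0))
```

For a segmentation network, the same quantity would typically be averaged over all pixels of all images in the batch.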
The average precision on the test set is:
Two sets of examples are shown below. Based on both examples, the windows in orange and the facades in blue are correctly predicted, whereas the pillars in green and the balcony in red are failures. The example images agree with the average precision calculated previously.
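The report does not state how the average precision was computed. One common definition, sketched below for a single class, ranks all items (for segmentation, pixels) by predicted confidence and averages the precision at each true positive. The function name `average_precision` and the toy inputs are assumptions for illustration, and ties in the scores are broken by input order.

```python
def average_precision(scores, labels):
    """Step-wise average precision for one class.

    scores: predicted confidence that each item belongs to the class.
    labels: 1 if the item truly belongs to the class, else 0.
    Returns NaN if the class has no positive items.
    """
    num_positive = sum(labels)
    if num_positive == 0:
        return float("nan")
    # Rank items from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / num_positive


if __name__ == "__main__":
    # Toy example (made up): positives at ranks 1 and 3.
    print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```

Under this definition, a class whose pixels are consistently ranked below those of other classes receives a low score, which would match the qualitative failures on pillars and balconies noted above.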