Here's the graph of training loss vs. validation loss. Hyperparameters: learning rate = 0.001, batch size = 32, epochs = 20.
I then tried different hyperparameters: batch size and learning rate.
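As a concrete illustration of the kind of sweep I mean (not the actual training code), here's a minimal, self-contained NumPy sketch using a toy linear-regression model, so the `train()` helper and the sweep values are assumptions for illustration:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
true_w = rng.normal(size=10)
y = X @ true_w + 0.1 * rng.normal(size=1000)

def train(learning_rate, batch_size, epochs=20):
    """Plain mini-batch SGD on mean-squared error; returns the final loss."""
    w = np.zeros(10)
    n = len(X)
    for _ in range(epochs):
        perm = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            grad = 2 * Xb.T @ (Xb @ w - yb) / len(idx)  # MSE gradient
            w -= learning_rate * grad
    return np.mean((X @ w - y) ** 2)

# Sweep the two hyperparameters; the grid values here are illustrative.
for lr, bs in itertools.product([0.001, 0.01], [32, 128]):
    print(f"lr={lr}, batch_size={bs}: final loss={train(lr, bs):.4f}")
```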
From the graph, a lower learning rate makes the model converge slightly more slowly, which is expected since each update step is smaller.
The batch-size curves look much the same on the graph, but a larger batch size trains faster in wall-clock time, since each step processes more samples in a single vectorized NumPy operation, leaving fewer Python-level iterations per epoch.
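To make the parallelism point concrete, here is a rough timing sketch on synthetic data (not my actual model): one pass over the same data takes fewer, larger matrix multiplications as the batch size grows, so the epoch finishes sooner.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 64))   # synthetic inputs, assumed shapes
W = rng.normal(size=(64, 64))        # stand-in for a layer's weights

def epoch_time(batch_size):
    """Time one pass over the data in mini-batches; returns elapsed seconds."""
    start = time.perf_counter()
    for i in range(0, len(X), batch_size):
        _ = X[i:i + batch_size] @ W  # one vectorized forward step
    return time.perf_counter() - start

for bs in (32, 256, 2048):
    print(f"batch_size={bs}: {epoch_time(bs):.3f}s per epoch")
```

On my understanding, the speedup comes from amortizing Python loop overhead and letting NumPy's BLAS backend work on bigger blocks, not from the model needing fewer gradient updates.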