Classification and Segmentation

A CS 194-26 project 4 by Xiangtian Li cs194-26-agr

Part 1: Image Classification

Implementation and Training Details

We use a network with two convolutional layers: the first has 12 channels and the second has 24 channels. Each uses a kernel size of 3 and is followed by a ReLU nonlinearity and a max-pooling layer of size 2 and stride 2.
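To make the layer sizes concrete, the sketch below traces the feature-map shape through the two conv/pool stages. It assumes a 28x28 input and unpadded convolutions with stride 1, neither of which is stated in this write-up; `conv_out`, `pool_out`, and `trace_shapes` are hypothetical helper names used only for illustration.

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Spatial output size of max-pooling along one dimension."""
    return (size - kernel) // stride + 1


def trace_shapes(input_size=28, channels=(12, 24)):
    """Return (channels, height, width) after each conv -> ReLU -> pool stage."""
    shapes = []
    size = input_size  # assumed input resolution, not given in the text
    for out_channels in channels:
        # ReLU is element-wise, so only the conv and the pool change the size.
        size = pool_out(conv_out(size))
        shapes.append((out_channels, size, size))
    return shapes


shapes = trace_shapes()
c, h, w = shapes[-1]
flattened = c * h * w  # number of features after flattening the last stage
print(shapes, flattened)
```

Under these assumptions the flattened output of the second stage is what the layer after the conv stack would receive as input; a different padding or input size changes that number.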

At the very end we have two fully-connected layers of size 84 and 10, respectively. We use the cross-entropy loss and train the network with Adam at a learning rate of 0.001. One sixth of the training set is held out as a validation set and the rest is used for training.
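The train/validation split described above can be sketched as a shuffled index split. This is only an illustration: the training-set size of 60000, the fixed seed, and the helper name `split_indices` are assumptions, and the original code may have split the data differently.

```python
import random


def split_indices(num_samples, val_fraction=1 / 6, seed=0):
    """Shuffle sample indices and hold out a fraction for validation."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    num_val = int(round(num_samples * val_fraction))
    # The first num_val shuffled indices form the validation set.
    return indices[num_val:], indices[:num_val]


# 60000 is an assumed training-set size, used only for this example.
train_idx, val_idx = split_indices(60000)
print(len(train_idx), len(val_idx))
```

Splitting indices rather than the data itself keeps the two subsets disjoint and makes it easy to wrap each one in its own data loader.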

Sample Images

raw_example

Training Curve

Train_loss

We plot the training accuracy and validation accuracy during training. As training progresses, the network appears to overfit the training set, since the accuracy on the validation set no longer increases.

Accuracy of Different Classes

Test

As the figure shows, our model performs well on 5 classes, each with an accuracy above 90%. However, it does less well on the shirt class, where the accuracy is 76.2%.
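Per-class accuracy of this kind can be computed by counting, for each true class, how many of its samples were predicted correctly. The sketch below uses made-up toy labels, not the project's data, and `per_class_accuracy` is a hypothetical helper name.

```python
from collections import Counter


def per_class_accuracy(labels, predictions):
    """Map each true class to the fraction of its samples predicted correctly."""
    total = Counter(labels)
    correct = Counter(l for l, p in zip(labels, predictions) if l == p)
    return {cls: correct[cls] / total[cls] for cls in sorted(total)}


# Toy data for illustration only.
labels = [0, 0, 1, 1, 1, 2]
predictions = [0, 1, 1, 1, 0, 2]
acc = per_class_accuracy(labels, predictions)
for cls, value in acc.items():
    print(f"class {cls}: {value:.1%}")
```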

Some Successful and Failure Samples

Example

Visualize Filters

Here I visualize the filters of the two conv layers.

conv1

conv2

I also visualize two feature maps from the two layers.

ch1

ch2

Part 2: Semantic Segmentation

Implementation and Training Details

For the network, we use UNet to perform the segmentation. Its structure is shown below.

unet

I train with two optimizers, Adam and SGD, and compare their performance at a learning rate of 0.0001. The weight decay is 0.00001 and the momentum is 0.9. Training runs for 50 epochs.
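To illustrate how the two optimizers use these hyperparameters, here is a minimal scalar sketch of one update step for each. It is not the training code: it assumes that the weight decay is applied to both optimizers as an L2 term added to the gradient, that the momentum applies to SGD, and that Adam uses the common defaults beta1=0.9, beta2=0.999, eps=1e-8. None of these details are stated in the text, and `sgd_step` and `adam_step` are hypothetical names.

```python
import math

LR = 1e-4            # learning rate from the text
WEIGHT_DECAY = 1e-5  # weight decay from the text
MOMENTUM = 0.9       # momentum from the text (assumed to apply to SGD)


def sgd_step(param, grad, velocity):
    """One SGD-with-momentum step on a scalar parameter."""
    grad = grad + WEIGHT_DECAY * param       # L2 weight decay
    velocity = MOMENTUM * velocity + grad    # accumulate momentum
    return param - LR * velocity, velocity


def adam_step(param, grad, m, v, t, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step on a scalar parameter; t is the 1-based step count."""
    grad = grad + WEIGHT_DECAY * param       # L2 weight decay
    m = beta1 * m + (1 - beta1) * grad       # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    return param - LR * m_hat / (math.sqrt(v_hat) + eps), m, v


# Same starting point and gradient for both optimizers.
sgd_param, _ = sgd_step(1.0, 5.0, 0.0)
adam_param, _, _ = adam_step(1.0, 5.0, 0.0, 0.0, t=1)
print(sgd_param, adam_param)
```

In this toy step SGD moves the parameter in proportion to the gradient, while Adam's first step is roughly one learning rate in size regardless of the gradient's magnitude. That normalization is one common reason the two optimizers behave differently at the same learning rate.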

Training Curve

Adam

Adam_loss

SGD

SGD_loss

Result

Adam

Adam_result

SGD

SGD_result

 

As we can see, the average AP of the Adam model is 0.70 and that of the SGD model is 0.62. Thus, the Adam model performs better than the SGD model.
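For reference, average precision (AP) for one class can be computed by ranking pixels by predicted score and averaging the precision at each true positive; the per-class APs are then averaged. The sketch below shows one common definition on made-up toy data. The evaluation code used for the numbers above may differ in detail, and `average_precision` and `mean_ap` are hypothetical names.

```python
def average_precision(scores, labels):
    """AP for one class: mean precision at the rank of each positive sample."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)  # precision at this cutoff
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_ap(per_class_scores, per_class_labels):
    """Average the per-class APs."""
    aps = [average_precision(s, l)
           for s, l in zip(per_class_scores, per_class_labels)]
    return sum(aps) / len(aps)


# Toy scores and binary labels for two classes, for illustration only.
ap_a = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
ap_b = average_precision([0.2, 0.9, 0.4], [0, 1, 1])
avg = mean_ap([[0.9, 0.8, 0.7, 0.6], [0.2, 0.9, 0.4]],
              [[1, 0, 1, 0], [0, 1, 1]])
print(ap_a, ap_b, avg)
```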

My Data Sample

Below are some buildings from San Francisco. We can see that the model gets walls and windows right but performs poorly on pillars.

Image Segmentation
x0 y0
x9 y9
x12 y12