Classification and Segmentation

Classification

Model Architecture

Loss: CrossEntropyLoss
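As a rough illustration of what this loss computes, here is a minimal pure-Python sketch. It assumes the loss is PyTorch's `CrossEntropyLoss` with its default mean reduction, which takes raw logits (no softmax applied beforehand). The function names `cross_entropy` and `mean_cross_entropy` are hypothetical helpers, not part of the original code.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def mean_cross_entropy(batch_logits, targets):
    """Average the per-sample losses over a batch (assumed 'mean' reduction)."""
    losses = [cross_entropy(l, t) for l, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)

# With ten equally scored classes the loss is log(10), the value an untrained
# 10-class classifier typically starts near.
uniform_loss = cross_entropy([0.0] * 10, 3)
```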

Optimizer: SGD(lr=0.01, weight_decay=1e-4, momentum=0.9)
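To make the three hyperparameters concrete, the sketch below writes out one update step on plain Python floats. It assumes the optimizer is PyTorch's `torch.optim.SGD` with zero dampening and no Nesterov momentum. `sgd_step` is a hypothetical helper, and the parameter and gradient values are made up for illustration.

```python
def sgd_step(params, grads, bufs, lr=0.01, weight_decay=1e-4, momentum=0.9):
    """One SGD update; bufs holds the momentum buffers (None before the first step)."""
    for i, (p, g) in enumerate(zip(params, grads)):
        g = g + weight_decay * p              # weight decay is added to the gradient
        if bufs[i] is None:
            bufs[i] = g                       # first step: the buffer starts as the gradient
        else:
            bufs[i] = momentum * bufs[i] + g  # later steps: decayed running sum
        params[i] = p - lr * bufs[i]
    return params, bufs

# Two steps on a single made-up parameter with a constant made-up gradient.
params, bufs = [1.0], [None]
params, bufs = sgd_step(params, [0.5], bufs)
first = params[0]
params, bufs = sgd_step(params, [0.5], bufs)
second = params[0]
```

With momentum the second step moves the parameter further than the first even though the gradient is unchanged, because the buffer accumulates earlier gradients.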

Train accuracy

Validation accuracy

Per-class accuracy (%)

'T-shirt/top'	'Trouser'	'Pullover'	'Dress'	'Coat'	'Sandal'	'Shirt'	'Sneaker'	'Bag'	'Ankle boot'	Total
83.3	98.5	89.1	92.4	82.1	98.1	73.4	96.0	97.0	95.9	90.58
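For reference, a table like this can be computed from label/prediction pairs as sketched below. `per_class_accuracy` is a hypothetical helper, and the toy labels are made up. The last lines check that the Total column equals the unweighted mean of the ten class values, which coincides with overall accuracy when every class has the same number of samples.

```python
def per_class_accuracy(labels, preds, num_classes):
    """Percentage of samples of each true class that were predicted correctly."""
    correct = [0] * num_classes
    total = [0] * num_classes
    for y, p in zip(labels, preds):
        total[y] += 1
        correct[y] += int(y == p)
    return [100.0 * c / t if t else float("nan") for c, t in zip(correct, total)]

# Toy example with two classes (made-up data).
toy = per_class_accuracy([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)

# Values copied from the table above.
reported = [83.3, 98.5, 89.1, 92.4, 82.1, 98.1, 73.4, 96.0, 97.0, 95.9]
mean_accuracy = sum(reported) / len(reported)
```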

Correctly classified examples

'T-shirt/top'

'Trouser'

'Pullover'

'Dress'

'Coat'

'Sandal'

'Shirt'

'Sneaker'

'Bag'

'Ankle boot'

Misclassified examples

'T-shirt/top'

'Trouser' & 'Pullover'

'Trouser'

'Pullover' & 'Dress'

'Pullover'

'Dress' & 'Coat'

'Dress'

'Coat' & 'Shirt'

'Coat'

'Pullover' & 'Shirt'

'Sandal'

'Sneaker' & 'Ankle boot'

'Shirt'

'T-shirt/top' & 'Pullover'

'Sneaker'

'Sandal' & 'Ankle boot'

'Bag'

'T-shirt/top' & 'Shirt'

'Ankle boot'

'Sandal' & 'Sneaker'

Learned filters

32 filters in the first convolution layer

32*32 filters in the second convolution layer

Segmentation

Model Architecture

Each block in the figure consists of a convolution, BatchNorm, and ReLU. The "Down" block additionally has a max-pooling layer that halves the spatial size. The "Up" block first upsamples the output of the previous block and then concatenates it with the output of another block that has the same spatial size.
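The description above can be traced with a small shape calculation. This sketch only tracks (channels, height, width) tuples and is not the actual model. It assumes the convolutions preserve spatial size, that upsampling doubles height and width, and that the convolution in an "Up" block runs on the concatenated tensor. The input size and channel counts are made up for illustration.

```python
def down(shape, out_ch):
    """Conv-BatchNorm-ReLU (assumed size-preserving), then max pooling halves H and W."""
    _, h, w = shape
    return (out_ch, h // 2, w // 2)

def up(shape, skip, out_ch):
    """Upsample by 2, concatenate with a same-sized skip tensor, then convolve."""
    c, h, w = shape
    upsampled = (c, 2 * h, 2 * w)
    if upsampled[1:] != skip[1:]:
        raise ValueError("skip tensor must match the upsampled spatial size")
    concatenated = (c + skip[0], 2 * h, 2 * w)  # channels add up on concatenation
    return concatenated, (out_ch, 2 * h, 2 * w)

x = (3, 256, 256)          # hypothetical RGB input
d1 = down(x, 32)           # (32, 128, 128)
d2 = down(d1, 64)          # (64, 64, 64)
cat1, u1 = up(d2, d1, 32)  # concatenation has 64 + 32 = 96 channels at 128 x 128
```

The skip connection gives the "Up" block access to higher-resolution features that pooling would otherwise discard.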

Loss across iterations

Average precision (%)

'others'	'facade'	'pillar'	'window'	'balcony'	mean
61.08	71.68	21.74	83.99	52.14	58.12
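The report does not state how average precision was computed. One common definition, sketched here as an assumption, treats every pixel as a sample, ranks pixels by the predicted score for a class, and averages the precision at the rank of each true positive. `average_precision` is a hypothetical helper, the toy scores are made up, and ties are broken by input order.

```python
def average_precision(scores, labels):
    """Mean of the precision values measured at the rank of each positive sample."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    num_pos = sum(labels)
    if num_pos == 0:
        return float("nan")
    hits, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            ap += hits / rank  # precision among the top `rank` samples
    return ap / num_pos

# Toy example: positives sit at ranks 1 and 3, so AP = (1/1 + 2/3) / 2.
toy_ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])

# Values copied from the table above; their mean agrees with the reported
# 58.12 to within rounding of the per-class values.
reported = [61.08, 71.68, 21.74, 83.99, 52.14]
mean_ap = sum(reported) / len(reported)
```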

Result on my picture

Input

Result

The model does well on windows but segments pillars poorly. A likely reason is that there is little pillar data, which is not enough for the model to learn the features of pillars. The model also has difficulty separating 'facade' from 'others', possibly because the 'others' class has no stable features.
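One way to check the class-imbalance explanation is to measure how large a share of the labeled pixels each class occupies. The sketch below assumes label masks are nested lists of integer class indices and that the indices follow the column order of the average precision table. Both are assumptions, `class_pixel_share` is a hypothetical helper, and the toy mask is made up.

```python
from collections import Counter

# Assumed index-to-name mapping, taken from the table's column order.
CLASSES = ['others', 'facade', 'pillar', 'window', 'balcony']

def class_pixel_share(masks):
    """Fraction of all labeled pixels that belongs to each class (masks must be non-empty)."""
    counts = Counter()
    for mask in masks:
        for row in mask:
            counts.update(row)
    total = sum(counts.values())
    return {name: counts.get(idx, 0) / total for idx, name in enumerate(CLASSES)}

# Toy 2 x 3 mask.
share = class_pixel_share([[[1, 1, 3], [1, 0, 2]]])
```

A very small share for 'pillar' on the real training masks would support the explanation given above.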