In [24]:
from IPython.display import HTML
HTML("""
<style>

div.cell { /* Tunes the space between cells */
margin-top:1em;
margin-bottom:1em;
}

div.text_cell_render h1 { /* Main titles bigger, centered */
font-size: 2.2em;
line-height:0.9em;
}

div.text_cell_render h2 { /* Bring part headings closer to their text */
margin-bottom: -0.4em;
}


div.text_cell_render { /* Customize text cells */
font-family: 'Georgia';
font-size:1.2em;
line-height:1.4em;
padding-left:3em;
padding-right:3em;
}

.output_png {
    display: table-cell;
    text-align: center;
    vertical-align: middle;
}

</style>

<script>
code_show=true; 
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
The raw code for this IPython notebook is by default hidden for easier reading.
To toggle on/off the raw code, click <a href="javascript:code_toggle()">here</a>.

""")

Out[24]:
The raw code for this IPython notebook is by default hidden for easier reading. To toggle on/off the raw code, click here.
In [6]:
import numpy as np
import matplotlib.pyplot as plt
import skimage.io as io
import skimage as sk

%matplotlib inline

Project 4: Classification and Segmentation

Part 1: Image Classification

In this part, we train a convolutional neural network to classify the images in the FashionMNIST dataset.

1.1 The architecture of the CNN:

We use the recommended architecture as a starting point: conv layers with 32 channels each and 3×3 filters, where each conv layer is followed by a ReLU and a 2×2 maxpool. This is followed by two fully connected layers, with a ReLU applied after the first. We use the Adam optimizer with a learning rate of 1e-3 and a batch size of 100.
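A minimal PyTorch sketch of this kind of architecture is shown below. The conv-layer count (two) and the hidden width of the first fully connected layer are assumptions, since the text leaves them implicit; FashionMNIST inputs are 1×28×28.

```python
import torch
import torch.nn as nn

class FashionCNN(nn.Module):
    # Conv layers (32 channels, 3x3 filters), each followed by ReLU + 2x2 maxpool,
    # then two fully connected layers with a ReLU after the first.
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 7 * 7, 120), nn.ReLU(),  # hidden width 120 is an assumption
            nn.Linear(120, n_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = FashionCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # as described above
x = torch.randn(100, 1, 28, 28)  # batch size 100
print(model(x).shape)  # torch.Size([100, 10])
```

The two 2×2 pools take 28×28 down to 7×7, which fixes the 32 · 7 · 7 input size of the first fully connected layer.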

1.2 The learning curve

From the learning curve plotted below, we can see that training converges after the first 20 epochs, after which the model slowly begins to overfit.

In [12]:
lc = io.imread('learning_curve.png')
f=plt.figure(figsize=(15,15))
ax=f.add_subplot(1,1,1)
ax.imshow(lc)
plt.axis('off')
plt.show()

1.3 Correctly classified images

We show correctly classified images for each category:

In [16]:
lc = io.imread('imgs1.png')
f=plt.figure(figsize=(15,15))
ax=f.add_subplot(1,1,1)
ax.imshow(lc)
plt.axis('off')
plt.show()

1.4 Incorrectly classified images

We show incorrectly classified images for each category:

In [15]:
lc = io.imread('imgs2.png')
f=plt.figure(figsize=(15,15))
ax=f.add_subplot(1,1,1)
ax.imshow(lc)
plt.axis('off')
plt.show()

1.5 Accuracy

We show the per-class accuracy of our classifier on the validation and test datasets, from which we can see that the shirt class is the hardest to classify.

Class        Val acc.   Test acc.
TShirt       85%        83%
Trouser      98%        99%
Pullover     85%        86%
Dress        92%        91%
Coat         86%        85%
Sandal       97%        98%
Shirt        75%        72%
Sneaker      96%        96%
Bag          98%        98%
Ankle Boot   97%        97%
----------   --------   --------
TOTAL        91%        90%
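A per-class accuracy table like the one above can be computed with a short NumPy helper. The sketch below uses toy labels for illustration, not the real predictions:

```python
import numpy as np

def per_class_accuracy(y_true, y_pred, n_classes):
    # Fraction of correctly predicted samples within each true class.
    accs = []
    for c in range(n_classes):
        mask = (y_true == c)
        accs.append((y_pred[mask] == c).mean() if mask.any() else float('nan'))
    return np.array(accs)

y_true = np.array([0, 0, 1, 1, 2, 2])  # toy ground-truth labels
y_pred = np.array([0, 1, 1, 1, 2, 0])  # toy predictions
print(per_class_accuracy(y_true, y_pred, 3))  # [0.5 1.  0.5]
```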

1.6 Visualization of the learned filters

In [18]:
lc = io.imread('filter.png')
f=plt.figure(figsize=(15,15))
ax=f.add_subplot(1,1,1)
ax.imshow(lc)
plt.axis('off')
plt.show()

Part 2: Semantic Segmentation

We use the Mini Facade dataset to train a ConvNet for semantic segmentation, i.e., labeling each pixel in an image with its correct object class. The Mini Facade dataset consists of images of buildings from different cities around the world, in diverse architectural styles, along with semantic segmentation labels (in .png format) for 5 classes: balcony, window, pillar, facade and others.

2.1 Architecture of the network

The network consists of 7 conv layers with 64, 128, 256, 512, 512, 1024 and 5 channels respectively, each followed by a ReLU. All filters are 3×3 with zero padding. We add two 2×2 max pooling layers, after the 5th and 6th conv layers, and at the end we upsample the output to match the size of the original image. We use the Adam optimizer with a 1e-3 learning rate and 1e-5 weight decay. We manually reduce the learning rate to 1e-4 after the first 65 epochs, since training starts to diverge when the learning rate is too large. The batch size is 10.
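A PyTorch sketch of this layer stack is shown below. The 3-channel RGB input, the bilinear upsampling mode, and the assumption that one 4× upsample undoes the two 2× pools are all ours; the text only specifies the channel counts, filter size, and pool positions.

```python
import torch
import torch.nn as nn

class FacadeNet(nn.Module):
    # Seven 3x3 convs (64, 128, 256, 512, 512, 1024, 5 channels), each with a ReLU,
    # 2x2 maxpool after the 5th and 6th convs, then upsampling back to input size.
    def __init__(self, n_classes=5):
        super().__init__()
        chans = [3, 64, 128, 256, 512, 512, 1024, n_classes]
        layers = []
        for i in range(7):
            layers += [nn.Conv2d(chans[i], chans[i + 1], 3, padding=1), nn.ReLU()]
            if i in (4, 5):  # pool after the 5th and 6th conv layers
                layers.append(nn.MaxPool2d(2))
        self.net = nn.Sequential(*layers)
        # Two 2x2 pools shrink the map by 4x, so upsample by 4 to restore it.
        self.up = nn.Upsample(scale_factor=4, mode='bilinear', align_corners=False)

    def forward(self, x):
        return self.up(self.net(x))

model = FacadeNet()
x = torch.randn(1, 3, 64, 64)  # small dummy input; real images are larger
print(model(x).shape)  # torch.Size([1, 5, 64, 64])
```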

2.2 Learning curve

We start with a 1e-3 learning rate and manually reduce it to 1e-4 after the first 65 epochs, since the training starts diverging. That is why the validation loss suddenly drops at the 65th epoch.
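A manual learning-rate drop like this can be done by editing the optimizer's parameter groups directly (a sketch; the tiny linear model is a stand-in for the segmentation network, and the training step is elided):

```python
import torch

model = torch.nn.Linear(4, 2)  # stand-in for the segmentation network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)

for epoch in range(100):
    if epoch == 65:  # drop the learning rate once training starts diverging
        for group in optimizer.param_groups:
            group['lr'] = 1e-4
    # ... training step would go here ...

print(optimizer.param_groups[0]['lr'])  # 0.0001
```

The same effect can be had with `torch.optim.lr_scheduler.MultiStepLR`, but mutating `param_groups` keeps the one-off drop explicit.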

In [20]:
lc = io.imread('lc.png')
f=plt.figure(figsize=(15,15))
ax=f.add_subplot(1,1,1)
ax.imshow(lc)
plt.axis('off')
plt.show()

2.3 Training accuracy

The AP can reach 0.47 after training, but we can see that the Pillar and Balcony classes are hard to recognize.

Class     AP
Others    0.65
Facade    0.63
Pillar    0.09
Window    0.59
Balcony   0.42
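Per-class AP scores like these are computed by ranking pixels by the predicted score for a class and averaging the precision at each true positive. A NumPy sketch with toy data (not the real predictions):

```python
import numpy as np

def average_precision(scores, labels):
    # Rank pixels by predicted score; AP is the mean of the precision
    # values measured at each true-positive rank.
    order = np.argsort(-scores)
    labels = labels[order].astype(float)
    tp = np.cumsum(labels)
    precision = tp / np.arange(1, len(labels) + 1)
    return (precision * labels).sum() / labels.sum()

scores = np.array([0.9, 0.8, 0.7, 0.6])  # toy per-pixel scores for one class
labels = np.array([1, 0, 1, 0])          # toy ground-truth membership
print(average_precision(scores, labels))  # 0.8333...
```

This matches `sklearn.metrics.average_precision_score` on the same inputs.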

2.4 Result

We show an input image (left) and its segmentation (right). The network gets the windows and some pillars right, but it misses the balconies.

In [23]:
im1 = io.imread('input.png')
im2 = io.imread('output.png')
f=plt.figure(figsize=(15,15))
ax1=f.add_subplot(1,2,1)
ax1.imshow(im1)
plt.axis('off')
ax2=f.add_subplot(1,2,2)
ax2.imshow(im2)
plt.axis('off')
plt.show()