In [24]:
from IPython.display import HTML
HTML("""
<style>

div.cell { /* Tunes the space between cells */
margin-top:1em;
margin-bottom:1em;
}

div.text_cell_render h1 { /* Main titles bigger, centered */
font-size: 2.2em;
line-height:0.9em;
}

div.text_cell_render h2 { /* Bring part headings closer to their text */
margin-bottom: -0.4em;
}


div.text_cell_render { /* Customize text cells */
font-family: 'Georgia';
font-size:1.2em;
line-height:1.4em;
padding-left:3em;
padding-right:3em;
}

.output_png {
    display: table-cell;
    text-align: center;
    vertical-align: middle;
}

</style>

<script>
code_show=true; 
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
The raw code for this IPython notebook is by default hidden for easier reading.
To toggle on/off the raw code, click <a href="javascript:code_toggle()">here</a>.

""")

Out[24]:
The raw code for this IPython notebook is by default hidden for easier reading. To toggle on/off the raw code, click here.
In [6]:
import numpy as np
import matplotlib.pyplot as plt
import skimage.io as io
import skimage as sk

%matplotlib inline

Project 4: Classification and Segmentation

Part 1: Image Classification

In this part, we train a convolutional neural network to classify the images in the FashionMNIST dataset.

1.1 The architecture of the CNN:

We use the recommended architecture as a starting point: conv layers with 32 channels each and 3×3 filters, where each conv layer is followed by a ReLU and a 2×2 maxpool. This is followed by two fully connected layers, with a ReLU applied after the first. We use the Adam optimizer with a learning rate of 1e-3 and a batch size of 100.
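A minimal PyTorch sketch of this kind of architecture is shown below. The conv-layer count (two) and the hidden width of the first fully connected layer are assumptions, since the text leaves them implicit; FashionMNIST inputs are 1×28×28.

```python
import torch
import torch.nn as nn

class FashionCNN(nn.Module):
    # Conv layers (32 channels, 3x3 filters), each followed by ReLU + 2x2 maxpool,
    # then two fully connected layers with a ReLU after the first.
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 7 * 7, 120), nn.ReLU(),  # hidden width 120 is an assumption
            nn.Linear(120, n_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = FashionCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # as described above
x = torch.randn(100, 1, 28, 28)  # batch size 100
print(model(x).shape)  # torch.Size([100, 10])
```

The two 2×2 pools take 28×28 down to 7×7, which fixes the 32 · 7 · 7 input size of the first fully connected layer.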

1.2 The learning curve

From the learning curve plotted below, we can see that training converges after the first 20 epochs, after which the model slowly begins to overfit.

In [12]:
lc = io.imread('learning_curve.png')
f=plt.figure(figsize=(15,15))
ax=f.add_subplot(1,1,1)
ax.imshow(lc)
plt.axis('off')
plt.show()

1.3 Correctly classified images

We show correctly classified images for each category:

In [16]:
lc = io.imread('imgs1.png')
f=plt.figure(figsize=(15,15))
ax=f.add_subplot(1,1,1)
ax.imshow(lc)
plt.axis('off')
plt.show()

1.4 Incorrectly classified images

We show incorrectly classified images for each category:

In [15]:
lc = io.imread('imgs2.png')
f=plt.figure(figsize=(15,15))
ax=f.add_subplot(1,1,1)
ax.imshow(lc)
plt.axis('off')
plt.show()

1.5 Accuracy

We show the per-class accuracy of our classifier on the validation and test datasets, from which we can see that the shirt class is the hardest to classify.

Class        Val acc.   Test acc.
TShirt       85%        83%
Trouser      98%        99%
Pullover     85%        86%
Dress        92%        91%
Coat         86%        85%
Sandal       97%        98%
Shirt        75%        72%
Sneaker      96%        96%
Bag          98%        98%
Ankle Boot   97%        97%
----------   --------   --------
TOTAL        91%        90%
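A per-class accuracy table like the one above can be computed with a short NumPy helper. The sketch below uses toy labels for illustration, not the real predictions:

```python
import numpy as np

def per_class_accuracy(y_true, y_pred, n_classes):
    # Fraction of correctly predicted samples within each true class.
    accs = []
    for c in range(n_classes):
        mask = (y_true == c)
        accs.append((y_pred[mask] == c).mean() if mask.any() else float('nan'))
    return np.array(accs)

y_true = np.array([0, 0, 1, 1, 2, 2])  # toy ground-truth labels
y_pred = np.array([0, 1, 1, 1, 2, 0])  # toy predictions
print(per_class_accuracy(y_true, y_pred, 3))  # [0.5 1.  0.5]
```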

1.6 Visualization of the learned filters

In [18]:
lc = io.imread('filter.png')
f=plt.figure(figsize=(15,15))
ax=f.add_subplot(1,1,1)
ax.imshow(lc)
plt.axis('off')
plt.show()

Part 2: Semantic Segmentation

We use the Mini Facade dataset to train a ConvNet for semantic segmentation, i.e., labeling each pixel in an image with its correct object class. The Mini Facade dataset consists of images of buildings from different cities around the world, in diverse architectural styles, along with semantic segmentation labels (in .png format) for 5 classes: balcony, window, pillar, facade and others.

2.1 Architecture of the network

The network consists of 7 conv layers with 64, 128, 256, 512, 512, 1024 and 5 channels respectively, each followed by a ReLU. All filters are 3×3 with zero padding. We add two 2×2 max pooling layers, after the 5th and 6th conv layers, and at the end we upsample the output to match the size of the original image. We use the Adam optimizer with a 1e-3 learning rate and 1e-5 weight decay. We manually reduce the learning rate to 1e-4 after the first 65 epochs, since training starts to diverge when the learning rate is too large. The batch size is 10.
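A PyTorch sketch of this layer stack is shown below. The 3-channel RGB input, the bilinear upsampling mode, and the assumption that one 4× upsample undoes the two 2× pools are all ours; the text only specifies the channel counts, filter size, and pool positions.

```python
import torch
import torch.nn as nn

class FacadeNet(nn.Module):
    # Seven 3x3 convs (64, 128, 256, 512, 512, 1024, 5 channels), each with a ReLU,
    # 2x2 maxpool after the 5th and 6th convs, then upsampling back to input size.
    def __init__(self, n_classes=5):
        super().__init__()
        chans = [3, 64, 128, 256, 512, 512, 1024, n_classes]
        layers = []
        for i in range(7):
            layers += [nn.Conv2d(chans[i], chans[i + 1], 3, padding=1), nn.ReLU()]
            if i in (4, 5):  # pool after the 5th and 6th conv layers
                layers.append(nn.MaxPool2d(2))
        self.net = nn.Sequential(*layers)
        # Two 2x2 pools shrink the map by 4x, so upsample by 4 to restore it.
        self.up = nn.Upsample(scale_factor=4, mode='bilinear', align_corners=False)

    def forward(self, x):
        return self.up(self.net(x))

model = FacadeNet()
x = torch.randn(1, 3, 64, 64)  # small dummy input; real images are larger
print(model(x).shape)  # torch.Size([1, 5, 64, 64])
```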

2.2 Learning curve

We start with a 1e-3 learning rate and manually reduce it to 1e-4 after the first 65 epochs, since the training starts diverging. That is why the validation loss suddenly drops at the 65th epoch.
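A manual learning-rate drop like this can be done by editing the optimizer's parameter groups directly (a sketch; the tiny linear model is a stand-in for the segmentation network, and the training step is elided):

```python
import torch

model = torch.nn.Linear(4, 2)  # stand-in for the segmentation network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)

for epoch in range(100):
    if epoch == 65:  # drop the learning rate once training starts diverging
        for group in optimizer.param_groups:
            group['lr'] = 1e-4
    # ... training step would go here ...

print(optimizer.param_groups[0]['lr'])  # 0.0001
```

The same effect can be had with `torch.optim.lr_scheduler.MultiStepLR`, but mutating `param_groups` keeps the one-off drop explicit.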

In [20]:
lc = io.imread('lc.png')
f=plt.figure(figsize=(15,15))
ax=f.add_subplot(1,1,1)
ax.imshow(lc)
plt.axis('off')
plt.show()

2.3 Training accuracy

The AP can reach 0.47 after training, but we can see that the Pillar and Balcony classes are hard to recognize.

Class     AP
Others    0.65
Facade    0.63
Pillar    0.09
Window    0.59
Balcony   0.42
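Per-class AP scores like these are computed by ranking pixels by the predicted score for a class and averaging the precision at each true positive. A NumPy sketch with toy data (not the real predictions):

```python
import numpy as np

def average_precision(scores, labels):
    # Rank pixels by predicted score; AP is the mean of the precision
    # values measured at each true-positive rank.
    order = np.argsort(-scores)
    labels = labels[order].astype(float)
    tp = np.cumsum(labels)
    precision = tp / np.arange(1, len(labels) + 1)
    return (precision * labels).sum() / labels.sum()

scores = np.array([0.9, 0.8, 0.7, 0.6])  # toy per-pixel scores for one class
labels = np.array([1, 0, 1, 0])          # toy ground-truth membership
print(average_precision(scores, labels))  # 0.8333...
```

This matches `sklearn.metrics.average_precision_score` on the same inputs.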

2.4 Result

We show an input image (left) and its segmentation (right). The network gets the windows and some pillars right, but it misses the balconies.

In [23]:
im1 = io.imread('input.png')
im2 = io.imread('output.png')
f=plt.figure(figsize=(15,15))
ax1=f.add_subplot(1,2,1)
ax1.imshow(im1)
plt.axis('off')
ax2=f.add_subplot(1,2,2)
ax2.imshow(im2)
plt.axis('off')
plt.show()