In [45]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import skimage.io as io
import skimage as sk
from glob import glob

Project 4: Classification and Segmentation

Daniel Zhu, CS194-26-abh

Part 1: Image Classification

This part of the project involved training a convolutional neural net to classify images from the FashionMNIST dataset. The dataset contains images of 10 different types of clothing: t-shirt/top, trouser, pullover, dress, coat, sandal, shirt, sneaker, bag, and ankle boot. I split the training dataset of 60,000 images with a 0.9/0.1 train-validation split; the test set contains 10,000 images.

Architecture

Conv2d(in: 1 channel, out: 32 channels): filter size=3x3 --> ReLU --> MaxPool2d: filter size=2x2
Conv2d(in: 32 channels, out: 32 channels): filter size=3x3 --> ReLU --> MaxPool2d: filter size=2x2
Linear(800 channels, 120 channels)--> ReLU
Linear(120 channels, 84 channels)--> ReLU
Linear(84 channels, 10 channels)
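
For reference, here is a minimal PyTorch sketch of this architecture; the class and layer names are my own, and the flattened size of 800 comes from the 32 x 5 x 5 feature map that a 28x28 FashionMNIST input produces after the two conv/pool stages.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FashionNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3)   # 28x28 -> 26x26
        self.conv2 = nn.Conv2d(32, 32, kernel_size=3)  # 13x13 -> 11x11
        self.pool = nn.MaxPool2d(2)                    # halves the spatial size
        self.fc1 = nn.Linear(32 * 5 * 5, 120)          # 800 -> 120
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)                   # 10 clothing classes

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))   # -> (N, 32, 13, 13)
        x = self.pool(F.relu(self.conv2(x)))   # -> (N, 32, 5, 5)
        x = x.view(x.size(0), -1)              # flatten to 800 features
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)                     # raw logits for CrossEntropyLoss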

Hyperparameters

batch_size: 50
optimizer: optim.Adam, learning rate = 0.002
loss function: nn.CrossEntropyLoss
training epochs: 5
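
A minimal sketch of the corresponding training step under these hyperparameters; FashionNet is the sketch above, and train_loader is an assumed DataLoader over the 54,000-image training split, not the actual project code.

import torch.nn as nn
import torch.optim as optim

net = FashionNet()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.002)

for epoch in range(5):
    for images, labels in train_loader:   # batches of 50 images
        optimizer.zero_grad()
        logits = net(images)              # shape (50, 10)
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()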

Results

Accuracy

The plot below shows the train and validation accuracies during training for 5 epochs (the accuracies were logged 10 times per epoch).

In [52]:
im1=plt.imread("train_val_accuracy.jpg")
fig = plt.figure(figsize=(15,15))
ax1 = fig.add_subplot(1,1,1)
ax1.imshow(im1)
plt.axis("off");
Accuracy of the network on the 10000 test images: 90.23%

Class Accuracies

For both the validation and test set, "Trouser" had the highest accuracy and "Shirt" had the lowest. Shirts and T-shirts/tops look very similar, which made shirts harder to classify correctly.

Class         Validation Accuracy   Test Accuracy
T-shirt/top   90.9%                 86.9%
Trouser       98.7%                 97.7%
Pullover      83.4%                 85.6%
Dress         93.8%                 90.9%
Coat          84.7%                 83.3%
Sandal        98.0%                 95.7%
Shirt         77.6%                 72.4%
Sneaker       97.4%                 96.9%
Bag           98.5%                 97.0%
Ankle boot    97.2%                 95.9%
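
Per-class numbers like these can be computed by counting correct predictions separately for each label; a sketch, assuming test_loader yields (images, labels) batches and classes holds the 10 class names in label order:

import torch

correct = [0] * 10
total = [0] * 10
net.eval()
with torch.no_grad():
    for images, labels in test_loader:
        preds = net(images).argmax(dim=1)   # predicted class per image
        for c in range(10):
            mask = labels == c
            correct[c] += (preds[mask] == c).sum().item()
            total[c] += mask.sum().item()

for c in range(10):
    print(f"{classes[c]}: {100 * correct[c] / total[c]:.1f}%")
print(f"Overall: {100 * sum(correct) / sum(total):.2f}%")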

Classified Correctly:

In [34]:
imgs = np.array(glob("*.jpg"))
correct = imgs[["_correct_" in x for x in imgs]]
incorrect = imgs[["_incorrect_" in x for x in imgs]]

fig=plt.figure(figsize=(10, 10))
columns = 4
rows = 5
for i in range(0, columns*rows):
    img = plt.imread(correct[i])
    fig.add_subplot(rows, columns, i+1)
    plt.title(correct[i][0:correct[i].find("_")])
    plt.axis('off')
    plt.imshow(img, cmap='gray')
plt.show()

Classified Incorrectly (Actual Class:Predicted Class)

In [48]:
# Show examples the network got wrong; the title reads "actual:predicted".
fig = plt.figure(figsize=(10, 10))
columns = 4
rows = 5
for i in range(columns * rows):
    img = plt.imread(incorrect[i])
    fig.add_subplot(rows, columns, i + 1)
    temp = incorrect[i].split("_")
    if temp[0] == "T-shirt":
        # "T-shirt/top" filenames contain an extra underscore-separated token,
        # so the predicted class sits one position further along
        plt.title(temp[0] + ":" + temp[2])
    else:
        plt.title(temp[0] + ":" + temp[1])
    plt.axis('off')
    plt.imshow(img, cmap='gray')
plt.show()

Visualized filters

These are the learned 3x3 filters of the first convolutional layer (before nonlinearities were applied).

In [51]:
im1=plt.imread("learned_filters.jpg")
fig = plt.figure(figsize=(15,15))
ax1 = fig.add_subplot(1,1,1)
ax1.imshow(im1)
plt.axis("off");

Part 2: Semantic Segmentation

This part of the project involved training a convolutional neural net to semantically segment an image, labeling each pixel of the image with its correct class. The dataset uses 5 labels, each with a corresponding color: others (black), facade (blue), pillar (green), window (orange), balcony (red). I split the training dataset of 906 images with a 0.9/0.1 train-validation split; the test set contains 114 images.

Architecture

The architecture was based on this paper on deconvolution networks for semantic segmentation. The underlying idea is to train the network like an "autoencoder": the original image is first compressed and then decompressed back to full resolution, with the decoder learning to predict the per-pixel labels, i.e. the way the image is segmented, rather than the image itself. A PyTorch sketch of the layer stack follows the listing below.

Compression:
Conv2d(in: 3 channels, out: 32 channels): filter size=3x3, padding=1 --> MaxPool2d: filter size=2x2 --> BatchNorm2d --> ReLU
Conv2d(in: 32 channels, out: 64 channels): filter size=3x3, padding=1 --> MaxPool2d: filter size=2x2 --> BatchNorm2d --> ReLU
Conv2d(in: 64 channels, out: 128 channels): filter size=3x3, padding=1 --> MaxPool2d: filter size=2x2 --> BatchNorm2d --> ReLU
Conv2d(in: 128 channels, out: 128 channels): filter size=3x3, padding=1 --> MaxPool2d: filter size=2x2 --> BatchNorm2d --> ReLU

Decompression/Deconvolution:
ConvTranspose2d(in: 128 channels, out: 128 channels): filter size=3x3, stride=2, padding=1 --> BatchNorm2d --> ReLU
ConvTranspose2d(in: 128 channels, out: 64 channels): filter size=3x3, stride=2, padding=0 --> BatchNorm2d --> ReLU
ConvTranspose2d(in: 64 channels, out: 32 channels): filter size=3x3, stride=2, padding=0 --> BatchNorm2d --> ReLU
ConvTranspose2d(in: 32 channels, out: 16 channels): filter size=3x3, stride=2, padding=0, output_padding=1 --> BatchNorm2d --> ReLU
Conv2d(in: 16 channels, out: 5 channels): filter size=3x3, padding=1
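
Read literally, this stack could be written as a single nn.Sequential; a sketch (how the layers were actually grouped in the project code is an assumption on my part):

import torch.nn as nn

seg_net = nn.Sequential(
    # compression
    nn.Conv2d(3, 32, 3, padding=1), nn.MaxPool2d(2), nn.BatchNorm2d(32), nn.ReLU(),
    nn.Conv2d(32, 64, 3, padding=1), nn.MaxPool2d(2), nn.BatchNorm2d(64), nn.ReLU(),
    nn.Conv2d(64, 128, 3, padding=1), nn.MaxPool2d(2), nn.BatchNorm2d(128), nn.ReLU(),
    nn.Conv2d(128, 128, 3, padding=1), nn.MaxPool2d(2), nn.BatchNorm2d(128), nn.ReLU(),
    # decompression / deconvolution
    nn.ConvTranspose2d(128, 128, 3, stride=2, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
    nn.ConvTranspose2d(128, 64, 3, stride=2), nn.BatchNorm2d(64), nn.ReLU(),
    nn.ConvTranspose2d(64, 32, 3, stride=2), nn.BatchNorm2d(32), nn.ReLU(),
    nn.ConvTranspose2d(32, 16, 3, stride=2, output_padding=1), nn.BatchNorm2d(16), nn.ReLU(),
    nn.Conv2d(16, 5, 3, padding=1),   # per-pixel logits for the 5 classes
)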

Hyperparameters

I didn't change the default hyperparameters because they worked well, except for batch_size, which I changed for faster training. A sketch of the corresponding training step follows the list below.

batch_size: 10
optimizer: optim.Adam, learning rate = 0.001, weight decay = 0.00001
loss function: nn.CrossEntropyLoss
training epochs: 10
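
A minimal sketch of the segmentation training step with these settings; seg_net is the module sketched above, and train_loader is assumed to yield images of shape (10, 3, H, W) with labels of shape (10, H, W) holding a class index (0-4) for every pixel:

import torch.nn as nn
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(seg_net.parameters(), lr=0.001, weight_decay=1e-5)

for epoch in range(10):
    for images, labels in train_loader:
        optimizer.zero_grad()
        logits = seg_net(images)          # (10, 5, H, W) per-pixel class scores
        loss = criterion(logits, labels)  # cross-entropy averaged over all pixels
        loss.backward()
        optimizer.step()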

Results

Loss

The plot below shows the train and validation losses during training for 20 epochs. Overfitting begins around epoch 10, so for the final model I only trained for 10 epochs.

In [53]:
im1=plt.imread("part2/train_val_loss.jpg")
fig = plt.figure(figsize=(15,15))
ax1 = fig.add_subplot(1,1,1)
ax1.imshow(im1)
plt.axis("off");

Average Precision

Class     AP Score
others    0.7194
facade    0.7930
pillar    0.1864
window    0.8590
balcony   0.6684
Average   0.6453
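
One way to get scores like these is to treat each class as a binary "does this pixel belong to the class" problem and use scikit-learn's average_precision_score; a sketch, where logits (N, 5, H, W) and labels (N, H, W) are assumed to come from running the trained net over the test set:

import numpy as np
import torch
from sklearn.metrics import average_precision_score

probs = torch.softmax(logits, dim=1).cpu().numpy()   # per-pixel class probabilities
labels_np = labels.cpu().numpy()

aps = []
for c in range(5):
    y_true = (labels_np == c).ravel()   # 1 where the pixel's true class is c
    y_score = probs[:, c].ravel()       # predicted probability of class c
    aps.append(average_precision_score(y_true, y_score))
print(aps, np.mean(aps))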

Results on own images

I picked a few images of buildings in Berkeley to run the model on. For Evans and Dwinelle, the model does a very good job labeling the windows and facade, and it even gets the balcony on Dwinelle right; overall I'd say it did pretty well on these two buildings. For Doe, it doesn't do as well: the windows are labeled facade, and most of the pillars are labeled window rather than pillar. The windows are very dark and the individual panes are hard to see, so the net may have had a harder time discerning those regions, and the shadows on the pillars may have caused the model to misinterpret them.

In [57]:
# Display each Berkeley building photo alongside its predicted label image.
fig = plt.figure(figsize=(10, 10))
columns = 2
rows = 3
images = ["doe.jpg", "doe_label.png", "evans.jpg", "evans_label.png", "dwinelle.jpg", "dwinelle_label.png"]
for i in range(columns * rows):
    img = plt.imread("part2/" + images[i])
    fig.add_subplot(rows, columns, i + 1)
    plt.title(images[i].split(".")[0])
    plt.axis('off')
    plt.imshow(img, cmap='gray')
plt.show()
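
For reference, a sketch of how the colored label images above could be produced from the net's output; the color map follows the class colors listed in the Part 2 description (the exact RGB values are assumptions), and image is assumed to be a (1, 3, H, W) tensor for one building photo:

import torch
import numpy as np

colors = np.array([[0, 0, 0],       # others: black
                   [0, 0, 255],     # facade: blue
                   [0, 255, 0],     # pillar: green
                   [255, 165, 0],   # window: orange
                   [255, 0, 0]])    # balcony: red

with torch.no_grad():
    pred = seg_net(image).argmax(dim=1)[0].cpu().numpy()   # (H, W) class ids
label_img = colors[pred].astype(np.uint8)                  # (H, W, 3) RGB label image
plt.imshow(label_img)
plt.axis('off');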