Image Classification and Segmentation

Sean Chen

Part 1: Image Classification

Val accuracy of the network: 95%
Test accuracy of the network: 90%

Accuracy of T-shirt/top: 84%
Accuracy of Trouser: 97%
Accuracy of Pullover: 89%
Accuracy of Dress: 92%
Accuracy of Coat: 88%
Accuracy of Sandal: 97%
Accuracy of Shirt: 67%
Accuracy of Sneaker: 97%
Accuracy of Bag: 97%
Accuracy of Ankle boot: 95%
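
As a rough illustration of how overall and per-class accuracy figures like the ones above can be computed, here is a minimal sketch in plain Python. The function name `per_class_accuracy` and the toy labels and predictions are assumptions made up for this example; they are not the report's code or data.

```python
from collections import Counter

# Class names as listed in the report, in label order 0..9 (order assumed).
CLASSES = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
           "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]


def per_class_accuracy(labels, predictions, num_classes):
    """Return (overall accuracy, list of per-class accuracies).

    Classes with no samples get NaN instead of a misleading 0.
    """
    correct = Counter()
    total = Counter()
    for y, p in zip(labels, predictions):
        total[y] += 1
        if y == p:
            correct[y] += 1
    per_class = [
        correct[c] / total[c] if total[c] else float("nan")
        for c in range(num_classes)
    ]
    overall = sum(correct.values()) / len(labels)
    return overall, per_class


# Toy ground truth and predictions, invented for illustration only.
labels = [0, 0, 6, 6, 6, 9]
predictions = [0, 6, 6, 0, 6, 9]

overall, per_class = per_class_accuracy(labels, predictions, len(CLASSES))
print(f"Accuracy of the network: {100 * overall:.0f} %")
for name, acc in zip(CLASSES, per_class):
    if acc == acc:  # NaN is not equal to itself, so empty classes are skipped
        print(f"Accuracy of {name}: {100 * acc:.0f} %")
```

Splitting accuracy by class is what exposes a weak class such as Shirt (67%), which a single overall number would hide.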

Figure: visualization of the learned convolution filters in the first convolutional layer.

Part 2: Semantic Segmentation

Network architecture

I used a deep convolutional neural network with 6 layers. The first 3 convolutional layers grow the number of channels from 48 to 128 to 192, and after each convolution the image shrinks by a factor of 2. The network also includes nonlinearities and batch normalization. The last 3 layers upsample the feature maps: each ConvTranspose2d upsamples the image by a factor of 2 while keeping the number of channels constant, and the channels are then reduced until they reach 5, one per output class.
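
The halving and doubling described above can be checked with a small shape trace. This is only a sketch: the report does not state kernel sizes, strides, padding, or the input resolution, so the values below (3x3 stride-2 convolutions with padding 1, 4x4 stride-2 transposed convolutions with padding 1, a 256-pixel input) are assumptions chosen so that each layer changes the size by exactly a factor of 2. The formulas are the standard output-size formulas for convolutions and transposed convolutions with dilation 1.

```python
def conv2d_out(size, kernel, stride, padding):
    """Output size of a convolution along one spatial dimension (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose2d_out(size, kernel, stride, padding, output_padding=0):
    """Output size of a transposed convolution along one dimension (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def trace_sizes(size):
    """Spatial size after each of the 6 layers: 3 downsampling, 3 upsampling."""
    sizes = [size]
    for _ in range(3):
        # Assumed encoder layer: 3x3 kernel, stride 2, padding 1.
        size = conv2d_out(size, kernel=3, stride=2, padding=1)
        sizes.append(size)
    for _ in range(3):
        # Assumed decoder layer: 4x4 kernel, stride 2, padding 1.
        size = conv_transpose2d_out(size, kernel=4, stride=2, padding=1)
        sizes.append(size)
    return sizes


INPUT_SIZE = 256  # assumed input height/width, not stated in the report
sizes = trace_sizes(INPUT_SIZE)
print(sizes)  # [256, 128, 64, 32, 64, 128, 256]
```

With these settings the output returns to the input resolution, which is what a per-pixel prediction over the 5 classes needs.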

Loss Cross entropy
Optimizer Adam
Learning rate 1e-3
Weight decay 1e-5
Epochs 17
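
For reference, the cross-entropy loss named above can be written out for a single pixel with 5 class scores. The sketch below is plain Python for illustration, not the training code; the function names and the example logits are made up, and averaging over pixels is an assumption about the reduction used.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one pixel: -log(softmax(logits)[target]).

    Uses the log-sum-exp trick for numerical stability.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def mean_cross_entropy(pixel_logits, pixel_targets):
    """Average the per-pixel loss over all pixels (assumed reduction)."""
    losses = [cross_entropy(l, t) for l, t in zip(pixel_logits, pixel_targets)]
    return sum(losses) / len(losses)


# Two made-up pixels with 5 class scores each.
uniform = [0.0, 0.0, 0.0, 0.0, 0.0]     # no preference: loss equals log(5)
confident = [10.0, 0.0, 0.0, 0.0, 0.0]  # strongly predicts class 0

print(cross_entropy(uniform, 2))    # about 1.609, i.e. log(5)
print(cross_entropy(confident, 0))  # close to 0: confident and correct
print(mean_cross_entropy([uniform, confident], [2, 0]))
```

An untrained network that spreads its scores evenly over the 5 classes would start near log(5), so a training loss well below that value indicates the network is learning.
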
Class Average precision on test set
others 0.6794454505246241
facade 0.7672731673796042
pillar 0.15840029845440098
window 0.847716567883514
balcony 0.49730511712925923
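
To make the metric concrete, here is a minimal sketch of average precision for one class. It assumes each pixel is treated as a sample, with label 1 if its ground truth is that class and a score equal to the network's predicted confidence for that class. The function name and the toy data are invented for this example, and the evaluation script behind the numbers above may differ in details such as tie handling.

```python
def average_precision(labels, scores):
    """Average precision for one class.

    Samples are ranked by descending score; AP is the mean of the
    precision values at each rank where a positive sample appears.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Toy example: 4 pixels, 2 of which truly belong to the class.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
ap = average_precision(labels, scores)
print(ap)  # (1/1 + 2/3) / 2
```

A class whose true pixels are often outscored by other pixels accumulates low precision at every hit, which fits the low value reported for pillar.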

Results

Figure: two examples, each shown as an input image followed by the network's output.