CS194-26: Image Manipulation and Computational Photography

Project 4 – Classification and Segmentation

Dimitrios Vlachogiannis (St. ID: 3034311700)

Part 1: Image Classification

Data

For the first part of the project, the FashionMNIST dataset was used. Several sample images are visualized below along with their labels:

Implementation and Training Details

A convolutional neural network (CNN) was implemented in PyTorch. It is built from convolutional layers, max-pooling layers, and Rectified Linear Units (ReLUs) as the non-linearity. The final network has 2 convolutional layers with 32 channels each, and each convolutional layer is followed by a ReLU and then a max-pooling layer. These are followed by 2 fully connected layers, with a ReLU applied only after the first one.
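The architecture described above can be sketched in PyTorch as follows. The text does not give the kernel size or the width of the first fully connected layer, so `kernel_size=3` (with same padding) and `hidden=128` are assumptions, as is the class name `FashionCNN`; only the layer ordering and the 32-channel width come from the description.

```python
import torch
import torch.nn as nn


class FashionCNN(nn.Module):
    """Two conv layers (32 channels each), each followed by ReLU and max pooling,
    then two fully connected layers with a ReLU only after the first one."""

    def __init__(self, num_classes: int = 10, hidden: int = 128, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2  # assumed "same" padding; keeps 28x28 for odd kernel sizes
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size, padding=pad),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 28x28 -> 14x14
            nn.Conv2d(32, 32, kernel_size, padding=pad),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 14x14 -> 7x7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 7 * 7, hidden),
            nn.ReLU(),  # non-linearity only after the first FC layer
            nn.Linear(hidden, num_classes),  # raw logits; softmax is left to the loss
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


model = FashionCNN()
print(model(torch.randn(4, 1, 28, 28)).shape)  # one row of 10 logits per image
```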

Having tested different architectures and tuned hyperparameters such as the batch size (final value: 64) and the learning rate (final value: 0.001), we plot the training and validation accuracies.
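A minimal sketch of how the training and validation accuracies could be collected per epoch, using the batch size (64) and learning rate (0.001) quoted above. The optimizer (Adam), the loss (cross-entropy), the helper name `run_epoch`, the number of epochs, the placeholder linear model and the random stand-in data are all assumptions for illustration; in practice the CNN and the FashionMNIST splits would be plugged in.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def run_epoch(model, loader, loss_fn, optimizer=None):
    """One pass over `loader`: trains if an optimizer is given, otherwise only
    evaluates. Returns the classification accuracy over the pass."""
    training = optimizer is not None
    model.train(training)
    correct, total = 0, 0
    with torch.set_grad_enabled(training):
        for images, labels in loader:
            logits = model(images)
            loss = loss_fn(logits, labels)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            correct += (logits.argmax(dim=1) == labels).sum().item()
            total += labels.numel()
    return correct / total


# Synthetic stand-in data shaped like FashionMNIST (1x28x28 images, 10 classes).
torch.manual_seed(0)
data = TensorDataset(torch.randn(256, 1, 28, 28), torch.randint(0, 10, (256,)))
train_loader = DataLoader(data, batch_size=64, shuffle=True)
val_loader = DataLoader(data, batch_size=64)

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # Adam is an assumption
loss_fn = nn.CrossEntropyLoss()

train_acc, val_acc = [], []
for epoch in range(3):
    train_acc.append(run_epoch(model, train_loader, loss_fn, optimizer))
    val_acc.append(run_epoch(model, val_loader, loss_fn))
print(train_acc, val_acc)
```

The two lists are what would be plotted against the epoch index.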

The per-class accuracies are displayed in the following table:

The hardest classes to predict proved to be the shirt-related ones, namely T-shirts, pullovers, and shirts.
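Per-class accuracy, as reported in the table, can be computed from the predicted and true labels. The sketch below assumes integer labels (FashionMNIST uses class ids 0 to 9) and a hypothetical helper name, `per_class_accuracy`:

```python
import numpy as np


def per_class_accuracy(preds, labels, num_classes):
    """Fraction of samples of each true class that were predicted correctly."""
    preds, labels = np.asarray(preds), np.asarray(labels)
    acc = np.full(num_classes, np.nan)  # NaN marks classes with no samples
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            acc[c] = np.mean(preds[mask] == c)
    return acc


labels = np.array([0, 0, 1, 1, 2, 2])
preds = np.array([0, 2, 1, 1, 0, 2])
print(per_class_accuracy(preds, labels, 3))  # 0.5, 1.0 and 0.5 for classes 0, 1, 2
```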

Below, 2 correctly classified images from each class are shown:

Additionally, 2 incorrectly classified images from each class are shown:
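One way to pick the examples shown in these figures is to filter the validation set by true class and by whether the prediction matched. This is a sketch with a hypothetical helper, `pick_examples`; taking the first `k` matches per class (rather than a random selection) is an assumption:

```python
import numpy as np


def pick_examples(preds, labels, num_classes, k=2, correct=True):
    """For each true class, return up to k indices of samples that were classified
    correctly (correct=True) or incorrectly (correct=False)."""
    preds, labels = np.asarray(preds), np.asarray(labels)
    hit = preds == labels
    wanted = hit if correct else ~hit
    return {c: np.flatnonzero((labels == c) & wanted)[:k] for c in range(num_classes)}


preds = np.array([0, 1, 0, 1, 0])
labels = np.array([0, 0, 0, 1, 1])
print(pick_examples(preds, labels, 2))                 # indices of correct predictions
print(pick_examples(preds, labels, 2, correct=False))  # indices of mistakes
```

The returned indices can then be used to look up the corresponding images for plotting.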

Finally, the learned filters are visualized: 32 filters for each of the 2 convolutional layers (labeled Channel 1 and Channel 2 below):
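Since a convolution's weights form small 2-D kernels, they can be displayed by normalizing each one and tiling them into a grid. The helper `tile_filters` below is a hypothetical sketch: it expects an array of shape `(num_filters, kh, kw)`, so first-layer weights of shape `(32, 1, kh, kw)` would be passed with the channel axis dropped. For the second layer (shape `(32, 32, kh, kw)`), one input channel or the mean over input channels would have to be chosen; which one the figures use is not stated, so that step is an assumption.

```python
import numpy as np


def tile_filters(weights, cols=8, pad=1):
    """Arrange filters of shape (num_filters, kh, kw) into one 2-D grid image.
    Each filter is min-max normalized to [0, 1] independently so its pattern is visible;
    a `pad`-pixel white border separates neighbouring filters."""
    n, kh, kw = weights.shape
    rows = int(np.ceil(n / cols))
    grid = np.ones((rows * (kh + pad) - pad, cols * (kw + pad) - pad))
    for i, w in enumerate(weights):
        w = (w - w.min()) / (w.max() - w.min() + 1e-8)
        r, c = divmod(i, cols)
        top, left = r * (kh + pad), c * (kw + pad)
        grid[top:top + kh, left:left + kw] = w
    return grid


# Random stand-in for 32 first-layer 3x3 kernels (the real weights would come
# from the trained model, e.g. conv.weight.detach().numpy()[:, 0]).
grid = tile_filters(np.random.default_rng(0).normal(size=(32, 3, 3)))
print(grid.shape)  # 4 rows x 8 columns of 3x3 tiles with 1-pixel gaps
```

The resulting grid can be displayed with `matplotlib.pyplot.imshow(grid, cmap="gray")`.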

Channel 1

Channel 2

Part 2: Semantic Segmentation

Model Architecture

The architecture of the model is the following:

#########################################
import torch.nn as nn

model = nn.Sequential(
    # Encoder: convolutions with batch normalization, downsampled twice by max pooling
    nn.Conv2d(3, 64, 4, padding=2),
    nn.BatchNorm2d(64),
    nn.Conv2d(64, 128, 4, padding=2),
    nn.BatchNorm2d(128),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(2),
    nn.Conv2d(128, 256, 4, padding=2),
    nn.BatchNorm2d(256),
    nn.Conv2d(256, 512, 3, padding=2),
    nn.BatchNorm2d(512),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(2),
    # Decoder: transposed convolutions upsample the features back to the input size
    nn.ConvTranspose2d(512, 256, 3, stride=2, padding=1),
    nn.ConvTranspose2d(256, 128, 3, padding=2),
    nn.ReLU(inplace=True),
    # 5 output channels: one score map per segmentation class
    nn.ConvTranspose2d(128, 5, 4, stride=2, padding=2),
)
#########################################
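Because the listing mixes kernel sizes 3 and 4 with padding 2, it is worth checking that the decoder restores the input resolution. The sketch below traces one spatial dimension through the layers using the output-size formulas from the PyTorch documentation for `Conv2d`/`MaxPool2d` and `ConvTranspose2d` (dilation 1, `output_padding=0`). It is plain Python and does not need PyTorch; the input sizes tried are illustrative, not taken from the project.

```python
def conv_out(n, k, s=1, p=0):
    """Output length of Conv2d / MaxPool2d along one spatial axis (dilation 1)."""
    return (n + 2 * p - k) // s + 1


def convT_out(n, k, s=1, p=0):
    """Output length of ConvTranspose2d along one axis (dilation 1, output_padding 0)."""
    return (n - 1) * s - 2 * p + k


# The layers of the listing that change spatial size, in order.
LAYERS = [
    ("Conv2d(3, 64, 4, padding=2)", lambda n: conv_out(n, 4, p=2)),
    ("Conv2d(64, 128, 4, padding=2)", lambda n: conv_out(n, 4, p=2)),
    ("MaxPool2d(2)", lambda n: conv_out(n, 2, s=2)),
    ("Conv2d(128, 256, 4, padding=2)", lambda n: conv_out(n, 4, p=2)),
    ("Conv2d(256, 512, 3, padding=2)", lambda n: conv_out(n, 3, p=2)),
    ("MaxPool2d(2)", lambda n: conv_out(n, 2, s=2)),
    ("ConvTranspose2d(512, 256, 3, stride=2, padding=1)", lambda n: convT_out(n, 3, s=2, p=1)),
    ("ConvTranspose2d(256, 128, 3, padding=2)", lambda n: convT_out(n, 3, p=2)),
    ("ConvTranspose2d(128, 5, 4, stride=2, padding=2)", lambda n: convT_out(n, 4, s=2, p=2)),
]


def trace(n, verbose=False):
    """Propagate a side length n through LAYERS and return the final size."""
    for name, out_size in LAYERS:
        n = out_size(n)
        if verbose:
            print(f"{name:<52} -> {n}")
    return n


trace(256, verbose=True)
```

By this arithmetic, any side length that is a multiple of 4 is mapped back to itself (256 passes through 257, 258, 129, 130, 132, 66, 131 and 129 before ending at 256), while other sizes are not: 250, for example, comes out as 248. Inputs would therefore need a side length that is a multiple of 4 for the predicted mask to align pixel for pixel with the image.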

After testing different values, a learning rate of 10^-3 and a weight decay of 10^-5 were used, with a batch size of 8.
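A single optimization step with these settings might look as follows. The learning rate, weight decay, batch size and the 5 output classes come from the text. The choice of Adam, the per-pixel cross-entropy loss, the 1x1-convolution stand-in model and the synthetic 64x64 batch are assumptions made to keep the sketch small and self-contained.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Stand-in for the segmentation network listed above: any module mapping
# (N, 3, H, W) images to (N, 5, H, W) per-class scores can be used here.
model = nn.Conv2d(3, 5, kernel_size=1)

# Learning rate and weight decay from the text; Adam itself is an assumption.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
loss_fn = nn.CrossEntropyLoss()  # per-pixel cross-entropy over the 5 classes


def train_step(images, masks):
    """images: float (N, 3, H, W); masks: int64 (N, H, W) with class ids in [0, 5)."""
    model.train()
    optimizer.zero_grad()
    logits = model(images)         # (N, 5, H, W)
    loss = loss_fn(logits, masks)  # CrossEntropyLoss accepts (N, C, H, W) vs (N, H, W)
    loss.backward()
    optimizer.step()
    return loss.item()


# One step on a synthetic batch of 8 images.
images = torch.randn(8, 3, 64, 64)
masks = torch.randint(0, 5, (8, 64, 64))
print(train_step(images, masks))
```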

Training and Validation Loss

Average Precision

The Average Precision (AP) achieved over all classes was 56.29%.
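The evaluation code is not shown, so the following is only a sketch of one common way to compute AP for segmentation: treat every pixel as a sample, rank pixels by the predicted score for class c, and average the precision at each rank where a true class-c pixel appears; a figure "over all classes" would then be the mean of these per-class values. With no tied scores this matches the definition used by scikit-learn's `average_precision_score`. The helper names are hypothetical.

```python
import numpy as np


def average_precision(scores, targets):
    """Non-interpolated AP for one binary problem: mean of precision@k over the
    ranks k at which a positive appears (ties are broken by sort order)."""
    order = np.argsort(-scores, kind="stable")
    hits = np.asarray(targets)[order].astype(bool)
    if not hits.any():
        return np.nan  # AP is undefined without positives
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return precision_at_k[hits].mean()


def mean_average_precision(probs, labels):
    """probs: (N, C, H, W) class scores; labels: (N, H, W) integer class ids.
    Computes AP per class over all pixels, then averages over classes with positives."""
    aps = []
    for c in range(probs.shape[1]):
        ap = average_precision(probs[:, c].ravel(), (labels == c).ravel())
        if not np.isnan(ap):
            aps.append(ap)
    return float(np.mean(aps))


# Positives at ranks 1 and 3: AP = (1/1 + 2/3) / 2
print(average_precision(np.array([0.9, 0.8, 0.7, 0.6]), np.array([1, 0, 1, 0])))  # ~0.8333
```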

Example

The model seems to perform decently in most categories (façades and balconies) and, in some cases, really well (windows); the exception is the pillar category. The "other" category could also be improved, for example on areas like the sky that do not correspond to any architectural element.