CS 194-26 Project 4 [acc id: aez]
Overview
Part 1: Image Classification
- Dataloader and CNN: Used torch.utils.data.DataLoader and torch.nn.Module respectively.
- Loss function and Optimizer: nn.CrossEntropyLoss and optim.Adam respectively. See parameters below.
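As a rough illustration of what the loss computes, here is a minimal pure-Python sketch of cross-entropy on raw class scores (logits), averaged over a batch as in the default `mean` reduction of nn.CrossEntropyLoss. The helper names `cross_entropy` and `batch_cross_entropy` are made up for this example and are not part of the project code.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max before exponentiating, for stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def batch_cross_entropy(batch_logits, targets):
    """Mean of the per-sample losses over a batch."""
    losses = [cross_entropy(l, t) for l, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)

# A 10-class model that scores every class equally has loss ln(10),
# which is roughly where an untrained classifier starts.
uniform_loss = cross_entropy([0.0] * 10, target=6)
```

Because the softmax is folded into the loss, the network's last layer can output raw scores without an explicit softmax.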
CNN model specifics
- Dataset sizes
- Training = 40000
- Validation = 5000
- Test = 5000
- Hyperparameters
- batch_size = 32
- n_epochs = 10
- learning_rate = 0.001
- Layers
- Specified in spec: 2 conv layers with 32 channels each, followed by ReLU and max pool.
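The layer description above fixes the channel count but not the spatial sizes, which determine the input width of the first fully connected layer. The sketch below traces those sizes through two conv + max pool blocks. The 28x28 input, 3x3 kernels, zero padding, and 2x2 pooling are all assumptions made for illustration; the write-up does not state them.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2):
    """Output width of max pooling with stride == kernel and no padding."""
    return size // kernel

def flattened_features(size=28, channels=32, kernel=3, n_blocks=2):
    """Features entering the first fully connected layer.

    All default values are assumptions, except channels=32 and n_blocks=2,
    which come from the layer description.
    """
    for _ in range(n_blocks):
        size = pool_out(conv_out(size, kernel))
    return channels * size * size

# Under these assumptions: 28 -> 26 -> 13 -> 11 -> 5, so 32 * 5 * 5 features.
features = flattened_features()
```

With different kernel sizes or padding the flattened size changes, so the first linear layer has to be sized to match.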
Results
- Train and validation accuracy
- Accuracy
- Overall
Accuracy of the network on the 60000 train images: 93.19 %
Accuracy of the network on the 5000 validation images: 91.34 %
Accuracy of the network on the 10000 test images: 90.52 %
- Per-class
Class Accuracy (%)
0 85.40
1 98.10
2 88.90
3 94.20
4 82.10
5 96.90
6 67.00
7 98.90
8 98.40
9 95.30
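As a quick consistency check on the table above: if every class has the same number of test images (an assumption, though a natural one for a balanced dataset), the overall test accuracy is the plain mean of the per-class accuracies. A small sketch:

```python
# Per-class test accuracies (%) copied from the table above.
per_class = {
    0: 85.40, 1: 98.10, 2: 88.90, 3: 94.20, 4: 82.10,
    5: 96.90, 6: 67.00, 7: 98.90, 8: 98.40, 9: 95.30,
}

# Unweighted mean; equals the overall accuracy only for balanced classes.
overall = sum(per_class.values()) / len(per_class)

worst = min(per_class, key=per_class.get)
best = max(per_class, key=per_class.get)

print(f"mean of per-class accuracies: {overall:.2f} %")
print(f"worst class: {worst}, best class: {best}")
```

The mean agrees with the reported overall test accuracy of 90.52 %.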
Classified images
Correctly classified | Wrongly classified
[Image grid not preserved]
- Class 6 had the lowest accuracy (67.00%) and class 7 the highest (98.90%).
- Class 6 corresponds to shirt, which may be difficult to classify because several other categories also look like a shirt (t-shirt, pullover, dress, top).
- Class 7 may have done best because it looks much more distinct than other footwear, or because all classes of footwear in this dataset have distinct features.
Visualization of filter
- The convolution filters were extracted and plotted, as seen below:
Part 2: Semantic Segmentation
- Dataloader and CNN: Used torch.utils.data.DataLoader and torch.nn.Module respectively.
- Loss function and Optimizer: nn.CrossEntropyLoss and optim.Adam respectively. See parameters below.
CNN model specifics
- Dataset sizes
- Training = 800
- Validation = 106
- Test = 114
- Hyperparameters
- batch_size = 8
- n_epochs = 30
- learning_rate = 0.001
- weight_decay = 0.00001
- Layers
- 6 convolution layers, with a maximum channel size of 128 and kernel sizes ranging from 3x3 to 7x7.
- ReLU is applied after every convolution except the last.
- Max pooling and upsampling with scale factor (2,2) were applied three times, after the first three convolutions.
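For segmentation the output needs one label per input pixel, so the three 2x downsamplings have to be undone exactly by the three 2x upsamplings. The sketch below checks this along one spatial dimension. It assumes pooling rounds down (the default behaviour of max pooling in PyTorch), and the input sizes used are hypothetical, since the write-up does not give the image dimensions.

```python
def round_trip(size, n_levels=3, factor=2):
    """Spatial size after n_levels of pooling followed by n_levels of upsampling."""
    for _ in range(n_levels):
        size //= factor   # max pool with factor 2 rounds down
    for _ in range(n_levels):
        size *= factor    # upsampling by factor 2 doubles the size
    return size

# A side length divisible by 2**3 = 8 survives the round trip...
print(round_trip(256))  # 256 -> 128 -> 64 -> 32 -> 64 -> 128 -> 256
# ...while one that is not comes back smaller than it went in.
print(round_trip(100))  # 100 -> 50 -> 25 -> 12 -> 24 -> 48 -> 96
```

When the sizes do not match, the images are typically resized or padded to a multiple of 8 before being fed to the network.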
Results
Own image
Input | Output (output model) | Output (with 20 epochs)
[Images not preserved]
- Everything actually looks pretty much in place! The model did well at recognizing the general facade and windows, and could also pick up pillars at the appropriate places.
- One noticeable shortfall is the identification of balconies. Even though all instances of balconies are identified, some balconies were also identified in the middle of windows.
- This could be due to the dark colour in the middle of the window, which led the model to think that it is a balcony (which usually casts a dark shadow around it).
- It is also possible that, because the training images have balconies directly underneath the windows, the model learned to place balconies very close to repeated windows.
- To test my hypothesis of overtraining on balconies near windows, I tried training my model for only 20 epochs. The model ended up identifying more balconies in the correct locations and fewer on the windows. However, it resolved other things less well than the original model did (windows and facade).