CS194-26 Proj4
Kehan Wang
Part 1
- I implemented a NoseNet: 3 convolutional layers followed by 3 linear layers (a sketch is given at the end of this part).
- Built a gray-scale dataset: (sample image)
- Training and validation loss over 25 epochs:
- Two images on which the nose is detected correctly:
- Two images on which the nose is not detected correctly:
I think these noses are not detected correctly because the majority of the training set has an upright head pose: the portraits face the camera directly most of the time. Since the nose sits near the center of the image in most of those examples, our simple network minimizes the overall loss by biasing its predictions toward the center. When the model then sees a face turned sideways, it fails to predict the nose location correctly.
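For concreteness, here is a minimal PyTorch sketch of a 3-conv / 3-linear nose detector together with the kind of loop that produces the training and validation loss curves above. The channel counts, kernel sizes, the 60x80 grayscale input size, and the use of MSE loss with Adam (the settings stated for Part 2) are illustrative assumptions, not the exact configuration used.

import torch
import torch.nn as nn

class NoseNet(nn.Module):
    """Sketch: 3 conv layers + 3 linear layers regressing a single (x, y) nose tip."""

    def __init__(self):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(1, 12, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(12, 24, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(24, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # A 60x80 grayscale input shrinks to 7x10 after three 2x poolings.
        self.fc = nn.Sequential(
            nn.Linear(32 * 7 * 10, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 2),  # normalized (x, y) of the nose tip
        )

    def forward(self, x):
        return self.fc(self.convs(x).flatten(1))

def train(model, train_loader, val_loader, epochs=25, lr=1e-3):
    """Record the per-epoch training and validation loss (MSE + Adam assumed)."""
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    history = []
    for epoch in range(epochs):
        model.train()
        train_loss = 0.0
        for images, targets in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), targets)
            loss.backward()
            optimizer.step()
            train_loss += loss.item() * images.size(0)
        model.eval()
        val_loss = 0.0
        with torch.no_grad():
            for images, targets in val_loader:
                val_loss += criterion(model(images), targets).item() * images.size(0)
        history.append((train_loss / len(train_loader.dataset),
                        val_loss / len(val_loader.dataset)))
    return history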
Part 2
- I implemented a FaceNet: 5 convolutional layers followed by 3 linear layers.
- Sample image from the dataset, with random rotation and random crop applied (see the augmentation sketch after the discussion below).
- Training: 25 epochs, MSE loss, 1e-3 learning rate, Adam optimizer.
- Two images on which the keypoints are detected correctly:
- Two images on which the keypoints are not detected correctly:
The reasoning is the same as in Part 1: laughing faces and sideways faces are not the typical case. Because there are roughly equal numbers of faces turned to the left and to the right, minimizing the MSE loss pushes the predictions toward the midpoint between the two, which ends up giving us keypoints near the middle of the image.
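Since the keypoints are coordinates in the image, the random rotation and random crop mentioned above have to be applied to the labels as well as the pixels. Below is a minimal sketch of such an augmentation step using torchvision; the augment helper, the 15-degree angle range, and the crop margin are illustrative choices of mine, not the exact values used for FaceNet.

import numpy as np
import torch
import torchvision.transforms.functional as TF

def augment(image, keypoints, max_angle=15.0, crop_margin=8):
    """Randomly rotate and crop an image, moving the keypoints with it.

    image:     (C, H, W) float tensor
    keypoints: (N, 2) float tensor of pixel (x, y) coordinates
    The angle range and crop margin here are illustrative only.
    """
    _, h, w = image.shape

    # Random rotation about the image center (positive angle = counter-clockwise on screen).
    angle = float(np.random.uniform(-max_angle, max_angle))
    image = TF.rotate(image, angle)
    theta = np.deg2rad(angle)
    # The same counter-clockwise rotation expressed in image (y-down) coordinates.
    rot = torch.tensor([[np.cos(theta), np.sin(theta)],
                        [-np.sin(theta), np.cos(theta)]], dtype=keypoints.dtype)
    center = torch.tensor([w / 2.0, h / 2.0], dtype=keypoints.dtype)
    keypoints = (keypoints - center) @ rot.T + center

    # Random crop: shift the window, then shift the keypoints by the same offset.
    left = int(np.random.randint(0, crop_margin + 1))
    top = int(np.random.randint(0, crop_margin + 1))
    image = image[:, top:h - crop_margin + top, left:w - crop_margin + left]
    keypoints = keypoints - torch.tensor([left, top], dtype=keypoints.dtype)

    return image, keypoints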
- Learned filters in the convolutional layers
- Listed in the Appendix at the end of the report; a sketch of how such visualizations can be produced follows.
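The weights of each convolutional layer are just small tensors that can be plotted directly to obtain the filter images in the Appendix. The show_filters helper below is an illustration of mine, not the exact plotting code used; the module layout in the usage comment follows the model sketches above.

import matplotlib.pyplot as plt
import torch.nn as nn

def show_filters(conv_layer, cols=8):
    """Plot each filter of one convolutional layer as a small grayscale image.

    For layers with more than one input channel, only the first input
    channel of each filter is shown.
    """
    weights = conv_layer.weight.data.cpu()           # (out_ch, in_ch, kH, kW)
    rows = (weights.shape[0] + cols - 1) // cols
    fig, axes = plt.subplots(rows, cols, figsize=(cols, rows))
    for i, ax in enumerate(axes.flat):
        ax.axis("off")
        if i < weights.shape[0]:
            ax.imshow(weights[i, 0], cmap="gray")
    plt.show()

# Usage: plot every conv layer of a trained model.
# for layer in model.convs:
#     if isinstance(layer, nn.Conv2d):
#         show_filters(layer)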
Part 3
- Mean absolute error in the Kaggle competition is 7.13936.
- Trained ResNet18 for 100 epochs with Adam: 50 epochs at a learning rate of 1e-3, then 50 epochs at 3e-4. Used random rotation and random flip for data augmentation (a training sketch is given at the end of this part).
- The architecture is listed in the Appendix.
- Images in the test set:
- Images from my collection:
It works perfectly on Shaq but fails on Emma. I think this is because Emma's face is partially covered by her hair, which our network does not recognize.
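A sketch of the Part 3 setup as described above: torchvision's ResNet18 with the final fully-connected layer replaced by a 136-way output (matching the fc layer in the Appendix printout), trained with Adam for 50 epochs at 1e-3 and then 50 epochs at 3e-4. The use of MSE as the training loss and the exact way the learning rate is switched are my own assumptions, and the random rotation/flip augmentation is assumed to happen inside the dataset.

import torch
import torch.nn as nn
import torchvision

def train_resnet18(train_loader, epochs=100):
    """Fine-tuning sketch for Part 3; loss choice and lr switching are assumptions."""
    model = torchvision.models.resnet18()
    model.fc = nn.Linear(model.fc.in_features, 136)   # 136 outputs, as in the Appendix
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for epoch in range(epochs):
        if epoch == 50:
            for group in optimizer.param_groups:      # 50 epochs at 1e-3, then 50 at 3e-4
                group["lr"] = 3e-4
        model.train()
        for images, keypoints in train_loader:        # rotation/flip handled by the dataset
            optimizer.zero_grad()
            loss = criterion(model(images), keypoints.flatten(1))
            loss.backward()
            optimizer.step()
    return model

One detail worth keeping in mind with a horizontal flip is that the keypoint coordinates must be mirrored as well, and left/right landmark indices swapped so that each point keeps its meaning.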
Appendix
Learned Filters
Layer 1
Layer 2
Layer 3
Layer 4
Layer 5
Resnet18 Architecture
ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer2): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer3): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer4): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=512, out_features=136, bias=True)
)