Project 4: Facial Keypoint Detection with Neural Networks

Lizhi (Gary) Yang

Part 1: Nose Tip Detection

I used the PyTorch Dataset class and DataLoader to load the images and ground-truth keypoints into the pipeline. Here are sample images from my dataloader, visualized with their ground-truth nose keypoints.

[Figures: part1-sample1, part1-sample2, part1-sample3, part1-sample4]
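
For reference, here is a minimal sketch of the Dataset/DataLoader wiring described above. The class name NoseDataset and the preloaded train_images / train_keypoints lists are assumptions for illustration, not my exact code:

import torch
from torch.utils.data import Dataset, DataLoader

class NoseDataset(Dataset):
    def __init__(self, images, keypoints):
        self.images = images        # list of (H, W) grayscale float arrays
        self.keypoints = keypoints  # list of (x, y) nose-tip coordinates

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img = torch.as_tensor(self.images[idx]).float().unsqueeze(0)  # (1, H, W)
        kp = torch.as_tensor(self.keypoints[idx]).float()             # (2,)
        return img, kp

# batch size 1 to match the training setup described below
train_loader = DataLoader(NoseDataset(train_images, train_keypoints),
                          batch_size=1, shuffle=True)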

The network has 3 convolutional layers with ReLU activations and stride 1, each followed by a max pooling layer with stride 2, and 2 fully connected layers. I used the Adam optimizer with a learning rate of 1e-3 and trained for 25 epochs with a batch size of 1.
The convolutional layers have 32, 24, and 12 output channels with kernel sizes of 7x7, 5x5, and 3x3 respectively. Below is the network construction and the train-validation loss graph:

import torch.nn as nn
import torch.nn.functional as F

class NoseNet(nn.Module):

    def __init__(self):
        super(NoseNet, self).__init__()
        # Three conv layers: 1 -> 32 -> 24 -> 12 channels,
        # with 7x7, 5x5, and 3x3 kernels respectively.
        self.conv1 = nn.Conv2d(1, 32, 7)
        self.conv2 = nn.Conv2d(32, 24, 5)
        self.conv3 = nn.Conv2d(24, 12, 3)
        # After the three conv + pool stages the feature map is 12 x 4 x 7.
        self.fc1 = nn.Linear(12 * 4 * 7, 6 * 4 * 7)
        self.fc2 = nn.Linear(6 * 4 * 7, 2)  # (x, y) of the nose tip

    def forward(self, x):
        # Each conv layer is followed by ReLU and 2x2 max pooling (stride 2).
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = F.max_pool2d(F.relu(self.conv3(x)), 2)
        # Flatten all dimensions except the batch dimension.
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features
[Figure: part1-loss — train and validation loss]
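
For completeness, a minimal training-loop sketch matching the setup above (Adam, learning rate 1e-3, 25 epochs, batch size 1 via the dataloader). The MSE loss is an assumption here, standard for coordinate regression, and may differ from my exact code:

import torch.nn as nn
import torch.optim as optim

net = NoseNet()
optimizer = optim.Adam(net.parameters(), lr=1e-3)  # lr = 1e-3 as stated above
criterion = nn.MSELoss()  # assumed loss for coordinate regression

for epoch in range(25):  # 25 epochs
    net.train()
    for img, kp in train_loader:
        optimizer.zero_grad()
        loss = criterion(net(img), kp)
        loss.backward()
        optimizer.step()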

Here are 2 examples that worked and 2 that did not. I think the model failed because it could not detect noses on angled faces: for the same person, the model consistently fails when the face is turned. The left two are the good examples and the right two are the failures.

[Figures: part1-good1, part1-good2 | part1-bad1, part1-bad2]



Part 2: Full Facial Keypoints Detection

For data augmentation, I applied a random shift of between -10 and 10 pixels, a random rotation of between -15 and 15 degrees, and random color jitter of brightness, contrast, saturation, and hue, all with a factor of 0.2 (a sketch of this pipeline follows the sample images below). Here are sample images from my dataloader, visualized with their ground-truth keypoints.

[Figures: part2-sample1, part2-sample2, part2-sample3, part2-sample4]
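
Here is a rough sketch of that augmentation pipeline, assuming PIL images and an (N, 2) array of pixel-coordinate keypoints; the helper structure is illustrative, not my exact code. The key point is that the geometric transforms must be applied to the keypoints as well, while the color jitter leaves them unchanged:

import random
import numpy as np
import torchvision.transforms.functional as TF
from torchvision import transforms

color_jitter = transforms.ColorJitter(
    brightness=0.2, contrast=0.2, saturation=0.2, hue=0.2)

def augment(img, keypoints):
    # Random shift in [-10, 10] pixels along each axis.
    dx, dy = random.randint(-10, 10), random.randint(-10, 10)
    img = TF.affine(img, angle=0, translate=(dx, dy), scale=1.0, shear=0)
    keypoints = keypoints + np.array([dx, dy])

    # Random rotation in [-15, 15] degrees about the image center
    # (counter-clockwise; image y points down, hence the sign convention).
    angle = random.uniform(-15, 15)
    img = TF.rotate(img, angle)
    c = np.array([img.width / 2, img.height / 2])
    t = np.deg2rad(angle)
    rot = np.array([[np.cos(t), np.sin(t)],
                    [-np.sin(t), np.cos(t)]])
    keypoints = (keypoints - c) @ rot.T + c

    # Photometric jitter leaves the keypoints unchanged.
    img = color_jitter(img)
    return img, keypoints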

The network has 6 convolutional layers with ReLU activations and stride 1, the first 4 each followed by a max pooling layer with stride 2, and 2 fully connected layers. I used the Adam optimizer with a learning rate of 1e-3 and trained for 25 epochs with a batch size of 1.
The convolutional layers have 128, 64, 32, 16, 16, and 16 output channels respectively, all with 3x3 kernels. Below is the network construction and the train-validation loss graph:

import torch.nn as nn
import torch.nn.functional as F

class FaceNet(nn.Module):

    def __init__(self):
        super(FaceNet, self).__init__()
        # Six conv layers: 1 -> 128 -> 64 -> 32 -> 16 -> 16 -> 16 channels,
        # all with 3x3 kernels.
        self.conv1 = nn.Conv2d(1, 128, 3)
        self.conv2 = nn.Conv2d(128, 64, 3)
        self.conv3 = nn.Conv2d(64, 32, 3)
        self.conv4 = nn.Conv2d(32, 16, 3)
        self.conv5 = nn.Conv2d(16, 16, 3)
        self.conv6 = nn.Conv2d(16, 16, 3)

        # After the conv stack the feature map is 16 x 9 x 5.
        self.fc1 = nn.Linear(16 * 9 * 5, 8 * 9 * 5)
        self.fc2 = nn.Linear(8 * 9 * 5, 116)  # 58 keypoints x (x, y)

    def forward(self, x):
        # The first four conv layers are followed by 2x2 max pooling.
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = F.max_pool2d(F.relu(self.conv3(x)), 2)
        x = F.max_pool2d(F.relu(self.conv4(x)), 2)
        # The last two conv layers keep the spatial resolution.
        x = F.relu(self.conv5(x))
        x = F.relu(self.conv6(x))
        # Flatten all dimensions except the batch dimension.
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features
[Figure: part2-loss — train and validation loss]

Here are 2 examples that worked and 2 that did not. I think the model failed because it is not good at detecting rotated faces and faces tilted upwards. A lack of data is another likely cause, since the IMM dataset only contains a limited number of images. The left two are the good examples and the right two are the failures.
[Figures: part2-good1, part2-good2 | part2-bad1, part2-bad2]


Here are the learned filters.

[Figure: part2-filter — learned filters]
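
A visualization like this takes only a few lines of matplotlib. This sketch assumes net is the trained FaceNet from above and that the first conv layer's 128 filters of shape 3x3 are being shown; the grid layout is arbitrary:

import matplotlib.pyplot as plt

weights = net.conv1.weight.data.cpu()  # shape (128, 1, 3, 3)
fig, axes = plt.subplots(8, 16, figsize=(16, 8))
for ax, w in zip(axes.flat, weights):
    ax.imshow(w[0].numpy(), cmap='gray')
    ax.axis('off')
plt.savefig('part2-filter.png', bbox_inches='tight')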


Part 3: Train With Larger Dataset

On Kaggle I achieved a mean absolute error of 8.91051. My Kaggle username is lzyang, and I should show up on the leaderboard as Lizhi Yang.

The network is ResNet-18, with the input layer modified to take 1-channel grayscale images and the last fully connected layer modified to output a 136-dimensional vector for the 68 keypoints. I used the Adam optimizer with a learning rate of 1e-3 and trained for 20 epochs with a batch size of 1. The dataset is augmented with a random shift of between -10 and 10 pixels, a random rotation of between -15 and 15 degrees, and random color jitter of brightness, contrast, saturation, and hue, all with a factor of 0.2. Below is the network construction and the train-validation loss graph, which shows the train and validation losses over the 20 epochs; the x-axis was set automatically for less cluttered viewing:

from torchvision import models
import torch.nn as nn

net = models.resnet18()
# Accept 1-channel grayscale input instead of 3-channel RGB.
net.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
# Regress 68 keypoints as a 136-dimensional (x, y) vector.
net.fc = nn.Linear(512, 136)
[Figure: part3-loss — train and validation loss]
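
At test time the predictions come out in the resized input's coordinate frame and have to be scaled back to each image's original resolution before writing the submission. A rough sketch, where the 224x224 input size and the variable names are assumptions for illustration:

import torch

net.eval()
with torch.no_grad():
    pred = net(test_img.unsqueeze(0)).view(68, 2)  # test_img: (1, 224, 224)
pred = pred.numpy()
# Scale from the network's input resolution back to the original image size.
pred[:, 0] *= orig_w / 224.0
pred[:, 1] *= orig_h / 224.0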

Here are some predictions from the test set:
[Figures: part3-test1, part3-test2, part3-test3, part3-test4, part3-test5]

Here are some of my own photos. You can see that the model has some trouble detecting my eyes when I wear glasses in the first 3 images; in the first and third especially, the eye detections are offset, probably because of the glasses' frames.

[Figures: part3-custom1, part3-custom2, part3-custom3, part3-custom4]


Bells & Whistles

Auto Morph

I incorporated the keypoint detection into project 3's morph function and generated a morph video of the 4 images of myself shown in part 3 above. Here is the link to the video in case the embed below does not load. It is implemented by changing project 3's load_points function to detect and return keypoints using the model trained in part 3.
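
A hypothetical sketch of the modified load_points; the 224x224 input size and the preprocessing are assumptions for illustration, not my exact code:

import numpy as np
import torch
from PIL import Image

def load_points(image_path, net, size=(224, 224)):
    # Run the part 3 model instead of reading hand-labeled points.
    img = Image.open(image_path).convert('L').resize(size)
    x = torch.from_numpy(np.asarray(img, dtype=np.float32) / 255.0)
    x = x.unsqueeze(0).unsqueeze(0)       # shape (1, 1, H, W)
    with torch.no_grad():
        pts = net(x).view(68, 2).numpy()  # 68 (x, y) keypoints
    return pts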

Anti-aliased max pool

I trained the part 2 network on 300 images from the iBUG dataset, validating on 50, once with anti-aliased max pooling and once without. With anti-aliased max pooling, training tends to go more smoothly but more slowly. Below are the two train-validation loss graphs, the left from training with anti-aliased max pooling and the right without. The graphs show the train and validation losses over 20 epochs; the x-axis was set automatically for less cluttered viewing. A sketch of the pooling layer follows the graphs.

[Figures: bells-anti (with anti-aliased max pool) | bells-orig (without)]