
Samples from the Data Loader

A few downsampled and augmented grayscale input images, together with their keypoint labels, are illustrated below.
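For reference, a minimal sketch of such a data loader is shown below. The file handling, the (180, 240) target size, the brightness-jitter augmentation, and the normalized keypoint layout are illustrative assumptions rather than the exact pipeline we used.

import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
from PIL import Image

class FaceKeypointDataset(Dataset):
    def __init__(self, samples, train=True):
        # samples: list of (image_path, keypoints) pairs; keypoints are assumed
        # to be an (N, 2) array of coordinates normalized to [0, 1]
        self.samples = samples
        augment = [transforms.ColorJitter(brightness=0.3)] if train else []
        self.transform = transforms.Compose(
            [transforms.Grayscale(num_output_channels=1),
             transforms.Resize((180, 240))]   # downsample to a fixed size
            + augment                         # jitter brightness only when training
            + [transforms.ToTensor()]
        )

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, keypoints = self.samples[idx]
        image = self.transform(Image.open(path))
        # flatten the keypoints to match the 116-dimensional network output
        return image, torch.as_tensor(keypoints, dtype=torch.float32).flatten()

# usage (assuming train_samples is a list of (path, keypoints) pairs):
# train_loader = DataLoader(FaceKeypointDataset(train_samples), batch_size=8, shuffle=True)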

Architecture

We trained a model with the following architecture using AdamW (learning rate 5e-4, weight decay 1e-4) and MSE loss for 50 epochs.

import torch.nn as nn

# Five conv blocks (Conv -> BatchNorm -> ReLU, with 2x2 max pooling after the
# first four) followed by a three-layer fully connected head that regresses
# 116 values, i.e. (x, y) coordinates for 58 keypoints.
model = nn.Sequential(
    nn.Conv2d(1, 8, 3),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.MaxPool2d(2, 2),
    nn.Conv2d(8, 16, 3),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.MaxPool2d(2, 2),
    nn.Conv2d(16, 32, 3),
    nn.BatchNorm2d(32),
    nn.ReLU(),
    nn.MaxPool2d(2, 2),
    nn.Conv2d(32, 64, 3),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.MaxPool2d(2, 2),
    nn.Conv2d(64, 64, 3),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(4928, 1024),  # 4928 = 64 channels x 7 x 11 (e.g. for a 1x180x240 input)
    nn.ReLU(),
    nn.Linear(1024, 256),
    nn.ReLU(),
    nn.Linear(256, 116)
)
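A minimal training loop consistent with the settings above is sketched here. The train_loader and val_loader objects (built from the data loader described earlier) and the device handling are assumptions, not our exact script.

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=1e-4)
criterion = nn.MSELoss()

for epoch in range(50):
    # one pass over the training set
    model.train()
    for images, keypoints in train_loader:
        images, keypoints = images.to(device), keypoints.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), keypoints)
        loss.backward()
        optimizer.step()

    # average MSE on the validation set
    model.eval()
    with torch.no_grad():
        val_loss = sum(criterion(model(x.to(device)), y.to(device)).item()
                       for x, y in val_loader) / len(val_loader)
    print(f"epoch {epoch + 1}: val_loss = {val_loss:.4f}")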

Training and Validation Loss

Results

The following are a few good and bad outputs from our neural network. The images on the right show the generated keypoints. In some cases the network appears to be more sensitive to the shadows around the faces than to the actual facial features, which produces keypoints that land far from the true locations.

Good Results

Bad Results

Features

Below are the learned filters of the first convolutional layer of our network. They largely resemble line detectors.
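One way to extract and display these filters is sketched below, assuming the trained nn.Sequential model defined above is in scope; the 2x4 grid layout is arbitrary.

import matplotlib.pyplot as plt

# weights of the first Conv2d layer: shape (8, 1, 3, 3)
filters = model[0].weight.detach().cpu()

fig, axes = plt.subplots(2, 4, figsize=(8, 4))
for ax, kernel in zip(axes.flat, filters):
    ax.imshow(kernel[0], cmap="gray")  # single input channel, so take index 0
    ax.axis("off")
plt.show()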