CS 194-26: Project 5

Vinay Chadalavada

Nose Tip Detection

Ground Truth Images + Labels From Data Loader

Above are three images from the data loader I created, with the ground-truth labels plotted on top of them. They look somewhat blurry since all images were resized to 80 by 60.
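As a rough illustration, a data loader along these lines could look like the sketch below. The class and argument names (NoseTipDataset, image_paths, nose_points) are hypothetical; only the grayscale images, the 80 by 60 resize, and the single normalized nose-tip label come from the description above.

```python
import numpy as np
import torch
from torch.utils.data import Dataset
from skimage import io, transform

class NoseTipDataset(Dataset):
    """Hypothetical sketch of the loader: grayscale faces resized to 80x60
    with a single (x, y) nose-tip label in normalized [0, 1] coordinates."""

    def __init__(self, image_paths, nose_points):
        self.image_paths = image_paths   # list of image file paths
        self.nose_points = nose_points   # nose_points[i] = (x, y) as fractions of width/height

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        img = io.imread(self.image_paths[idx], as_gray=True)
        img = transform.resize(img, (60, 80)).astype(np.float32)  # height 60, width 80
        img -= 0.5                                                 # roughly zero-center pixel values
        img = torch.from_numpy(img).unsqueeze(0)                   # channel-first: (1, 60, 80)
        point = torch.tensor(self.nose_points[idx], dtype=torch.float32)
        return img, point
```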

Training and Validation Loss

My neural network consisted of 3 convolutional layers, and I tuned the learning rate and the batch size. The best values I found were a learning rate of 0.001 and a batch size of 24.
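For concreteness, a 3-conv-layer nose-tip regressor along these lines could look like the sketch below. The channel counts, kernel sizes, optimizer, and loss are my assumptions; only the 60x80 grayscale input, the 2-value (x, y) output, and the learning rate of 0.001 come from the write-up.

```python
import torch
import torch.nn as nn

class NoseNet(nn.Module):
    """Minimal sketch of a 3-conv nose-tip regressor (not the exact network)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 12, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 60x80 -> 30x40
            nn.Conv2d(12, 20, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 30x40 -> 15x20
            nn.Conv2d(20, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 15x20 -> 7x10
        )
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 7 * 10, 128), nn.ReLU(),
            nn.Linear(128, 2),  # predicted (x, y) of the nose tip
        )

    def forward(self, x):
        return self.regressor(self.features(x))

net = NoseNet()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)  # learning rate 0.001 as reported above
criterion = nn.MSELoss()                                  # assumed loss function (not stated above)
```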

At the end of training, my model had a training loss of 0.0013 and a validation loss of 0.0018.

Tuning Learning Rate

When the learning rate was 0.01, the final training loss was 0.0037 and the validation loss was 0.0051. When the learning rate was 0.001, the final training loss was 0.0013 and the validation loss was 0.0025. When the learning rate was 0.0005, the final training loss was 0.0040 and the validation loss was 0.0032.

Tuning The Batch Size

When the batch size was 12, the final training loss was 0.0022 and the validation loss was 0.0038. When the batch size was 24, the final training loss was 0.0013 and the validation loss was 0.0018. When the batch size was 48, the final training loss was 0.0013 and the validation loss was 0.0025.

Results

Here are some results that look pretty good.

Here are a few bad ones.

The person in the first image turned his face a lot, and I think that caused the model to predict incorrectly. In the second, I think the creases caused by the smile confused the model.

Full Facial Keypoints Detection

Ground Truth Images + Labels From Data Loader

Above are three images from the data loader I created, with the labels plotted on top of them. Data augmentation is applied, so the images can be tilted or shifted.
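A hedged sketch of this kind of tilt/shift augmentation is shown below, using skimage. The function name and the angle/shift ranges are made-up parameters; the key point is that the same similarity transform is applied to the keypoints so the labels stay aligned with the warped image.

```python
import numpy as np
from skimage import transform

def augment(img, pts, max_angle=10, max_shift=8):
    """Randomly rotate and shift a grayscale image and its keypoints.
    pts is an (N, 2) array of (x, y) pixel coordinates."""
    h, w = img.shape
    angle = np.random.uniform(-max_angle, max_angle)
    dx, dy = np.random.uniform(-max_shift, max_shift, size=2)

    # Rotate about the image center, then translate by (dx, dy)
    center = np.array([w / 2, h / 2])
    tform = (transform.SimilarityTransform(translation=-center)
             + transform.SimilarityTransform(rotation=np.deg2rad(angle))
             + transform.SimilarityTransform(translation=center + [dx, dy]))

    warped = transform.warp(img, tform.inverse)  # warp expects the inverse mapping
    new_pts = tform(pts)                         # apply the same transform to the keypoints
    return warped, new_pts
```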

Model Architecture

My model has 5 convolutional layers, each followed by a max-pool layer. The first layer takes 1 channel and outputs 32. The second takes in 32 and outputs 64. The third layer takes in 64 and outputs 128. To reduce the time the model takes to run, the last two convolutional layers take in 128 and output 128. The model also has 2 fully connected layers. The first one is of size (1920, 1920) and is followed by a ReLU. The final fully connected layer has size (1920, 116) to get the desired output of all the label coordinates (58 keypoints, 2 values each). I trained my model with a learning rate of 0.001 and a batch size of 48 for 20 epochs.
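This description maps fairly directly onto a PyTorch module. The sketch below assumes 3x3 convolutions with padding 1 and 120x160 grayscale inputs, under which five 2x2 max pools leave a 3x5 feature map, so 128 * 3 * 5 = 1920 flattened features, matching the (1920, 1920) fully connected layer; those details are assumptions, not stated above.

```python
import torch.nn as nn

class KeypointNet(nn.Module):
    """Sketch of the 5-conv full-keypoint architecture described above."""
    def __init__(self):
        super().__init__()
        def block(c_in, c_out):
            # conv (assumed 3x3, padding 1) -> ReLU -> 2x2 max pool
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.features = nn.Sequential(
            block(1, 32), block(32, 64), block(64, 128),
            block(128, 128), block(128, 128))
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(1920, 1920), nn.ReLU(),
            nn.Linear(1920, 116))  # 58 keypoints * 2 coordinates

    def forward(self, x):
        return self.fc(self.features(x))
```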

Training And Validation Loss

The training loss was 0.0035 and the validation loss was 0.0046 in the final iteration of training.

Results

Here are some results that look pretty good.

Here are a few bad ones.

There seems to be very little variation in the predictions even though data augmentation was added to the dataset; the output stays very close to the average face.

Visualizing the Filters
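For reference, one simple way to produce a visualization like this is to plot each learned filter's weights as a small grayscale image. This is a generic sketch, not necessarily how the figures here were generated.

```python
import matplotlib.pyplot as plt
import torch.nn as nn

def show_filters(conv_layer, cols=8):
    """Plot every filter of a conv layer as a small grayscale image."""
    weights = conv_layer.weight.detach().cpu().numpy()  # shape: (out_ch, in_ch, k, k)
    n = weights.shape[0]
    rows = (n + cols - 1) // cols
    fig, axes = plt.subplots(rows, cols, figsize=(cols, rows))
    for i, ax in enumerate(axes.flat):
        ax.axis('off')
        if i < n:
            ax.imshow(weights[i, 0], cmap='gray')       # first input channel of filter i
    plt.show()

# Demo call with a random conv layer; in practice pass a trained layer,
# e.g. the first conv of the keypoint model.
show_filters(nn.Conv2d(1, 32, 3))
```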

Train With Larger Dataset

Ground Truth Images + Labels From Data Loader

Model Architecture

The model I used is ResNet-18, but I had to change the first layer to take 1 channel since my images are grayscale, and I also changed the last fully connected layer to output 136 values (68 keypoints, 2 coordinates each).
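A minimal sketch of that modification, assuming torchvision's resnet18 (the replacement first conv keeps ResNet's usual 7x7, stride-2 shape):

```python
import torch.nn as nn
import torchvision

net = torchvision.models.resnet18()                 # torchvision's ResNet-18
net.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2,
                      padding=3, bias=False)        # 1 input channel for grayscale images
net.fc = nn.Linear(net.fc.in_features, 136)         # 68 keypoints * 2 coordinates
```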

Training And Validation Loss

Results

Here are a few results on my validation set.

Results On Other Images I Found

It seems to have a harder time detecting the eyes in these images.

Kaggle Competition

My username on Kaggle is EitherCod, and the MAE I ended with is 186.28956. There is definitely a bug in my code: I tried training on a single batch consisting of two images, and when I ran net(two_image_batch)[0] it gave different results than net(two_image_batch[0]). The images in the batch seem to be interacting with each other for some reason.
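For context, a self-contained version of that check is sketched below; the model and the two-image batch are stand-ins, and the single image is unsqueezed back into a batch of one so both calls see a 4D input. If the two outputs disagree, common culprits are running the check with the network still in train() mode (BatchNorm then uses batch statistics) or a view/reshape that accidentally mixes the batch dimension, though I have not confirmed which applies here.

```python
import torch
import torch.nn as nn
import torchvision

# Stand-ins for the trained model and a 2-image batch from the write-up
net = torchvision.models.resnet18()
net.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
net.fc = nn.Linear(net.fc.in_features, 136)
two_image_batch = torch.randn(2, 1, 224, 224)

net.eval()  # deterministic BatchNorm / dropout behaviour for the comparison
with torch.no_grad():
    from_batch = net(two_image_batch)[0]               # image 0 run inside a batch of 2
    alone = net(two_image_batch[0].unsqueeze(0))[0]    # image 0 run on its own
print(torch.allclose(from_batch, alone, atol=1e-5))    # True when batch items don't interact
```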

References

Obama Image From: https://www.thedailybeast.com/obamas-nightmare-reelected-in-2012-but-republicans-take-the-senate