berkeley logoProgramming Project #4 (proj4)
CS194-26: Intro to Computer Vision and Computational Photography

Due Date: 11:59pm on Friday, Nov 12, 2021 [START EARLY]


Facial Keypoint Detection with Neural Networks

fashion mnist


In this project, you will learn how to use neural networks to automatically detect facial keypoints -- no more clicking! For this project, we suggest using PyTorch as the deep learning framework, and any provided starter or reference code will reflect this. Here are some tutorial videos that might be helpful for you: Neural Networks Demystified and PyTorch in 5 minutes. We include more tutorial links in the resources section. For parts 1 and 2, you should be able to train your models locally. For part 3, we recommend using Google Colab, but for all parts you may use any hardware that works for you. If you choose to use Colab, once you create a new notebook, you need to go to Runtime --> change runtime type and set the hardware accelerator to GPU. Note that one Colab session has an idle timeout for 90 minutes and an absolute timeout for 12 hours, so please download your results/your trained model frequently. Make sure to START EARLY, especially if you are not familiar with PyTorch or Colab - we will not provide additional slip days or deadline extensions. Once you complete the project, please submit the code to bCourses.

Part 1: Nose Tip Detection

For the first part, we will use the IMM Face Database available on this website for training an initial toy model for nose tip detection. The dataset has 240 facial images of 40 persons and each person has 6 facial images in different viewpoints. All images are annotated with 58 facial keypoints. Please use all 6 images of the first 32 persons (index 1-32) as the training set (total 32 x 6 = 192 images) and the images of the remaining 8 persons (index 33-40) (8 * 6 = 48 images) as the validation set. As a reference, the staff solution takes less than 1 minute to train 10 epoches locally.

We will cast the nose detection problem as a pixel coordinate regression problem, where the input is a single grayscale image, and the outputs are the nose tip positions (x, y). In practice, (x, y) are represented as the ratio of image width and height, ranging from 0 to 1.

Part 2: Full Facial Keypoints Detection

We are not satisfied with just detecting the nose tip position - in this section we want to move forward and detect all 58 facial keypoints/landmarks. You need to use the same dataset as Part 1 but now try to load all 58 keypoints and predict them.

Part 3: Train With Larger Dataset

For this part, we will use a larger dataset, specifically the ibug face in the wild dataset for training a facial keypoints detector. This dataset contains 6666 images of varying image sizes, and each image has 68 annotated facial keypoints. You will need to use Colab with GPU to train the model. As a reference, the staff solution takes 1.5 hours to train 10 epoches using Colab. We provide code to download the dataset and unzip it in Colab.

For our class Kaggle competiton: use this link to download the test set xml file. It contains the image path and face bounding boxes but it does not include the keypoints annotation. You will need to predict the keypoints location and submit the result to Kaggle. Please note (1) Do not use data augmentation for your test set dataloader (2) You need to convert your keypoint predictions (ratio of width/height in the crop image) to the absolute pixel coodinate in the entire image. (3) Please save all results into one csv file; the csv file should contain 137088 rows (image_0001_keypoints_01_x, image_0001_keypoints_01_y, image_0001_keypoints_02_x, image_0001_keypoints_02_y, ..., image_0001_keypoints_68_x, image_0001_keypoints_68_y, image_0002_keypoints_01_x, image_0002_keypoints_01_y, ...), each with two columns 'Id' and 'Predicted'.

Bells & Whistles (Extra Points)


[1] Introductory Pytorch Tutorial
[2] Google Colab Tutorial (Using this should be very similar to using an ipython notebook)

Project designed by Zhe Cao and Alyosha Efros