Part 1 focuses on locating the nose tip accurately. The project casts nose detection as a pixel-coordinate regression problem: the input is a single grayscale image and the output is the nose tip position (x, y). In practice, x and y are expressed as fractions of the image width and height respectively, so each ranges from 0 to 1.
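The coordinate normalization described above can be sketched as follows. This is only an illustration, not the project's actual data-loading code: the helper names `normalize_keypoint` and `denormalize_keypoint` and the 80x60 image size in the usage lines are assumptions.

```python
def normalize_keypoint(x_px, y_px, width, height):
    """Convert a pixel position to fractions of image width and height."""
    return x_px / width, y_px / height


def denormalize_keypoint(x, y, width, height):
    """Map a normalized prediction back to pixel coordinates for plotting."""
    return x * width, y * height


# Hypothetical example: a nose tip at the center of an 80x60 image.
x, y = normalize_keypoint(40, 30, width=80, height=60)
x_px, y_px = denormalize_keypoint(x, y, width=80, height=60)
```

Keeping the targets in a fixed 0 to 1 range makes the regression output independent of the image resolution, so the same labels remain valid after resizing.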

The model architecture I used is shown below. It consists of 3 convolutional layers whose channel counts grow from 1 to 32 (1 to 12 to 24 to 32), followed by two fully connected layers. The first convolutional layer has a 5x5 kernel and the remaining two have 3x3 kernels.

To calculate the input dimension of the first fully connected layer, I printed the shape of the activations after the last pooling layer: 1536 (channels x H x W).

```
Network Architecture:
Net_(
(conv1): Conv2d(1, 12, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
(conv2): Conv2d(12, 24, kernel_size=(3, 3), stride=(1, 1))
(conv3): Conv2d(24, 32, kernel_size=(3, 3), stride=(1, 1))
(pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(fc1): Linear(in_features=1536, out_features=300, bias=True)
(fc2): Linear(in_features=300, out_features=2, bias=True)
)
```
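The 1536 value can also be checked by hand with the standard convolution and pooling output-size formula. The sketch below makes two assumptions that the write-up does not state: that the input image is 80x60 px (width x height) and that the 2x2 max pool is applied after every convolutional layer. Under those assumptions the arithmetic reproduces 1536; the function names are purely illustrative.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Output length of max pooling (ceil_mode=False) along one dimension."""
    return (size - kernel) // stride + 1


def flattened_size(height, width, layers):
    """Trace (channels, H, W) through conv + pool blocks and flatten.

    layers is a list of (out_channels, kernel_size, padding) tuples.
    Assumes one 2x2 max pool after every convolution.
    """
    channels = None
    for out_channels, kernel, padding in layers:
        height = pool_out(conv_out(height, kernel, padding=padding))
        width = pool_out(conv_out(width, kernel, padding=padding))
        channels = out_channels
    return channels, height, width, channels * height * width


# Kernel sizes and padding taken from the printed architecture above.
LAYERS = [(12, 5, 2), (24, 3, 0), (32, 3, 0)]

# Assumed input: 60 rows x 80 columns.
print(flattened_size(60, 80, LAYERS))
```

With a different input size or pooling placement the flattened size changes, which is why printing the shape after the last pooling layer is a reliable way to set `in_features` for `fc1`.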

The next part of the assignment moves on to full facial keypoint detection using a larger image size (I picked 160x120 px).

Because the images are larger, I used more convolutional layers to reduce the feature maps before the fully connected layers. This time I implemented a 5-layer CNN, still with max pooling.

Some design choices include:

- Setting the bias parameter to True
- Adjusting the kernel size of the first convolutional layer to 7, the next two to 5, and the second-to-last to 3
- Setting the learning rate to 0.0005 and increasing the number of epochs to 40

The results were somewhat less stable than I expected. This is likely due to the small amount of training data and the relative simplicity of the model.

As you can see, the model performs well on some examples, but the failure cases are quite extreme.