In this project, we attempt to use neural networks to combine a photo and a painting such that the style of the painting is applied to the photo. Essentially, we transfer the style of the painting onto the content of the photo.
We used a VGG model pre-trained on ImageNet as our base model for this project. The model uses the 16 convolutional layers and 5 pooling layers (average instead of max) of the 19-layer VGG network. We trained with an alpha/beta ratio of 1e-4 in general, and ~50 epochs for each set of photos.
As the paper notes, we used two types of loss functions (style and content) across 6 different layers. It recommends using layers conv1_1, conv2_1, conv3_1, conv4_1, and conv5_1 as the style representations and layer conv4_2 as the content representation. For the losses, we use two different squared-error loss functions:
1) Content: p is the original image, x is the generated image, and P^l and F^l correspond to their feature representations in layer l.
Content Squared-Error Loss Function
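Written out (following the notation in Gatys et al., which the description above uses), the pictured content loss is:

```latex
\mathcal{L}_{\text{content}}(\vec{p}, \vec{x}, l) = \frac{1}{2} \sum_{i,j} \left( F^{l}_{ij} - P^{l}_{ij} \right)^{2}
```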
2) Style: a is the original image, x is the generated image, and A^l and G^l are the Gram matrices of their respective feature representations in layer l. N_l is the number of distinct filters in layer l, and M_l is the size of each feature map (height times width). In this function, we minimize the mean-squared distance between the entries of the Gram matrices of the original and generated images (this is what E_l represents). We then sum over all of the style layers (i.e. conv1_1, conv2_1, conv3_1, ...) to obtain the final style squared-error loss, where the w_l are weights applied to each layer, set to 1/5 in the paper (a total weight of 1 split evenly across the 5 active layers).
Style Squared-Error Loss Function
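Written out (again following Gatys et al.), the pictured style loss builds the Gram matrix G^l from the generated image's features, compares it to A^l per layer, and sums the weighted per-layer terms:

```latex
G^{l}_{ij} = \sum_{k} F^{l}_{ik} F^{l}_{jk}, \qquad
E_{l} = \frac{1}{4 N_{l}^{2} M_{l}^{2}} \sum_{i,j} \left( G^{l}_{ij} - A^{l}_{ij} \right)^{2}, \qquad
\mathcal{L}_{\text{style}}(\vec{a}, \vec{x}) = \sum_{l} w_{l} E_{l}
```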
3) Final loss function: We add the two loss functions above with two weights, alpha and beta. The paper notes that the ratio alpha/beta should vary between 1e-3 and 1e-4 depending on the painting/photo: a ratio of 1e-4 puts more emphasis on the style of the painting, whereas a ratio of 1e-3 preserves more of the content of the photo.
Total Loss Function
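The pictured total loss is simply the weighted sum of the two terms:

```latex
\mathcal{L}_{\text{total}}(\vec{p}, \vec{a}, \vec{x}) = \alpha \, \mathcal{L}_{\text{content}}(\vec{p}, \vec{x}) + \beta \, \mathcal{L}_{\text{style}}(\vec{a}, \vec{x})
```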
My favorite result from the neural net has to be the amalgamation of the Campanile with Starry Night by van Gogh. The transfer is more subtle than in some of the other pieces, but I'm a fan of the 'brushstrokes' and the nostalgic feel. I believe this one came out well because the setting and colors present in the photo transfer nicely to the color palette of the painting. My least favorite combo has to be the last one, a picture of me mixed with van Gogh's blossom painting. The painting lacks well-defined features, so many details of the photo (such as my face) get blurred out and the amalgamation doesn't come out very well. While the background behind me is pretty, my face and hands look jumbled. I found that similarity between the painting and the photo helps produce better results, as seen in the first set of images of a path lined with trees in the fall: there is a natural transfer from the painting to the photo that makes the combination believable and easy to understand.
This was one of the coolest projects in class thus far, in my opinion. I love the idea of combining popular, well-known art pieces with the content of photos that I can relate to. It's brilliant that neural networks can be used to produce such interesting and beautiful results. I'm looking forward to the day when all of our art is produced by some neural network trained on the past and present art of the human species.
For this project, we will be inserting a synthetic cube into a video by using a camera projection matrix to project 3D coordinates of the cube into 2D points.
To set up, we used a box of dimensions 5.01" x 7.06" x 2.56" and drew a regular pattern on it. Below is our input video:
Input Video
We chose 30 points from the pattern and labeled their corresponding 3D points using the measurements of the box. We marked the points in the first image using plt.ginput. Below are the 30 points we selected, as well as our x, y, z axes and the corner at which the axes are centered.
Selected Keypoints
To propagate our keypoints from the first image to the rest of the frames, we used the MedianFlow tracker: cv2.legacy.TrackerMedianFlow_create(). We initialized 30 trackers, one for each point, each with an 8x8 bounding box centered on its keypoint. After each frame of the video we updated each tracker and kept the point at the center of each bounding box, along with its corresponding 3D point. Below is the result of our tracked points:
Tracked Keypoints
To calibrate our camera, we needed to solve for the camera projection matrix that projects our 3D coordinates to 2D coordinates. We used the following homogeneous linear system from lecture to build our matrices:
Homogeneous Linear System
We rearrange our matrices into the form Ax = b, as shown below, and solve for the projection matrix P using the least-squares function np.linalg.lstsq.
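This step can be sketched as follows (a minimal NumPy version with hypothetical names; it assumes at least 6 non-degenerate correspondences and fixes the last entry of P to 1 so the Ax = b system is well-posed):

```python
import numpy as np

def calibrate(pts3d, pts2d):
    """Solve for the 3x4 camera projection matrix P from 3D-2D point
    correspondences. Each correspondence contributes two rows of Ax = b,
    with P's last entry fixed to 1 (11 unknowns). Needs >= 6 points."""
    A, b = [], []
    for (X, Y, Z), (u, v) in zip(pts3d, pts2d):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z])
        b.extend([u, v])
    x, *_ = np.linalg.lstsq(np.asarray(A), np.asarray(b), rcond=None)
    return np.append(x, 1.0).reshape(3, 4)

def project(P, pts3d):
    """Project Nx3 world points to Nx2 pixel points: apply P to the
    homogeneous coordinates, then divide by the last coordinate."""
    homog = np.hstack([pts3d, np.ones((len(pts3d), 1))])
    proj = homog @ P.T
    return proj[:, :2] / proj[:, 2:3]
```

With the 30 tracked 2D points and their 3D labels per frame, `calibrate` gives one P per frame and `project` maps any 3D point into that frame.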
Once we have our camera projection matrix P, we can use it to project a cube into every frame of our video. All we have to do is multiply P with the homogeneous 3D coordinates of our cube, divide by the last coordinate, and pass the projected points into the draw function from this tutorial. The vertices we chose for our cube were: [[1.75,0,2.5], [1.75,1.67,2.5], [3.5,1.67,2.5], [3.5,0,2.5], [1.75,0,3.75], [1.75,1.67,3.75], [3.5,1.67,3.75], [3.5,0,3.75]]. Below is the final result of our projected cube.
Projected Cube
In this project, we learned that we could project 3D coordinates into 2D coordinates by solving for the camera projection matrix, which is super cool. Although this project had its tough moments, it was still pretty fun being able to project an object into a scene and get a taste of augmented reality.
In this project, we had the task of implementing the image quilting algorithm used for texture synthesis and transfer found in this paper by Efros and Freeman.
In the first part of the project, we created a function called quilt_random that randomly samples patches from an input sample image and uses those samples to create an output image. The algorithm starts at the upper-left corner of the output image and places square patches until the output is filled. Below is the result of this simpler algorithm:
Sample Image
Input size = 192x192
Randomly Quilted Image
out_size = 320x320, patch_size = 40
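A minimal sketch of quilt_random (hypothetical helper names; assumes a square output and a NumPy image array):

```python
import numpy as np

def quilt_random(sample, out_size, patch_size, seed=0):
    """Fill an out_size x out_size image by tiling randomly sampled
    square patches from `sample`, left-to-right, top-to-bottom."""
    rng = np.random.default_rng(seed)
    out = np.zeros((out_size, out_size) + sample.shape[2:], dtype=sample.dtype)
    h, w = sample.shape[:2]
    for i in range(0, out_size, patch_size):
        for j in range(0, out_size, patch_size):
            # pick a random top-left corner for the patch in the sample
            r = rng.integers(0, h - patch_size + 1)
            c = rng.integers(0, w - patch_size + 1)
            patch = sample[r:r + patch_size, c:c + patch_size]
            # clip the patch at the right/bottom edges of the output
            ph = min(patch_size, out_size - i)
            pw = min(patch_size, out_size - j)
            out[i:i + ph, j:j + pw] = patch[:ph, :pw]
    return out
```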
For this part of the project, we no longer place patches side by side; instead, we overlay patches based on how similar the overlapping regions are. We calculate a cost for every possible patch from the source image: the SSD between the overlapping regions of the existing output and the sampled patch. We optimized this operation using a cv2 function that performs a convolution of our masked template across the whole image, which allowed us to forgo the naive double for loop and saved 10-20 minutes per execution. After calculating the cost of each patch, we randomly select a patch from the list of candidates whose cost is below a threshold (tol) that we select (and to save execution time later on, we only choose from the lowest K = 10 thresholded values). Below is our result from overlapping patches:
Sample Image
Input size = 192x192
Overlap Quilted Image
out_size = 400x400, patch_size = 39, overlap = 11, tol = .1
Note: there is a black border around our quilted image because patch_size does not evenly divide out_size, leaving a strip narrower than a patch unfilled. Instead of cropping this out, we chose to maintain the out_size of our resulting image.
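The masked SSD cost and thresholded selection can be sketched as follows (a pure-NumPy version with hypothetical names; our actual code used a cv2 template-matching call for speed):

```python
import numpy as np

def ssd_cost(image, template, mask):
    """SSD between a 2-D `template` and every same-size window of a 2-D
    `image`, restricted to pixels where mask == 1 (the overlap region)."""
    windows = np.lib.stride_tricks.sliding_window_view(image, template.shape)
    diff = (windows - template) * mask   # broadcast over the window grid
    return np.sum(diff ** 2, axis=(-2, -1))

def choose_patch(costs, tol=0.1, k=10, rng=None):
    """Randomly pick one of the k lowest-cost offsets whose cost is
    within (1 + tol) of the minimum; returns a (row, col) offset."""
    if rng is None:
        rng = np.random.default_rng()
    flat = costs.ravel()
    order = np.argsort(flat)[:k]                       # k cheapest offsets
    ok = order[flat[order] <= (1 + tol) * (flat[order[0]] + 1e-8)]
    return np.unravel_index(rng.choice(ok), costs.shape)
```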
Even though overlapping patches improved our results, we can still remove edge artifacts near the overlaps by using seam finding. To do this, we calculate bndcost, the SSD over the overlapping region between the output image and the newly sampled patch. We then pass bndcost into the provided cut function, which gives us the min-cost contiguous path from one side of the patch to the other (in the form of a mask). We apply the mask to our patch and its inverse to our output image in order to know which pixels to copy over from the newly sampled patch. Finally, we created a function quilt_cut that builds on our quilt_simple function from the last section by adding seam finding to remove edge artifacts. Below are the results produced using seam finding, as well as examples of the overlapping patches, cost image, and min-cost path.
Seam finding mechanics:
Sampled Patch
Patch sampled
Output Patch
Where the random patch will go in the image (overlap shown)
Cost and Path Image
Cost image and path in red
Note: the sampled patch was the same patch already present in the overlap (most likely due to the low tolerance), so the cost is zero (black) in the overlap region.
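The provided cut function can be sketched as a dynamic program. This hypothetical version finds a top-to-bottom seam through bndcost (transpose bndcost for the left-to-right case) and returns a mask that keeps the existing output on one side of the seam:

```python
import numpy as np

def min_cost_seam(bndcost):
    """Find the min-cost contiguous top-to-bottom path through an H x W
    cost image, and return a boolean mask that is True left of the seam."""
    H, W = bndcost.shape
    cost = bndcost.astype(float)
    # forward pass: accumulate the cheapest way to reach each pixel
    for i in range(1, H):
        left = np.r_[np.inf, cost[i - 1, :-1]]
        right = np.r_[cost[i - 1, 1:], np.inf]
        cost[i] += np.minimum(np.minimum(left, cost[i - 1]), right)
    # backtrack from the cheapest bottom pixel, moving at most one column
    seam = np.empty(H, dtype=int)
    seam[-1] = np.argmin(cost[-1])
    for i in range(H - 2, -1, -1):
        j = seam[i + 1]
        lo, hi = max(j - 1, 0), min(j + 2, W)
        seam[i] = lo + np.argmin(cost[i, lo:hi])
    return np.arange(W) < seam[:, None]
```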
Seam finding images:
Sample Image
Input size = 192x192
Seam Finding Quilted Image
out_size = 400x400, patch_size = 39, overlap = 11, tol = .1
Sample Image
Input size = 192x192
Seam Finding Quilted Image
out_size = 400x400, patch_size = 39, overlap = 11, tol = .1
Sample Image
Input size = 192x192
Seam Finding Quilted Image
out_size = 400x400, patch_size = 39, overlap = 11, tol = .1
Sample Image
Input size = 192x192
Seam Finding Quilted Image
out_size = 400x400, patch_size = 39, overlap = 11, tol = .1
Sample Image
Input size = 192x192
Seam Finding Quilted Image
out_size = 400x400, patch_size = 39, overlap = 11, tol = .1
Random
Random placement of blocks
Overlap
Neighboring blocks constrained by overlap
Seam Finding Quilted Image
Minimum error boundary cut
For the final part of the project, we implemented a function called texture_transfer that uses the textures of a sample image to replicate a target image. The idea behind texture transfer is to extend the synthesis algorithm used for overlap/seam finding while requiring each patch to satisfy a 'correspondence map'. We calculate the SSD cost as before for all patches to make sure they adhere to the overall texture of the sample, but now each patch must also factor in the SSD against the corresponding spatial patch of the external image we are matching (i.e., patches more similar to the external image have lower costs, depending on the choice of alpha).
texture_transfer is similar to quilt_cut, except that we now calculate and use another cost term: the difference between our sampled source patch and the corresponding target patch. Below are the results of our texture_transfer function:
Sample Image
Input size = 192x192
Guidance/Correspondence Image
out_size = 400x400, patch_size = 39, overlap = 11, tol = .1
Texture Transfer Image
out_size = 400x400, patch_size = 39, overlap = 11, tol = .1
Sample Image
Guidance/Correspondence Image
Texture Transfer Image
out_size = 400x400, patch_size = 39, overlap = 11, tol = .1
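The two-term cost described above can be sketched as follows (hypothetical names; alpha trades off fidelity to the already-synthesized overlap against agreement with the correspondence image):

```python
import numpy as np

def texture_transfer_cost(sample, synth_overlap, target_patch, mask, alpha):
    """Cost of every candidate patch offset in a 2-D `sample`:
    alpha * SSD with the existing overlap (masked to the overlap region)
    plus (1 - alpha) * SSD with the corresponding target patch."""
    win = np.lib.stride_tricks.sliding_window_view(sample, target_patch.shape)
    overlap_ssd = np.sum(((win - synth_overlap) * mask) ** 2, axis=(-2, -1))
    corr_ssd = np.sum((win - target_patch) ** 2, axis=(-2, -1))
    return alpha * overlap_ssd + (1 - alpha) * corr_ssd
```

The returned cost map is then thresholded and sampled exactly as in quilt_cut.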
For bells and whistles, we implemented texture_transfer iteratively, following the iterative texture transfer method described in the paper. Here is the result of the iterative texture transfer method.
Warning: this texture transfer is lowkey scary.
Sample Image
My keyboard
Guidance/Correspondence Image
Our very own Alfredo Santana
Vanilla Image
Our keyboard-ified Alfredo, if you zoom out from the image you can really see the resemblance :o
Iterative Texture Transfer Image
Our keyboard-ified Alfredo after iterative refinement, if you zoom out from the image you can really see the resemblance :o
Sample Image
Guidance/Correspondence Image
Vanilla Image
Regular Texture Transfer
Iterative Texture Transfer Image
Iterative Texture Transfer
As you can see, the iterative texture transfer generally produces much better and more appealing results. We used N = 3 iterations, an initial patch_size of 81, an initial overlap of 31, tol = .1, and alpha = (.8 * ((j - 1) / (N - 1))) + .1, with patch_size and overlap reduced by a factor of 2 each iteration.
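The per-iteration parameters we used can be written out as a small sketch (hypothetical helper name, mirroring the values above):

```python
def schedule(N=3, patch_size=81, overlap=31):
    """Per-iteration (alpha, patch_size, overlap): alpha ramps from
    0.1 to 0.9 across N iterations, sizes halve each iteration."""
    params = []
    for j in range(1, N + 1):
        alpha = 0.8 * ((j - 1) / (N - 1)) + 0.1
        params.append((alpha, patch_size, overlap))
        patch_size //= 2
        overlap //= 2
    return params
```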
For another bells and whistles, we combined (iterative) texture transfer with blending to create an image similar to the face-in-toast image shown at the top of the project spec. We re-used the Laplacian/Gaussian stacks from project 2 and a custom mask made with GIMP to perform Laplacian blending, an old friend.
Sample Image
Input size = 500x584
Guidance/Correspondence Image
Input size = 649x574
Texture Transfer Image
We used iterative texture transfer to build the image, using params mentioned in paper
Texture Transfer Image
We externally resized this image and the toast to be the same size
Unique Mask
Mask made with Gimp
Toast Transfer Image
Woo we put our cat on toast :,)
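The Laplacian-stack blending step can be sketched as follows (a minimal NumPy version with a hypothetical separable Gaussian blur; our actual code reused the stacks from project 2):

```python
import numpy as np

def blur(img, sigma=2):
    """Simple Gaussian blur via a separable 1-D kernel (no SciPy)."""
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    out = np.apply_along_axis(lambda m: np.convolve(m, k, mode='same'), 0, img)
    return np.apply_along_axis(lambda m: np.convolve(m, k, mode='same'), 1, out)

def laplacian_blend(a, b, mask, levels=4):
    """Blend images a and b under `mask` using Laplacian/Gaussian stacks:
    each Laplacian level is combined with a progressively blurred mask,
    then the blended low-frequency residual is added back."""
    ga, gb, gm = a.astype(float), b.astype(float), mask.astype(float)
    out = np.zeros_like(ga)
    for _ in range(levels):
        ba, bb = blur(ga), blur(gb)
        out += gm * (ga - ba) + (1 - gm) * (gb - bb)   # blended Laplacian level
        ga, gb, gm = ba, bb, blur(gm)
    return out + gm * ga + (1 - gm) * gb               # blended residual
```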
Overall, this project was a lot of fun because we were able to implement an algorithm from a paper written by our professor, Efros! We learned various ways to produce image quilts and how to improve their quality, such as overlapping similar patches and finding seams to remove edge artifacts in the overlapping regions. Producing texture transfer images was also a really cool experience: by adding just one additional cost term to our quilt_cut function, we could produce these super cool texture transfer images.