In this project, we attempt to use neural networks to combine a photo and a painting such that the style of the painting is applied to the photo. Essentially, we transfer the style of the painting onto the content of the photo.
We used a VGG model pre-trained on ImageNet as our base model for this project. The model uses the 16 convolutional layers and 5 pooling layers (average instead of max) of the 19-layer VGG network. We trained with an alpha/beta ratio of 1e-4 in general, and ~50 epochs for each set of photos.
As the paper notes, we used two types of loss functions (style and content) across 6 different layers. It recommends using layers conv1_1, conv2_1, conv3_1, conv4_1, and conv5_1 as the style representations and layer conv4_2 as the content representation. For the losses, we use two different squared-error loss functions:
1) Content: p is the original image, x is the generated image, and P^l and F^l correspond to their feature representations in layer l.
Content Squared-Error Loss Function
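Written out (following the notation in Gatys et al., which the description above uses), the pictured content loss is:

```latex
\mathcal{L}_{\text{content}}(\vec{p}, \vec{x}, l) = \frac{1}{2} \sum_{i,j} \left( F^{l}_{ij} - P^{l}_{ij} \right)^{2}
```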
2) Style: a is the original image, x is the generated image, and A^l and G^l are the Gram matrices of their respective feature representations in layer l. N_l is the number of distinct filters in layer l, and M_l is the size of each feature map (height times width). In this function, we minimize the mean-squared distance between the entries of the Gram matrices of the original and generated images (this is what E_l represents). We then sum over all of the style layers (i.e. conv1_1, conv2_1, conv3_1, ...) to obtain the final style squared-error loss, where the w_l are weights applied to each layer, set to 1/5 in the paper (a total weight of 1 split evenly across the 5 active layers).
Style Squared-Error Loss Function
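Written out (again following Gatys et al.), the pictured style loss builds the Gram matrix G^l from the generated image's features, compares it to A^l per layer, and sums the weighted per-layer terms:

```latex
G^{l}_{ij} = \sum_{k} F^{l}_{ik} F^{l}_{jk}, \qquad
E_{l} = \frac{1}{4 N_{l}^{2} M_{l}^{2}} \sum_{i,j} \left( G^{l}_{ij} - A^{l}_{ij} \right)^{2}, \qquad
\mathcal{L}_{\text{style}}(\vec{a}, \vec{x}) = \sum_{l} w_{l} E_{l}
```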
3) Final loss function: We add the two loss functions above with two weights, alpha and beta. The paper notes that the ratio alpha/beta should vary between 1e-3 and 1e-4 depending on the painting/photo: a ratio of 1e-4 puts more emphasis on the style of the painting, whereas a ratio of 1e-3 preserves more of the content of the photo.
Total Loss Function
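The pictured total loss is simply the weighted sum of the two terms:

```latex
\mathcal{L}_{\text{total}}(\vec{p}, \vec{a}, \vec{x}) = \alpha \, \mathcal{L}_{\text{content}}(\vec{p}, \vec{x}) + \beta \, \mathcal{L}_{\text{style}}(\vec{a}, \vec{x})
```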
My favorite result from the neural net has to be the amalgamation of the Campanile with Starry Night by van Gogh. The transfer is more subtle than in some of the other pieces, but I'm a fan of the 'brushstrokes' and the nostalgic feel. I believe this one came out well because the setting and colors present in the photo transfer nicely to the color palette of the painting. My least favorite combo has to be the last one, a picture of me mixed with van Gogh's blossom painting. The painting lacks well-defined features, so many details of the photo (such as my face) get blurred out and the amalgamation doesn't come out very well. While the background behind me is pretty, my face and hands look jumbled. I found that similarity between the painting and the photo helps produce better results, as seen in the first set of images of a path lined with trees in the fall: there is a natural transfer from the painting to the photo that makes the combination believable and easy to understand.
This was one of the coolest projects in class thus far, in my opinion. I love the idea of combining popular, well-known art pieces with the content of photos that I can relate to. It's brilliant that neural networks can be used to produce such interesting and beautiful results. I'm looking forward to the day when all of our art is produced by some neural network trained on the past and present art of the human species.
For this project, we will be inserting a synthetic cube into a video by using a camera projection matrix to project 3D coordinates of the cube into 2D points.
To set up, we used a box of dimensions 5.01" x 7.06" x 2.56" and drew a regular pattern on it. Below is our input video:
Input Video
We chose 30 points from the pattern and labeled their corresponding 3D points using the measurements of the box. We marked the points in the first image using plt.ginput. Below are the 30 points we selected, as well as our x, y, z axes and the corner at which the axes are centered.
Selected Keypoints
To propagate our keypoints from the first image to the rest of the frames, we used the MedianFlow tracker: cv2.legacy.TrackerMedianFlow_create(). We initialized 30 trackers, one for each point, each with an 8x8 bounding box centered on its keypoint. After each frame of the video we updated each tracker and kept the point at the center of each bounding box, along with its corresponding 3D point. Below is the result of our tracked points:
Tracked Keypoints
To calibrate our camera, we needed to solve for the camera projection matrix that projects our 3D coordinates to 2D coordinates. We used the following homogeneous linear system from lecture to build our matrices:
Homogeneous Linear System
We rearrange our matrices into the form Ax = b, as shown below, and solve for the projection matrix P using the least-squares function np.linalg.lstsq.
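This step can be sketched as follows (a minimal NumPy version with hypothetical names; it assumes at least 6 non-degenerate correspondences and fixes the last entry of P to 1 so the Ax = b system is well-posed):

```python
import numpy as np

def calibrate(pts3d, pts2d):
    """Solve for the 3x4 camera projection matrix P from 3D-2D point
    correspondences. Each correspondence contributes two rows of Ax = b,
    with P's last entry fixed to 1 (11 unknowns). Needs >= 6 points."""
    A, b = [], []
    for (X, Y, Z), (u, v) in zip(pts3d, pts2d):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z])
        b.extend([u, v])
    x, *_ = np.linalg.lstsq(np.asarray(A), np.asarray(b), rcond=None)
    return np.append(x, 1.0).reshape(3, 4)

def project(P, pts3d):
    """Project Nx3 world points to Nx2 pixel points: apply P to the
    homogeneous coordinates, then divide by the last coordinate."""
    homog = np.hstack([pts3d, np.ones((len(pts3d), 1))])
    proj = homog @ P.T
    return proj[:, :2] / proj[:, 2:3]
```

With the 30 tracked 2D points and their 3D labels per frame, `calibrate` gives one P per frame and `project` maps any 3D point into that frame.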
Once we have our camera projection matrix P, we can use it to project a cube into every frame of our video. All we have to do is multiply P with the homogeneous 3D coordinates of our cube, divide by the last coordinate, and pass the projected points into the draw function from this tutorial. The vertices we chose for our cube were: [[1.75,0,2.5], [1.75,1.67,2.5], [3.5,1.67,2.5], [3.5,0,2.5], [1.75,0,3.75], [1.75,1.67,3.75], [3.5,1.67,3.75], [3.5,0,3.75]]. Below is the final result of our projected cube.
Projected Cube
In this project, we learned that we could project 3D coordinates into 2D coordinates by solving for the camera projection matrix, which is super cool. Although this project had its tough moments, it was still pretty fun being able to project an object into a scene and get a taste of augmented reality.
In this project, we had the task of implementing the image quilting algorithm used for texture synthesis and transfer found in this paper by Efros and Freeman.
In the first part of the project, we created a function called quilt_random that randomly samples patches from an input sample image and uses those samples to create an output image. The algorithm starts at the upper-left corner of the output image and places square patches until the output is filled. Below is the result of this simpler algorithm:
Sample Image
Input size = 192x192
Randomly Quilted Image
out_size = 320x320, patch_size = 40
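A minimal sketch of quilt_random (hypothetical helper names; assumes a square output and a NumPy image array):

```python
import numpy as np

def quilt_random(sample, out_size, patch_size, seed=0):
    """Fill an out_size x out_size image by tiling randomly sampled
    square patches from `sample`, left-to-right, top-to-bottom."""
    rng = np.random.default_rng(seed)
    out = np.zeros((out_size, out_size) + sample.shape[2:], dtype=sample.dtype)
    h, w = sample.shape[:2]
    for i in range(0, out_size, patch_size):
        for j in range(0, out_size, patch_size):
            # pick a random top-left corner for the patch in the sample
            r = rng.integers(0, h - patch_size + 1)
            c = rng.integers(0, w - patch_size + 1)
            patch = sample[r:r + patch_size, c:c + patch_size]
            # clip the patch at the right/bottom edges of the output
            ph = min(patch_size, out_size - i)
            pw = min(patch_size, out_size - j)
            out[i:i + ph, j:j + pw] = patch[:ph, :pw]
    return out
```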
For this part of the project, we no longer place patches side by side; instead, we overlay patches based on how similar the overlapping regions are. We calculate a cost for every possible patch from the source image: the SSD between the overlapping regions of the existing output and the sampled patch. We optimized this operation using a cv2 function that performs a convolution of our masked template across the whole image, which allowed us to forgo the naive double for loop and saved 10-20 minutes per execution. After calculating the cost of each patch, we randomly select a patch from the list of candidates whose cost is below a threshold (tol) that we select (and to save execution time later on, we only choose from the lowest K = 10 thresholded values). Below is our result from overlapping patches:
Sample Image
Input size = 192x192
Overlap Quilted Image
out_size = 400x400, patch_size = 39, overlap = 11, tol = .1
Note: there is a black border around our quilted image because patch_size does not evenly divide out_size, leaving a strip narrower than a patch unfilled. Instead of cropping this out, we chose to maintain the out_size of our resulting image.
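The masked SSD cost and thresholded selection can be sketched as follows (a pure-NumPy version with hypothetical names; our actual code used a cv2 template-matching call for speed):

```python
import numpy as np

def ssd_cost(image, template, mask):
    """SSD between a 2-D `template` and every same-size window of a 2-D
    `image`, restricted to pixels where mask == 1 (the overlap region)."""
    windows = np.lib.stride_tricks.sliding_window_view(image, template.shape)
    diff = (windows - template) * mask   # broadcast over the window grid
    return np.sum(diff ** 2, axis=(-2, -1))

def choose_patch(costs, tol=0.1, k=10, rng=None):
    """Randomly pick one of the k lowest-cost offsets whose cost is
    within (1 + tol) of the minimum; returns a (row, col) offset."""
    if rng is None:
        rng = np.random.default_rng()
    flat = costs.ravel()
    order = np.argsort(flat)[:k]                       # k cheapest offsets
    ok = order[flat[order] <= (1 + tol) * (flat[order[0]] + 1e-8)]
    return np.unravel_index(rng.choice(ok), costs.shape)
```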
Even though overlapping patches improved our results, we can still remove edge artifacts near the overlaps by using seam finding. To do this, we calculate bndcost, the SSD over the overlapping region between the output image and the newly sampled patch. We then pass bndcost into the provided cut function, which gives us the min-cost contiguous path from one side of the patch to the other (in the form of a mask). We apply the mask to our patch and its inverse to our output image in order to know which pixels to copy over from the newly sampled patch. Finally, we created a function quilt_cut that builds on our quilt_simple function from the last section by adding seam finding to remove edge artifacts. Below are the results produced using seam finding, as well as examples of the overlapping patches, cost image, and min-cost path.
Seam finding mechanics:
Sampled Patch
Patch sampled
Output Patch
Where the random patch will go in the image (overlap shown)
Cost and Path Image
Cost image and path in red
Note: the sampled patch was the same patch already present in the overlap (most likely due to the low tolerance), so the cost is zero (black) in the overlap region.
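The provided cut function can be sketched as a dynamic program. This hypothetical version finds a top-to-bottom seam through bndcost (transpose bndcost for the left-to-right case) and returns a mask that keeps the existing output on one side of the seam:

```python
import numpy as np

def min_cost_seam(bndcost):
    """Find the min-cost contiguous top-to-bottom path through an H x W
    cost image, and return a boolean mask that is True left of the seam."""
    H, W = bndcost.shape
    cost = bndcost.astype(float)
    # forward pass: accumulate the cheapest way to reach each pixel
    for i in range(1, H):
        left = np.r_[np.inf, cost[i - 1, :-1]]
        right = np.r_[cost[i - 1, 1:], np.inf]
        cost[i] += np.minimum(np.minimum(left, cost[i - 1]), right)
    # backtrack from the cheapest bottom pixel, moving at most one column
    seam = np.empty(H, dtype=int)
    seam[-1] = np.argmin(cost[-1])
    for i in range(H - 2, -1, -1):
        j = seam[i + 1]
        lo, hi = max(j - 1, 0), min(j + 2, W)
        seam[i] = lo + np.argmin(cost[i, lo:hi])
    return np.arange(W) < seam[:, None]
```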
Seam finding images:
Sample Image
Input size = 192x192
Seam Finding Quilted Image
out_size = 400x400, patch_size = 39, overlap = 11, tol = .1
Sample Image
Input size = 192x192
Seam Finding Quilted Image
out_size = 400x400, patch_size = 39, overlap = 11, tol = .1
Sample Image
Input size = 192x192
Seam Finding Quilted Image
out_size = 400x400, patch_size = 39, overlap = 11, tol = .1
Sample Image
Input size = 192x192
Seam Finding Quilted Image
out_size = 400x400, patch_size = 39, overlap = 11, tol = .1
Sample Image
Input size = 192x192
Seam Finding Quilted Image
out_size = 400x400, patch_size = 39, overlap = 11, tol = .1
Random
Random placement of blocks
Overlap
Neighboring blocks constrained by overlap
Seam Finding Quilted Image
Minimum error boundary cut
For the final part of the project, we implemented a function called texture_transfer that uses the textures of a sample image to replicate a target image. The idea behind texture transfer is to extend the synthesis algorithm used for overlap/seam finding while requiring each patch to satisfy a 'correspondence map'. We calculate the SSD cost as before for all patches to make sure they adhere to the overall texture of the sample, but now each patch must also factor in the SSD against the corresponding spatial patch of the external image we are matching (i.e., patches more similar to the external image have lower costs, depending on the choice of alpha).
texture_transfer is similar to quilt_cut, except that we now calculate and use another cost term: the difference between our sampled source patch and the corresponding target patch. Below are the results of our texture_transfer function:
Sample Image
Input size = 192x192
Guidance/Correspondence Image
out_size = 400x400, patch_size = 39, overlap = 11, tol = .1
Texture Transfer Image
out_size = 400x400, patch_size = 39, overlap = 11, tol = .1
Sample Image
Guidance/Correspondence Image
Texture Transfer Image
out_size = 400x400, patch_size = 39, overlap = 11, tol = .1
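The two-term cost described above can be sketched as follows (hypothetical names; alpha trades off fidelity to the already-synthesized overlap against agreement with the correspondence image):

```python
import numpy as np

def texture_transfer_cost(sample, synth_overlap, target_patch, mask, alpha):
    """Cost of every candidate patch offset in a 2-D `sample`:
    alpha * SSD with the existing overlap (masked to the overlap region)
    plus (1 - alpha) * SSD with the corresponding target patch."""
    win = np.lib.stride_tricks.sliding_window_view(sample, target_patch.shape)
    overlap_ssd = np.sum(((win - synth_overlap) * mask) ** 2, axis=(-2, -1))
    corr_ssd = np.sum((win - target_patch) ** 2, axis=(-2, -1))
    return alpha * overlap_ssd + (1 - alpha) * corr_ssd
```

The returned cost map is then thresholded and sampled exactly as in quilt_cut.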
For bells and whistles, we implemented texture_transfer iteratively, following the iterative texture transfer method described in the paper. Here is the result of the iterative texture transfer method.
Warning: this texture transfer is lowkey scary.
Sample Image
My keyboard
Guidance/Correspondence Image
Our very own Alfredo Santana
Vanilla Image
Our keyboard-ified Alfredo, if you zoom out from the image you can really see the resemblance :o
Iterative Texture Transfer Image
Our keyboard-ified Alfredo after iterative refinement, if you zoom out from the image you can really see the resemblance :o
Sample Image
Guidance/Correspondence Image
Vanilla Image
Regular Texture Transfer
Iterative Texture Transfer Image
Iterative Texture Transfer
As you can see, the iterative texture transfer generally produces much better and more appealing results. We used N = 3 iterations, an initial patch_size of 81, an initial overlap of 31, tol = .1, and alpha = (.8 * ((j - 1) / (N - 1))) + .1, with patch_size and overlap reduced by a factor of 2 each iteration.
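The per-iteration parameters we used can be written out as a small sketch (hypothetical helper name, mirroring the values above):

```python
def schedule(N=3, patch_size=81, overlap=31):
    """Per-iteration (alpha, patch_size, overlap): alpha ramps from
    0.1 to 0.9 across N iterations, sizes halve each iteration."""
    params = []
    for j in range(1, N + 1):
        alpha = 0.8 * ((j - 1) / (N - 1)) + 0.1
        params.append((alpha, patch_size, overlap))
        patch_size //= 2
        overlap //= 2
    return params
```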
For another bells and whistles, we combined (iterative) texture transfer with blending to create an image similar to the face-in-toast image shown at the top of the project spec. We re-used the Laplacian/Gaussian stacks from project 2 and a custom mask made with GIMP to perform Laplacian blending, an old friend.
Sample Image
Input size = 500x584
Guidance/Correspondence Image
Input size = 649x574
Texture Transfer Image
We used iterative texture transfer to build the image, using params mentioned in paper
Texture Transfer Image
We externally resized this image and the toast to be the same size
Unique Mask
Mask made with Gimp
Toast Transfer Image
Woo we put our cat on toast :,)
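The Laplacian-stack blending step can be sketched as follows (a minimal NumPy version with a hypothetical separable Gaussian blur; our actual code reused the stacks from project 2):

```python
import numpy as np

def blur(img, sigma=2):
    """Simple Gaussian blur via a separable 1-D kernel (no SciPy)."""
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    out = np.apply_along_axis(lambda m: np.convolve(m, k, mode='same'), 0, img)
    return np.apply_along_axis(lambda m: np.convolve(m, k, mode='same'), 1, out)

def laplacian_blend(a, b, mask, levels=4):
    """Blend images a and b under `mask` using Laplacian/Gaussian stacks:
    each Laplacian level is combined with a progressively blurred mask,
    then the blended low-frequency residual is added back."""
    ga, gb, gm = a.astype(float), b.astype(float), mask.astype(float)
    out = np.zeros_like(ga)
    for _ in range(levels):
        ba, bb = blur(ga), blur(gb)
        out += gm * (ga - ba) + (1 - gm) * (gb - bb)   # blended Laplacian level
        ga, gb, gm = ba, bb, blur(gm)
    return out + gm * ga + (1 - gm) * gb               # blended residual
```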
Overall, this project was a lot of fun because we were able to implement an algorithm from a paper written by our professor, Efros! We learned various ways to produce image quilts and how to improve their quality, such as overlapping similar patches and finding seams to remove edge artifacts in the overlapping regions. Producing texture transfer images was also a really cool experience: by adding just one additional cost term to our quilt_cut function, we could produce these super cool texture transfer images.