CS 194-26 Final Project
name: Andrew Shieh
#1: Neural Style Transfer
Background
I've always wondered what I'd look like if some famous painter had drawn
a portrait of me. Thanks to the power of convolutional neural nets
(something I played around with in project 5), I've been able to do so
with neural style transfer, detailed in "A Neural Algorithm of Artistic
Style" by Gatys et al. In this project, I wanted to implement this
algorithm myself.
For visual content, here's an example I found online: starting with the
two left images, we can generate the image on the right.
Images
First, I picked out two pictures: one content image and one style image.
With these images, the network can learn the "content" of one picture
and the "style" of the other, and combine the two into a single image. I
picked an example to imitate from the paper, using the Neckarfront in
Tübingen as my content image and Starry Night as my style image.
To make the network run, I first passed both images through a dataloader
that resizes them to the same dimensions. Next, I defined the content
loss and style loss as specified in the paper. These loss functions are
what let the model separately optimize for the content of one image and
the style of the other. The content loss is an MSE between the feature
maps of the input image and the content image, while the style loss is
an MSE between the Gram matrices of the input image's and the style
image's feature maps.
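To make this concrete, here's a small sketch of the two losses (written with plain NumPy for readability; my actual implementation used PyTorch tensors computed from VGG feature maps):

```python
import numpy as np

def content_loss(input_feat, content_feat):
    """MSE between feature maps of the input and content images."""
    return np.mean((input_feat - content_feat) ** 2)

def gram_matrix(feat):
    """Gram matrix of a (channels, height, width) feature map,
    normalized by the number of elements."""
    c, h, w = feat.shape
    flat = feat.reshape(c, h * w)       # one row per channel
    return flat @ flat.T / (c * h * w)  # channel-to-channel correlations

def style_loss(input_feat, style_feat):
    """MSE between the Gram matrices of the input and style features."""
    return np.mean((gram_matrix(input_feat) - gram_matrix(style_feat)) ** 2)
```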
Model
Next, I had to build the convolutional neural net itself. Following the
paper, I used a pretrained VGG-19 model (19 weight layers) as the
feature extractor. The authors found that replacing the max-pooling
layers with average-pooling layers, and skipping the fully connected
layers entirely, produced better-looking images, so I did the same with
my network. All in all, my final network architecture looked like:
Style Transfer!
Finally, I ran my content and style images through the network! Here are
my two input images alongside the final result:
Pretty cool! Many of the hallmarks of Starry Night are present (the
color scheme, wavy brushstrokes, and starry sky), and I noticed that it
even added the reflection of a star along the river bank.
An alert reader might notice that my final image isn't as vivid as the
examples shown in the paper. This is most likely because I didn't run
the model for very long: 500 epochs on my images (512 px in height)
already took more than 30 minutes, even though the loss was still
decreasing in the later epochs.
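For reference, here's the shape of that pixel-optimization loop, with a toy MSE target standing in for the real content + style objective (the optimizer, learning rate, and sizes here are illustrative, not the exact values I used):

```python
import torch

# Toy stand-in for the real objective: pull a random "image" toward a
# target. In the real run, loss = content_loss + weight * style_loss,
# both computed from VGG feature maps of `img`.
target = torch.rand(1, 3, 8, 8)
img = torch.rand(1, 3, 8, 8, requires_grad=True)  # the pixels ARE the parameters

opt = torch.optim.Adam([img], lr=0.05)
start = torch.nn.functional.mse_loss(img, target).item()
for _ in range(200):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(img, target)
    loss.backward()
    opt.step()  # updates the image itself, not network weights
```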
Gallery
Anyhow, I wanted to run some more examples from the paper as well.
Below, behold Neckarfront in the style of "The Shipwreck of the
Minotaur" and Neckarfront in the style of "Composition VII". I show the
content image, style image, and final image for these examples:
These all turned out pretty well, and although again not as vivid as in
the paper, we can still clearly see the styles being transferred onto
the content image.
For a final fun experiment, I decided to transfer the style from "The
Scream" onto my Facebook profile picture. Have a look:
I look a little demonic so I probably won't be making this my new
profile picture. However, the influence of "The Scream" on the colors
and wavy lines can be seen, especially in the background of the picture.
Reflection
Overall, this project was quite a bit of fun. Re-implementing a popular
paper with some cool results to show for myself was eye-opening, and it
was truly astonishing to see the power of CNNs firsthand. I was also
able to explore PyTorch in depth, and I used parts of its documentation
for guidance on this project.
#2: Image Quilting
Background
Image quilting is an algorithm described in a paper co-authored by Prof.
Efros, so I thought I'd explore it! It is something of a precursor to
neural style transfer: we can synthesize an "image quilt" of a specified
size from a small texture sample, then transfer the content of another
image into that quilt. The cool thing is that even though this paper was
published 20 years ago, the results hold up against modern-day AI
techniques.
Randomly Sampled Texture
First, I started by randomly sampling textures from the original texture
to generate an image quilt. Below is the input image followed by a
randomly quilted image:
It doesn't look so great, but that's expected given the lack of nuance
in this quilting method. There are many obvious gaps and seams in the
generated quilt, so a more sophisticated technique is needed.
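Randomly sampled quilting itself is only a few lines; here's a sketch (the patch and output sizes are arbitrary, and the grid layout mirrors what I did):

```python
import numpy as np

def random_quilt(texture, out_size, patch_size, rng=None):
    """Tile an out_size x out_size image with patches sampled
    uniformly at random from the source texture."""
    if rng is None:
        rng = np.random.default_rng(0)
    h, w = texture.shape[:2]
    out = np.zeros((out_size, out_size) + texture.shape[2:], texture.dtype)
    for y in range(0, out_size, patch_size):
        for x in range(0, out_size, patch_size):
            # Pick a random top-left corner inside the source texture.
            ty = rng.integers(0, h - patch_size + 1)
            tx = rng.integers(0, w - patch_size + 1)
            patch = texture[ty:ty + patch_size, tx:tx + patch_size]
            # Crop the patch at the output's edges.
            out[y:y + patch_size, x:x + patch_size] = \
                patch[:out_size - y, :out_size - x]
    return out
```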
Overlapping Patches
This leads us to the next method of quilting, where we introduce the
idea of overlapping patches. In essence, we still sample patches, but
this time we compute a cost for each candidate (the sum of squared
differences, or SSD, over the region where it overlaps what has already
been placed) and randomly pick among the patches whose cost falls within
a threshold of the minimum. We also overlap the patches slightly so that
the transitions are smoother. This creates better pictures, although it
is tougher to implement. Have a look at the original image, the random
quilt, and this method:
This is quite a bit better than the naive randomization: we can see that
the text is much more aligned, although we can still see some of the
seams. One more improvement on this method will help us generate better
and more natural looking quilts.
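The thresholded patch selection at the heart of this method can be sketched like this (the 1.1x tolerance mirrors the paper's "within 10% of the minimum" rule; the mask-based overlap is a simplification of my actual bookkeeping):

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two equal-shape regions."""
    return float(np.sum((a - b) ** 2))

def pick_patch(candidates, template, mask, tol=1.1, rng=None):
    """Pick one patch from `candidates` whose overlap cost is within
    `tol` times the best cost. `template` holds the already-placed
    pixels and `mask` is 1 over the overlap region."""
    if rng is None:
        rng = np.random.default_rng(0)
    costs = np.array([ssd(p * mask, template * mask) for p in candidates])
    best = costs.min()
    # Random choice among "good enough" candidates avoids verbatim tiling.
    ok = np.flatnonzero(costs <= tol * best + 1e-12)
    return candidates[rng.choice(ok)]
```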
Seam Finding
Now, for the full method described in the paper. Instead of pasting each
overlapping patch with a straight edge, we compute a minimum-error
boundary cut: a jagged "rip" through the overlap region that follows the
path of lowest error, so that each patch blends into its neighbor where
the two disagree the least.
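Finding that rip is a small dynamic program over the overlap's error surface; here's a sketch for a vertical seam (one column index per row, moving at most one column per step):

```python
import numpy as np

def min_cut_path(cost):
    """Minimum-cost top-to-bottom seam through a 2D cost array.
    Returns one column index per row."""
    h, w = cost.shape
    acc = cost.astype(float).copy()
    # Accumulate: each cell adds the cheapest of its three upper neighbors.
    for y in range(1, h):
        for x in range(w):
            lo, hi = max(0, x - 1), min(w, x + 2)
            acc[y, x] += acc[y - 1, lo:hi].min()
    # Backtrack from the cheapest bottom cell.
    path = [int(np.argmin(acc[-1]))]
    for y in range(h - 2, -1, -1):
        x = path[-1]
        lo, hi = max(0, x - 1), min(w, x + 2)
        path.append(lo + int(np.argmin(acc[y, lo:hi])))
    return path[::-1]
```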
Here's a visualization of the calculation going on. First is the
template, then the patch, the cost visualization on the patch, and the
cut chosen. Notice how the cut navigates around the cost areas.
This cut method gives us quite good results, so take a look at the
original image, the random quilt, the simple quilt, and this method
alongside each other:
A watchful eye might be able to tell that the final method isn't perfect
since the text doesn't make much sense and a small seam might be
visible. However, this method overall clearly provides the best
quilting.
Gallery
Just seeing one example is a little boring, so I also ran the same
algorithm on a few of the provided sample images as well as my own. Each
is displayed in the order of original image, random quilt, simple quilt,
and cut quilt:
Texture Transfer
Lastly, we can use the quilting machinery to transfer a target image
into a texture. For example, the image on the homepage of a person's
face in a loaf of bread.
The idea is the same as quilting, except each patch's cost gains a
second term: the SSD between the patch's luminance and the target
image's luminance at that position (the correspondence map). An alpha
parameter determines how much weight each term gets when creating the
overall cost map.
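That combined cost can be sketched like this (the alpha default and the luminance weights are standard choices, not necessarily exactly what I used):

```python
import numpy as np

def luminance(rgb):
    """Approximate luminance of an RGB image (ITU-R BT.601 weights)."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def transfer_cost(patch, template, mask, target_patch, alpha=0.5):
    """Overall texture-transfer cost: a weighted sum of the usual
    overlap SSD and an SSD between the patch's luminance and the
    target image's luminance at this position (the correspondence map)."""
    overlap = np.sum(((patch - template) * mask) ** 2)
    corresp = np.sum((luminance(patch) - luminance(target_patch)) ** 2)
    return alpha * overlap + (1 - alpha) * corresp
```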
Here's an image of Richard Feynman alongside him in a wall of bricks.
You can sort of make out his face and hair, shown in the wall itself. I
wanted to do another example with a portrait of myself in the bricks as
well. Below is a picture of me, and me in the synthesized brick wall:
This one is a little more hazy since there is so much going on in the
source image. However, you're still able to make out the lines and
angles from the background in the wall itself.
Bells and Whistles: Iterative Transfer
I implemented the iterative transfer method directly within my code,
repeating the synthesis over several passes as described in the paper,
so the two examples above already use it.
Reflection
This project was quite challenging: with so many moving parts, it was
hard to align everything and to get every piece of the algorithm right.
However, I quite enjoyed implementing something from a paper my
professor had written, and it was really awesome to compare this method
to the neural style transfer I also implemented, since both techniques
generate images that blend style and content together. Definitely a
thought-provoking final project and a good way to end the semester.