CS194-26 Project 1 (Colorizing the Prokudin-Gorskii photo collection)

Hello, welcome to my webpage xd.

For the initial part of the project, I mostly followed the suggestions in the spec and the skeleton. Specifically, I used SSD as a score of how well two images matched, iterated over a predefined window of displacements (±15 pixels, as suggested, applied with np.roll) to find how the R and G channels best aligned to the blue channel, and then applied those best displacements. Below are my results for the initial .jpg images (cathedral, monastery, tobolsk), with the found displacements in a chart below the images. I'll elaborate more on the results (and the later parts) afterwards :)

Part 1 (Single-scale) Displacements
Image (channel)    y    x
cathedral (G)      1   -1
cathedral (R)      7   -1
monastery (G)     -6    0
monastery (R)      9    1
tobolsk (G)        3    2
tobolsk (R)        6    3
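Concretely, the single-scale search looks roughly like the sketch below (a minimal version assuming grayscale float arrays; `ssd` and `align_single_scale` are just illustrative names, not the actual project code):

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences: lower = better match."""
    return np.sum((a - b) ** 2)

def align_single_scale(channel, reference, window=15):
    """Try every (dy, dx) in [-window, window]^2 via np.roll and return
    the displacement that best aligns `channel` to `reference`."""
    best_score, best_shift = np.inf, (0, 0)
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            score = ssd(np.roll(channel, (dy, dx), axis=(0, 1)), reference)
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

The winning displacement then just gets applied with the same np.roll before stacking the three channels.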

As we can see, the results aren't terrible: cathedral and tobolsk look decent, though monastery is clearly misaligned. I figured this would improve with a small refinement to the alignment method, which I made while implementing the recursive version: ignoring the borders of the image by scoring only the middle 64% (given an image of size $H \times W$, I used the middle $0.8H \times 0.8W$; this fraction came from eye-balling and experimentation, as did many of the other parameters mentioned later). I also figured that cropping the borders outright would help (as mentioned a few times on Piazza), but I'll get to that when I cover cropping.

To implement the multi-scale version, I created an image pyramid as suggested: an array of images at successively halved resolutions (i.e. $H \times W, \frac{H}{2} \times \frac{W}{2}, \frac{H}{4} \times \frac{W}{4}$, etc.), with the smallest being at least $64 \times 64$ (this stopping condition was another somewhat arbitrary parameter, finalized through experimentation and for speed). To keep alignment efficient on the large images, only the initial scoring on the smallest resolution searched a wide window of $[-32, 32]$ displacements (again, kind of arbitrary, but it worked fine experimentally, so I didn't see reason to change it); at every subsequent level, the window was limited to $[-3, 3]$ in both the vertical and horizontal directions (as opposed to the old $[-15, 15]$). Each level's displacement estimate was multiplied by 2 and passed to a recursive call on the next-higher-resolution image, continuing up to the full-resolution image.
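The recursive coarse-to-fine procedure (including the middle-64% scoring described above) might look something like this sketch; the helper names and the plain `[::2, ::2]` subsampling are my simplifications (a real implementation may blur or average before downsampling):

```python
import numpy as np

def center_crop(im, frac=0.8):
    """Keep only the middle frac*H x frac*W of the image for scoring,
    so the junky borders don't dominate the SSD."""
    h, w = im.shape
    dh, dw = int(h * (1 - frac) / 2), int(w * (1 - frac) / 2)
    return im[dh:h - dh, dw:w - dw]

def best_shift(channel, reference, center, window):
    """SSD search over displacements within `window` of `center`."""
    best_score, best = np.inf, center
    for dy in range(center[0] - window, center[0] + window + 1):
        for dx in range(center[1] - window, center[1] + window + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = np.sum((center_crop(shifted) - center_crop(reference)) ** 2)
            if score < best_score:
                best_score, best = score, (dy, dx)
    return best

def align_pyramid(channel, reference, min_size=64):
    """Coarse-to-fine: recurse down to ~min_size, search a wide [-32, 32]
    window there, then refine with a [-3, 3] window at each finer level,
    doubling the estimate between levels."""
    if min(channel.shape) <= min_size:
        return best_shift(channel, reference, (0, 0), 32)
    coarse = align_pyramid(channel[::2, ::2], reference[::2, ::2], min_size)
    return best_shift(channel, reference, (2 * coarse[0], 2 * coarse[1]), 3)
```

The key point is that the expensive wide search happens only once, on the tiny image; every full-resolution SSD evaluation happens at most $7 \times 7 = 49$ times.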

Part 2 (Multi-scale) Displacements
Image (channel)          y     x
emir (G)                49  -124
emir (R)                66  -801
harvesters (G)          59    16
harvesters (R)         123    13
icon (G)                41    17
icon (R)                89    23
lady (G)                51     9
lady (R)               112    11
melons (G)              81    10
melons (R)             178    13
onion church (G)        51    26
onion church (R)       108    36
self portrait (G)       78    29
self portrait (R)      176    37
three generations (G)   53    14
three generations (R)  112    11
train (G)               42     5
train (R)               87    32
village (G)             64    12
village (R)            137    22
workshop (G)            53     0
workshop (R)           105   -12

I was initially having trouble aligning about half of these (the other half were fine), but this was resolved by the aforementioned refinement of aligning only the middle portion of each image (the middle 64%) to exclude the effects of the borders. With that refinement in place, all of them except emir ended up looking pretty good. I suspect emir in particular is problematic because of the caveat in the spec that "the images to be matched do not actually have the same brightness values (they are different color channels)", which holds especially true for the subject's patterned blue garb. I did eventually resolve that (more on that later), but this basically concluded the main parts of the project.

Now I'll start getting into the Bells & Whistles portions. The first thing I did was implement some sort of automatic cropping. I started by applying the Sobel filter (as suggested somewhere on Piazza, I think) in both the vertical and horizontal directions, then aggregated the two responses into a single "image" that basically contained the vertical and horizontal edges of the original. I then looked for long vertical/horizontal edges near the borders of the image ("long" meaning at least some fraction of the image's length, which I settled on 0.5 after experimenting; "near" meaning within 20% of the edge, another value decided experimentally), chose the ones that gave the tightest bounding box, and cropped everything outside it. The full pipeline: split the given image into its 3 channels, crop them, process and align them, and finally crop the result as well (to clean things up). This ended up working decently, although there are clearly improvements that could be made, as some borders remain in some of the images. (This part also took me significantly longer than the rest of the project, primarily because I basically had no clue what I was doing and was just guessing and experimenting until things looked okay; fortunately, it wasn't too bad.)
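A rough sketch of that border-detection idea is below; I've written the Sobel filter out in plain NumPy to keep it self-contained, and the half-of-max edge threshold is an illustrative assumption rather than the exact rule I used:

```python
import numpy as np

def sobel(im, axis):
    """Plain-NumPy 3x3 Sobel derivative along `axis` (0 = y, 1 = x)."""
    p = np.pad(im, 1, mode='edge')
    if axis == 1:
        p = p.T
    sm = p[:, :-2] + 2 * p[:, 1:-1] + p[:, 2:]  # [1, 2, 1] smoothing
    d = sm[2:] - sm[:-2]                        # [-1, 0, 1] derivative
    return d if axis == 0 else d.T

def find_crop(im, band=0.2, frac=0.5):
    """Look for long horizontal/vertical edges within `band` of each side
    and return the tightest (top, bottom, left, right) box inside them."""
    edges = np.hypot(sobel(im, 0), sobel(im, 1))  # combined edge "image"
    strong = edges > 0.5 * edges.max()            # illustrative threshold
    h, w = im.shape
    # a row/column counts as a border line if > frac of it is edge pixels
    row_hits = strong.mean(axis=1) > frac
    col_hits = strong.mean(axis=0) > frac
    bh, bw = int(h * band), int(w * band)
    tops = np.flatnonzero(row_hits[:bh])
    bots = np.flatnonzero(row_hits[h - bh:])
    lefts = np.flatnonzero(col_hits[:bw])
    rights = np.flatnonzero(col_hits[w - bw:])
    top = tops.max() + 1 if tops.size else 0
    bottom = h - bh + bots.min() if bots.size else h
    left = lefts.max() + 1 if lefts.size else 0
    right = w - bw + rights.min() if rights.size else w
    return top, bottom, left, right  # crop as im[top:bottom, left:right]
```

Each channel gets cropped with its own box before alignment, and the final composite gets cropped once more at the end.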

Monastery is looking nice now, and it seemed to work well with these jpgs :)

Bells & Whistles (Crop + Multi-scale) Displacements
Image (channel)     y    x
cathedral (G)      -2   -3
cathedral (R)      -4   -3
monastery (G)      -4    2
monastery (R)      -7    2
tobolsk (G)        -1    2
tobolsk (R)         3    0
harvesters (G)     11  -25
harvesters (R)     30   12
onion church (G)   51   30
onion church (R)  108   41
village (G)        26  -28
village (R)        38   22

Clearly it isn't perfect, as borders are still evident in some of the larger images, but I'd already spent a significant amount of time on this part and figured my time was probably better spent on other things rather than perfecting it, cause, you know, school :/.

Since I was already generating an edge image of each original for the cropping part, I figured I could use it in alignment as well, and hoped this would help especially with the emir image, since the edges (at least relative to the color differences) aren't too different across the channels. Fortunately, this ended up working decently (though it doesn't really improve the images that were already aligned well, which is probably fine cause they're... already aligned pretty well...). So basically, I'd take the edge images of the different channels and then run them through the same scoring method defined above (test displacements across the image pyramid). These ended up being my results, with emir finally looking decent :)
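In sketch form, the only change is scoring on edge (gradient-magnitude) images instead of raw intensities. Here's a single-scale version to illustrate why that's robust to per-channel brightness differences (np.gradient stands in for the Sobel edges I actually used, and the names are illustrative):

```python
import numpy as np

def edge_magnitude(im):
    """Gradient magnitude (np.gradient standing in for Sobel edges)."""
    gy, gx = np.gradient(im)
    return np.hypot(gy, gx)

def align_edges(channel, reference, window=10):
    """Same SSD search as before, but scored on edge images, so a
    uniform per-channel brightness difference barely changes the score."""
    ec, er = edge_magnitude(channel), edge_magnitude(reference)
    best_score, best = np.inf, (0, 0)
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            score = np.sum((np.roll(ec, (dy, dx), axis=(0, 1)) - er) ** 2)
            if score < best_score:
                best_score, best = score, (dy, dx)
    return best
```

In the real pipeline, this edge-based scoring simply replaces the raw-intensity SSD inside the pyramid search.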

More Bells & Whistles (Gradient-alignment + Crop + Multi-scale) Displacements
Image (channel)   y    x
cathedral (G)    -2   -3
cathedral (R)    -4   -3
monastery (G)    -4    2
monastery (R)    -7    2
tobolsk (G)      -1    2
tobolsk (R)       3    0
emir (G)         49  -12
emir (R)        107    0

These are some more examples from the Library of Congress site. Here are the links to where I got them:
https://www.loc.gov/pictures/collection/prok/item/2018678773/
https://www.loc.gov/pictures/collection/prok/item/2018678779/
https://www.loc.gov/pictures/collection/prok/item/2018678806/

They seem pretty good, relative to the results that are on the site :)

More Examples (Gradient-alignment + Crop + Multi-scale) Displacements
Image (channel)             y    x
cross thing (Ex 1) (G)    -25  -22
cross thing (Ex 1) (R)    -33   -5
some building (Ex 2) (G)   -6  -17
some building (Ex 2) (R)  -30   50
other building (Ex 3) (G)  13   42
other building (Ex 3) (R)  49   19