CS 194-26 Project 1

In this project, we synthesized color images from the individual color channels of photographs in the Prokudin-Gorskii collection, using different alignment techniques. Each file in the collection consists of an image's blue, green, and red channels as vertically stacked cells, but dividing the file into thirds and superimposing them doesn't always work because the cells aren't always perfectly lined up.
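As a rough illustration of the naive "divide into thirds and superimpose" baseline described above, here is a minimal sketch. It assumes the plate is already loaded as a 2-D NumPy array with the blue cell on top, then green, then red; the function names `split_channels` and `naive_composite` are hypothetical and not taken from the notebook.

```python
import numpy as np

def split_channels(plate):
    """Split a vertically stacked plate into (blue, green, red) cells."""
    height = plate.shape[0] // 3  # any leftover rows at the bottom are dropped
    blue = plate[:height]
    green = plate[height:2 * height]
    red = plate[2 * height:3 * height]
    return blue, green, red

def naive_composite(plate):
    """Stack the three cells into an RGB image without any alignment."""
    blue, green, red = split_channels(plate)
    return np.dstack([red, green, blue])
```

The result of `naive_composite` is what the alignment steps below try to improve on.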

Approach

The first function implements the scoring metric, for which I used the sum of squared differences (SSD) between two images.
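A minimal sketch of such a metric, assuming both inputs are NumPy arrays of the same shape (the name `ssd` is illustrative, not necessarily the one used in the notebook):

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two equally sized images."""
    a = np.asarray(a, dtype=np.float64)  # cast so integer images cannot overflow
    b = np.asarray(b, dtype=np.float64)
    return float(np.sum((a - b) ** 2))
```

A value of 0 means the two images are identical, and larger values mean a worse match.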

Next, given a reference image and a top image to align over it, the single-scale alignment implementation performs an exhaustive search over every displacement within a range of row offsets and a range of column offsets. The default search window for both of these offsets is -15 pixels to +15 pixels. For each (row, column) offset, the function computes the SSD score. In this function, I found that computing the score over the center of the image rather than the entire image (which might contain solid-color borders) yielded better results, so a border the size of the largest offset is cropped off of each image prior to calculating SSD. Lastly, the offset with the best score (that is, the lowest SSD) is returned.
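The search described above could look roughly like the following sketch. It assumes both images are 2-D arrays of the same shape, shifts the top image with `np.roll`, and treats the lowest SSD as the best match; the function and parameter names are hypothetical.

```python
import numpy as np

def align_single_scale(reference, top, row_range=(-15, 15), col_range=(-15, 15)):
    """Return the (row, col) shift of `top` that minimizes SSD against `reference`."""
    reference = np.asarray(reference, dtype=np.float64)
    top = np.asarray(top, dtype=np.float64)
    h, w = reference.shape
    # Crop a border as wide as the largest offset so that wrapped-around
    # pixels and solid-color edges never enter the score.
    margin = max(abs(v) for v in (*row_range, *col_range))
    ref_center = reference[margin:h - margin, margin:w - margin]
    best_offset, best_score = (0, 0), np.inf
    for dr in range(row_range[0], row_range[1] + 1):
        for dc in range(col_range[0], col_range[1] + 1):
            shifted = np.roll(top, (dr, dc), axis=(0, 1))
            diff = ref_center - shifted[margin:h - margin, margin:w - margin]
            score = np.sum(diff ** 2)
            if score < best_score:
                best_offset, best_score = (dr, dc), score
    return best_offset
```

With the default window this evaluates 31 x 31 = 961 candidate offsets, which is manageable for the small JPGs but too narrow a window for the displacements seen in the large TIFs.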

Finally, coarse-to-fine pyramid alignment speeds up the algorithm. It recursively downsizes both the reference and top images, runs single-scale alignment on the smallest image pair, and then uses the resulting displacement to restrict the search window for the next-largest pair to only the equivalent patch of pixels. This significantly reduces the total number of offsets that must be searched. To write this, I had to change the single-scale alignment function to accept parameters for the ranges of row and column offsets to search through. In this implementation, I used a 5-level pyramid that downsizes images by a factor of 2 at each level.
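The recursion can be sketched as follows. This is only an illustration under several assumptions that are not from the notebook: images are downsized by averaging 2x2 blocks, the coarsest level searches a window of +/-15 pixels, and each finer level searches +/-2 pixels around the doubled coarse estimate (`refine_radius`). All names here are hypothetical.

```python
import numpy as np

def downsample_2x(image):
    """Halve each dimension by averaging 2x2 blocks (an assumed resizing method)."""
    h, w = image.shape[0] // 2 * 2, image.shape[1] // 2 * 2
    return image[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def align_window(reference, top, row_range, col_range):
    """Exhaustive SSD search over the given inclusive offset ranges."""
    h, w = reference.shape
    margin = max(abs(v) for v in (*row_range, *col_range))
    ref_center = reference[margin:h - margin, margin:w - margin]
    best_offset, best_score = (0, 0), np.inf
    for dr in range(row_range[0], row_range[1] + 1):
        for dc in range(col_range[0], col_range[1] + 1):
            shifted = np.roll(top, (dr, dc), axis=(0, 1))
            score = np.sum((ref_center - shifted[margin:h - margin, margin:w - margin]) ** 2)
            if score < best_score:
                best_offset, best_score = (dr, dc), score
    return best_offset

def align_pyramid(reference, top, levels=5, coarse_radius=15, refine_radius=2):
    """Coarse-to-fine alignment: solve at half size, then refine at this size."""
    reference = np.asarray(reference, dtype=np.float64)
    top = np.asarray(top, dtype=np.float64)
    if levels == 1:
        window = (-coarse_radius, coarse_radius)
        return align_window(reference, top, window, window)
    coarse_dr, coarse_dc = align_pyramid(
        downsample_2x(reference), downsample_2x(top),
        levels - 1, coarse_radius, refine_radius)
    # One pixel at the coarser level corresponds to two pixels at this level.
    dr0, dc0 = 2 * coarse_dr, 2 * coarse_dc
    return align_window(
        reference, top,
        (dr0 - refine_radius, dr0 + refine_radius),
        (dc0 - refine_radius, dc0 + refine_radius))
```

Under these assumptions, each level above the coarsest evaluates only 25 offsets, and a 5-level pyramid can reach full-resolution displacements of roughly 15 x 16 = 240 pixels, which covers the offsets reported in the results.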

Bells and whistles

I implemented automatic cropping of solid-color borders off of the final images. The notebook contains an autocrop function that takes in the red and green channel displacements, which it uses to crop out any rolled-over borders, along with the aligned red, green, and blue channels themselves. For each channel, it iterates inward from the top-most and bottom-most rows and the left-most and right-most columns of pixels and marks the first ones that appear to contain actual image content rather than a solid-color border. It then crops the image to the outermost coordinates that don't represent borders in any channel.
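The first step, removing rolled-over borders, can be sketched as below. This assumes the channels were shifted with a wrap-around roll (as `np.roll` does), so a positive row offset wraps rows onto the top edge and a negative one wraps them onto the bottom edge, and likewise for columns. The helper name `rollover_bounds` is hypothetical.

```python
def rollover_bounds(shape, offsets):
    """Return (top, bottom, left, right) bounds that exclude wrapped pixels.

    `shape` is (height, width); `offsets` is a list of (row, col) shifts
    that were applied to the channels, e.g. the green and red displacements.
    """
    h, w = shape
    row_shifts = [dr for dr, _ in offsets]
    col_shifts = [dc for _, dc in offsets]
    top = max([0] + row_shifts)         # largest downward shift
    bottom = h + min([0] + row_shifts)  # largest upward shift (negative)
    left = max([0] + col_shifts)
    right = w + min([0] + col_shifts)
    return top, bottom, left, right
```

The composite could then be sliced as `image[top:bottom, left:right]` before the content-based scan tightens the crop further.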

The helper function for distinguishing between a solid-color border row/column and one with image content checks that those pixels have a variance that exceeds a certain floor and a mean that falls within a certain range. After examining the variances and means of different images, I manually adjusted the parameters and found the best-performing ones to be a variance floor of 0.01 and a mean range of (0.1, 0.98) for rows and (0.4, 0.98) for columns.
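A minimal sketch of such a check and of the inward scan, assuming pixel values are floats in [0, 1] and using the thresholds quoted above as defaults. The names `has_content` and `first_content_index` are hypothetical, and the notebook's actual logic may differ.

```python
import numpy as np

def has_content(pixels, var_floor=0.01, mean_range=(0.1, 0.98)):
    """True if a row/column looks like image content rather than a solid border."""
    pixels = np.asarray(pixels, dtype=np.float64)
    low, high = mean_range
    return bool(pixels.var() > var_floor and low < pixels.mean() < high)

def first_content_index(lines, **kwargs):
    """Index of the first row in `lines` that passes has_content, or None."""
    for i, line in enumerate(lines):
        if has_content(line, **kwargs):
            return i
    return None
```

For example, `first_content_index(channel)` scans rows from the top, `first_content_index(channel[::-1])` scans from the bottom, and `first_content_index(channel.T, mean_range=(0.4, 0.98))` scans columns from the left with the column-specific mean range.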

Results

Below are results of single-scale alignment on each provided JPG image and pyramid alignment on each TIF image. The former takes 1-2 seconds per JPG. The latter takes 16-17 seconds per TIF.

Some of the results have slightly noticeable channel offsets, but most of them can be perceived as cohesive color images. Running the algorithm on Prokudin-Gorskii's self-portrait produced a much poorer alignment than the other images: specifically, the green channel offset seems to be too far to the right. Looking at the input image, this error could be due to a large white patch that appears in the bottom-right of the green color channel, which might have affected the SSD calculation.

Each example below juxtaposes the results of the algorithm before and after automatic cropping.

Example results & offsets

Low-resolution images

cathedral.jpg - green offset (5, 2), red offset (12, 3)	cathedral-cropped.jpg - cropped between (12, 11), (337, 370)
monastery.jpg - green offset (-3, 2), red offset (3, 2)	monastery-cropped.jpg - cropped between (10, 20), (334, 376)
tobolsk.jpg - green offset (3, 2), red offset (6, 3)	tobolsk-cropped.jpg - cropped between (8, 20), (335, 381)

High-resolution images

castle.jpg - green offset (35, 3), red offset (99, 3)	castle-cropped.jpg - cropped between (99, 64), (3248, 3674)
emir.jpg - green offset (42, 14), red offset (108, 26)	emir-cropped.jpg - cropped between (108, 109), (3200, 3641)
harvesters.jpg - green offset (58, 10), red offset (126, 13)	harvesters-cropped.jpg - cropped between (126, 75), (3189, 3594)
icon.jpg - green offset (41, 16), red offset (90, 23)	icon-cropped.jpg - cropped between (90, 92), (3238, 3675)
lady.jpg - green offset (55, -6), red offset (113, -4)	lady-cropped.jpg - cropped between (113, 91), (3203, 3649)
onion_church.jpg - green offset (52, 22), red offset (108, 36)	onion_church-cropped.jpg - cropped between (122, 143), (3210, 3672)
self_portrait.jpg - green offset (77, -1), red offset (176, 34)	self_portrait-cropped.jpg - cropped between (176, 195), (3250, 3701)
three_generations.jpg - green offset (52, 5), red offset (111, 9)	three_generations-cropped.jpg - cropped between (111, 101), (3207, 3622)
train.jpg - green offset (41, -2), red offset (93, 26)	train-cropped.jpg - cropped between (93, 173), (3224, 3646)
workshop.jpg - green offset (53, -4), red offset (102, -12)	workshop-cropped.jpg - cropped between (102, 58), (3208, 3641)

Chosen results & offsets

Here are four other image files from the Prokudin-Gorskii collection: two TIF files, of images titled Pīony and Ėtrusskīi︠a︡ vazy. V Ėrmitazhi︠e︡ v SPeterburgi︠e︡, and two JPG files of a mostly green image titled Cheremukha, at a smaller (~50 KB) and a larger (~200 KB) size. Below are the results of my code on these images.