Project 1: Images of the Russian Empire

CS194-26 Fall 2021 | Rio Hayakawa


Introduction

As early as 1907, Sergei Mikhailovich Prokudin-Gorskii recognized the possibilities of color photography and set out on a journey across the Russian Empire to record everything he saw as color photographs. He did this by recording three exposures of each scene onto a glass plate through red, green, and blue filters; overlapping the three exposures reproduces the full color image. Although he was never able to accomplish his goals, his glass plate negatives thankfully survived. They were purchased by the Library of Congress in 1948 and have since been digitized and made available to the public online.

In this project, I implement an algorithm that takes these three exposures and automatically aligns them to create a single color image.

Part 1: Single Scale Algorithm

To align the input images, my first approach was to implement an exhaustive search over a range of possible displacement values to find the displacement at which the two images align most closely.
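
A minimal NumPy sketch of this exhaustive search follows. It assumes each channel is a 2-D array, shifts with `np.roll`, and scores with mean squared error (lower is better), matching the choices described in this part; the function names `mse` and `align_exhaustive` are hypothetical, not the project's actual code. Shifts are returned in (rows, columns) order.

```python
import numpy as np


def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared error between two channels of the same shape."""
    # Cast to float so unsigned-integer channels cannot wrap around on subtraction.
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.mean(diff ** 2))


def align_exhaustive(moving: np.ndarray, reference: np.ndarray, window: int = 15):
    """Score every (dy, dx) in [-window, window]^2 and keep the lowest-MSE shift.

    Returns the best (dy, dx) in (rows, columns) order and `moving`
    rolled by that shift.
    """
    best_score = np.inf
    best_shift = (0, 0)
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            # np.roll wraps pixels around the edges, so the wrapped strips
            # also contribute to the score.
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            score = mse(shifted, reference)
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift, np.roll(moving, best_shift, axis=(0, 1))
```

For example, if `moving` is a random `reference` rolled by (-4, 3), `align_exhaustive(moving, reference)` recovers the shift (4, -3), since that is the only shift in the window that brings the MSE to zero.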

After reading the input image, the implementation splits it vertically into three channels of equal height (height / 3) and crops 20 pixels from each edge of every channel. It then aligns the green channel to the blue channel using a double for loop over the possible displacement values (a range of -15 to 15) on both the x and y axes. In each iteration, it rolls the green channel by the current x displacement in the x direction and by the current y displacement in the y direction, computes the mean squared error between the rolled g channel and the b channel, and records that error along with the current displacement values if it is the lowest error seen so far.

After the double for loop ends, it rolls the g channel by the best displacement on both axes and saves the g-to-b aligned image for later. It then repeats the process for the red channel to get the red channel aligned to the blue channel. With both channels aligned to blue, it stacks the aligned red channel, the aligned green channel, and the blue channel as the red, green, and blue channels of the output to save and display the final color photo.
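
The steps around the search (splitting the scan into thirds, trimming borders, and stacking the shifted channels) can be sketched as follows. This assumes the scan is a 2-D array with the channels stacked blue, green, red from top to bottom (the usual layout of these plates) and that the (dy, dx) displacements come from a search like the one described above; `split_plate`, `crop`, and `compose` are hypothetical helper names.

```python
import numpy as np


def split_plate(plate: np.ndarray):
    """Split a stacked plate scan into (blue, green, red) channels.

    Assumes the B, G, R order from top to bottom; each channel is
    height // 3 rows tall and any leftover rows are dropped.
    """
    h = plate.shape[0] // 3
    return plate[:h], plate[h:2 * h], plate[2 * h:3 * h]


def crop(channel: np.ndarray, border: int = 20) -> np.ndarray:
    """Remove `border` pixels from every edge of a channel."""
    h, w = channel.shape[:2]
    return channel[border:h - border, border:w - border]


def compose(b, g, r, g_shift, r_shift):
    """Roll g and r by their (dy, dx) shifts and stack the result as an RGB image."""
    g_aligned = np.roll(g, g_shift, axis=(0, 1))
    r_aligned = np.roll(r, r_shift, axis=(0, 1))
    # The last axis holds the colors in R, G, B order for display.
    return np.dstack([r_aligned, g_aligned, b])
```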

Below are example outputs from running the algorithm on the .jpeg images, along with the best displacement values for each color channel and each axis (x on the left, y on the right). Additionally, 15 pixels were removed from the top to clean up the border artifacts.

cathedral.jpeg: r: (5,2) – g: (12,3)
monastery.jpeg: r: (-3,2) – g: (3,2)
tobolsk.jpeg: r: (3,3) – g: (6,3)

Part 2: Multiscale Algorithm

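The exhaustive method works well on low-resolution images like those in Part 1, but the .tif files are much larger, and so are the displacements between their channels. Covering those displacements with a single exhaustive search would require a far bigger window, and every candidate shift is scored over the full image, so the search quickly becomes too slow. Large images therefore need a different algorithm.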
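
A rough count makes the cost difference concrete. Here H × W is the full-resolution channel size, ±w is the search window, and k is the number of times the image is halved; these symbols, and the example values w = 15 and k = 3, are introduced only for this illustration. The count ignores the refinement searches at finer levels, which only need a small window around the doubled estimate.

```latex
\begin{aligned}
C_{\text{exhaustive}}(w) &= (2w+1)^2\,HW \\
w = 15,\ k = 3:\quad w \cdot 2^k &= 120 \\
C_{\text{exhaustive}}(120) &= 241^2\,HW = 58\,081\,HW \\
C_{\text{coarsest}} &= 31^2 \cdot \frac{HW}{4^3} = \frac{961}{64}\,HW \approx 15\,HW
\end{aligned}
```

A ±15 window at the coarsest of three halvings spans ±120 pixels at full resolution, for roughly 1/3900 of the work of searching that range directly.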

The approach for this case is to use an image pyramid, which represents the input image at several scales. Starting from the lowest-resolution image, the implementation finds the best displacement at that scale, then moves up to the next higher-resolution image and searches for the best displacement only within a window around the estimate from the previous step.
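
Building such a pyramid can be sketched as repeated halving until the image is small. This assumes 2×2 block averaging for each half-scale step (the writeup does not say how images are downscaled; a library resize such as scikit-image's `rescale` would also work) and stops at the 400-pixel base-case size this writeup uses. `downscale_by_2` and `build_pyramid` are hypothetical names.

```python
import numpy as np


def downscale_by_2(img: np.ndarray) -> np.ndarray:
    """Halve each dimension by averaging non-overlapping 2x2 blocks.

    An odd trailing row or column is dropped.
    """
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w].astype(np.float64)
    return (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0


def build_pyramid(img: np.ndarray, min_size: int = 400):
    """Return [full-res, 1/2, 1/4, ...], stopping once either side is <= min_size."""
    levels = [img]
    while min(levels[-1].shape[:2]) > min_size:
        levels.append(downscale_by_2(levels[-1]))
    return levels
```

For a 1700 × 1000 channel this yields three levels: 1700 × 1000, 850 × 500, and 425 × 250.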

The multiscale algorithm begins by splitting the input image into the three color channels and cropping 200 pixels from each edge. It then aligns the blue channel to the green channel by calling a recursive function that returns the best x and y displacement values and the aligned image.

The recursive function first checks whether either dimension of the input images (here, the b and g channels) is less than or equal to 400 pixels. If so, this is the base case: it calls the same align function defined in Part 1, with the exception that it returns the x and y displacement values as well as the aligned image. Otherwise, it calls itself with the b and g channels downscaled by half and stores the returned displacement values and aligned image. It then doubles the returned displacement values to account for the downscaling and returns the result of the align function, searching only within a displacement window around those doubled values.

After the top-level recursive call returns, we have the b channel aligned to the g channel. We then make the same call with the red and green channels to get the r channel aligned to the g channel. Finally, we stack the aligned r channel, the g channel, and the aligned b channel to save and display the final color photo.
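
A compact sketch of this recursive coarse-to-fine search, assuming NumPy arrays, MSE scoring, 2×2 block averaging for the half-scale step, and a base case when the smaller side is at most 400 pixels. The writeup does not state the refinement window used at finer levels, so `refine_window=2` is a guess; all function names are hypothetical. Shifts are reported in (rows, columns) order, which may not match the x/y labels in the results.

```python
import numpy as np


def score(a, b):
    """Mean squared error, computed in float to avoid integer wraparound."""
    return float(np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2))


def search(moving, reference, center=(0, 0), window=15):
    """Exhaustive MSE search over shifts within `window` of `center`."""
    cy, cx = center
    best, best_shift = np.inf, center
    for dy in range(cy - window, cy + window + 1):
        for dx in range(cx - window, cx + window + 1):
            s = score(np.roll(moving, (dy, dx), axis=(0, 1)), reference)
            if s < best:
                best, best_shift = s, (dy, dx)
    return best_shift


def half(img):
    """Downscale by 2 with 2x2 block averaging (an assumed choice)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w].astype(np.float64)
    return (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0


def align_pyramid(moving, reference, base_size=400, base_window=15, refine_window=2):
    """Coarse-to-fine alignment. Returns ((dy, dx), moving rolled by that shift)."""
    if min(moving.shape[:2]) <= base_size:
        # Base case: plain exhaustive search around zero.
        shift = search(moving, reference, (0, 0), base_window)
    else:
        (cy, cx), _ = align_pyramid(half(moving), half(reference),
                                    base_size, base_window, refine_window)
        # A shift found at half resolution is worth twice as many pixels here;
        # refine_window is a hypothetical setting, not taken from the writeup.
        shift = search(moving, reference, (2 * cy, 2 * cx), refine_window)
    return shift, np.roll(moving, shift, axis=(0, 1))
```

The default `base_size=400` and `base_window=15` mirror the values in this writeup; only the finer levels use the smaller window.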

In this implementation, I chose a base-case displacement window of -15 to 15 and a scaling factor of 1/2. I also aligned the channels to green instead of blue as in Part 1, because I had trouble aligning ‘emir.tif’ to blue, and aligning to green fixed that issue.
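
Because this part reports offsets relative to green while Part 1 reports them relative to blue, comparing the two requires a conversion. A small sketch, assuming both offsets are pure translations listed in the same axis order: if rolling r by d_r lands it on g and rolling b by d_b lands it on g, then r lands on b after d_r - d_b, and g lands on b after -d_b. `to_blue_reference` is a hypothetical helper.

```python
from typing import Tuple

Offset = Tuple[int, int]


def to_blue_reference(r_vs_g: Offset, b_vs_g: Offset) -> Tuple[Offset, Offset]:
    """Convert green-referenced offsets for r and b into blue-referenced offsets for r and g."""
    # Translations compose by addition, so subtracting b's offset re-bases r on blue.
    r_vs_b = (r_vs_g[0] - b_vs_g[0], r_vs_g[1] - b_vs_g[1])
    # Green moves onto blue by undoing blue's move onto green.
    g_vs_b = (-b_vs_g[0], -b_vs_g[1])
    return r_vs_b, g_vs_b
```

For church.tif, r: (33,-8) and b: (-25,-4) become r: (58,-4) and g: (25,4) relative to blue.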

Below are example outputs from running the algorithm on large images (.tif files), along with the best displacement values, relative to the green channel, for each color channel and each axis (x on the left, y on the right). Additionally, 100 pixels were removed from the top to clean up the border artifacts.

church.tif: r: (33,-8) – b: (-25,-4)
emir.tif: r: (57,17) – b: (-49,-24)
harvesters.tif: r: (65,-3) – b: (-59,-16)
icon.tif: r: (48,5) – b: (-40,-17)
lady.tif: r: (62,4) – b: (-47,-8)
melons.tif: r: (96,3) – b: (-82,-9)
onion_church.tif: r: (57,10) – b: (-51,-26)
self_portrait.tif: r: (98,8) – b: (-78,-28)
three_generations.tif: r: (58,-1) – b: (-53,-14)
train.tif: r: (43,27) – b: (-42,-5)
workshop.jpeg: r: (52,-11) – b: (-52,0)

Additional Prokudin-Gorskii Collection Image Renders

batum.tif: r: (59,7) – b: (-46,-24)
school.tif: r: (48,7) – b: (-31,-16)
elder.jpeg: r: (101,-25) – b: (-91,27)