Project 1

Suzie Petryk

Russia, 1907. Sergei Mikhailovich Prokudin-Gorskii travels all over Russia, photographing each scene three times onto glass plates through red, green, and blue filters. In this project, the resulting black-and-white negatives are aligned to produce color photographs, achieving Sergei's vision.

To run the code, run: python main.py

To run with aligning edges for the bells & whistles portion, run: python main.py --edges

Approach

For the smaller .jpg images, I used a single-scale alignment process, as follows:

  1. Crop 10% of the image size from each border to remove border effects.
  2. Define the x and y offsets to search over as [-crop size, +crop size] (where crop size is 10% of the channel height).
  3. Align the red channel to the blue channel, and the green channel to the blue channel, by looping over x and y shifts. For each shift, calculate the normalized cross-correlation (NCC) between the channels, and keep the offsets where NCC is maximal.
  4. Find the region common to the R, G, and B channels once R and G are shifted. Crop to this intersection and return the aligned image.
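The four steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names (`ncc`, `crop_border`, `align_single_scale`, `crop_to_overlap`) are hypothetical, the zero-mean form of NCC is an assumption, and shifting with `np.roll` (which wraps pixels around the edges) is one possible choice among several.

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equally sized 2D arrays."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 0.0  # a constant image carries no alignment information
    return float((a * b).sum() / denom)

def crop_border(channel, frac=0.10):
    """Step 1: drop `frac` of the height/width from each side."""
    h, w = channel.shape
    dy, dx = int(h * frac), int(w * frac)
    return channel[dy:h - dy, dx:w - dx]

def align_single_scale(channel, reference, x_range, y_range):
    """Steps 2-3: return the (x, y) shift of `channel` that maximizes NCC."""
    ref = crop_border(reference)
    best_score, best_offset = -np.inf, (0, 0)
    for dy in range(y_range[0], y_range[1] + 1):
        for dx in range(x_range[0], x_range[1] + 1):
            # np.roll wraps around; the border crop hides most wrapped pixels.
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ncc(crop_border(shifted), ref)
            if score > best_score:
                best_score, best_offset = score, (dx, dy)
    return best_offset

def crop_to_overlap(image, offsets):
    """Step 4: crop an (H, W, 3) image to the area valid in every channel.

    `offsets` lists the (x, y) shifts applied to the shifted channels; the
    unshifted reference channel contributes an implicit (0, 0).
    """
    h, w = image.shape[:2]
    xs = [0] + [dx for dx, _ in offsets]
    ys = [0] + [dy for _, dy in offsets]
    top, bottom = max(ys), h + min(ys)
    left, right = max(xs), w + min(xs)
    return image[top:bottom, left:right]
```

In use, one would call `align_single_scale(red, blue, (-c, c), (-c, c))` with `c` set to 10% of the channel height, do the same for green, roll both channels by the returned offsets, stack the three channels, and pass the stack to `crop_to_overlap`.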

For the larger .tif files, searching over this offset range would take a very long time (the runs never completed before I got impatient and stopped them). Instead, I implemented a recursive multi-scale search. As before, the initial offset range was 10% of the channel height. I then repeatedly halved both the image and this offset range, stopping at the smallest scale whose image height was still at least 128 pixels. At this coarsest scale, the best offset (the one maximizing NCC, as in the single-scale search) was found. Call it (x_offset, y_offset). When the function returned one level up the image pyramid, the new offset range was 2*x_offset +/- (x_offset/4) and 2*y_offset +/- (y_offset/4). This drastically reduced the runtime, since the search space of offsets was so much smaller. The average runtime across all the images was 26.4 seconds.

Originally I searched over offset ranges of +/- offset/2, but this window still led to long runtimes (consistently over a minute) for the large images. Because the selected offsets were generally close to 2*offset anyway, I narrowed the search window to +/- offset/4.
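The recursive search described above might look something like the sketch below. It is an illustration under several assumptions rather than the project's code: the names (`best_shift`, `halve`, `align_pyramid`, `MIN_HEIGHT`) are hypothetical, downsampling uses 2x2 block averaging as a stand-in for an image-library resize, and the refinement window is floored at one pixel, which the original description does not mention.

```python
import numpy as np

MIN_HEIGHT = 128  # stop halving once the next level would fall below this

def ncc(a, b):
    """Zero-mean normalized cross-correlation (assumed form of the metric)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom else 0.0

def interior(img, frac=0.10):
    """Ignore a 10% border on every side when scoring."""
    h, w = img.shape
    dy, dx = int(h * frac), int(w * frac)
    return img[dy:h - dy, dx:w - dx]

def best_shift(channel, reference, x_range, y_range):
    """Exhaustively search the given inclusive ranges for the best (x, y)."""
    ref = interior(reference)
    best_score, best = -np.inf, (0, 0)
    for dy in range(y_range[0], y_range[1] + 1):
        for dx in range(x_range[0], x_range[1] + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ncc(interior(shifted), ref)
            if score > best_score:
                best_score, best = score, (dx, dy)
    return best

def halve(img):
    """Downsample by 2 with 2x2 block averaging (stand-in for a resize)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def align_pyramid(channel, reference, search=None):
    """Coarse-to-fine alignment; returns the (x, y) shift at full resolution."""
    h = reference.shape[0]
    if search is None:
        search = max(1, h // 10)  # initial range: 10% of the channel height
    if h // 2 < MIN_HEIGHT:
        # Coarsest level: search the whole (already halved) offset range.
        return best_shift(channel, reference, (-search, search), (-search, search))
    cx, cy = align_pyramid(halve(channel), halve(reference), max(1, search // 2))
    # Refine around twice the coarse offset, within +/- offset/4.
    # The floor of 1 pixel is an addition of this sketch.
    wx, wy = max(1, abs(cx) // 4), max(1, abs(cy) // 4)
    return best_shift(channel, reference,
                      (2 * cx - wx, 2 * cx + wx), (2 * cy - wy, 2 * cy + wy))
```

The one-pixel floor is a deliberate design choice in this sketch: with a literal +/- offset/4 window, a coarse offset of 0 along one axis leaves no room for refinement along that axis at any finer level.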

Failure cases

The multi-scale algorithm clearly failed to align the Emir image. This might be due to the negatives having different brightnesses and contrasts, which makes NCC, a metric that compares raw pixel values, less appropriate. The image of the harvesters also has visual artifacts. There is a visible abrasion on the left-hand side of this image's negatives, which may have thrown off the NCC, again because it directly compares pixel values.

Results

Single scale

Cathedral

Offsets: r: (3, 12) g: (2, 5)

cathedral

Monastery

Offsets: r: (2, 3) g: (2, -3)

monastery

Tobolsk

Offsets: r: (3, 6) g: (3, 3)

tobolsk

Multi scale

Emir

Offsets: r: (42, 68) g: (24, 49)

emir

Icon

Offsets: r: (23, 89) g: (17, 41)

icon

Three Generations

Offsets: r: (0, 111) g: (0, 52)

three_generations

Village

Offsets: r: (22, 138) g: (0, 64)

village

Lady

Offsets: r: (0, 117) g: (0, 55)

lady

Onion Church

Offsets: r: (36, 108) g: (26, 51)

onion_church

Workshop

Offsets: r: (-12, 105) g: (0, 53)

workshop

Harvesters

Offsets: r: (0, 124) g: (17, 60)

harvesters

Melons

Offsets: r: (13, 178) g: (0, 81)

melons

Self Portrait

Offsets: r: (37, 176) g: (29, 79)

self_portrait

Train

Offsets: r: (32, 87) g: (0, 42)

train

Extra samples

I downloaded the .tif files of the three separated negatives from this collection. Here are the results:

Former palace of Catherine the Great

Offsets: r: (0, 60) g: (0, 26)

dvortsa

Bridge in village of Lava

Offsets: r: (0, 0) g: (0, 0)

bridge

Guardhouse at Belyie Ozerki

Offsets: r: (27, 28) g: (16, 0)

bridge

Bells and Whistles

To find better offsets, I first used Canny edge detection to convert each channel from raw pixels into an image of its edges. An example of these edges is shown below, on the 3 channels of the Emir image. The rest of the implementation was the same; I used the same original offset range (10% of the image height) and still used NCC to find the best offset. The results look much cleaner, as shown below.

In each pair, the left image is the original aligned result (from the initial single/multi-scale implementation), and the right image is the result when the channels are first pre-processed into their edges. The single-scale images found exactly the same offsets, so I do not show them here. I've resized the .tif results to (512, 512) just for this webpage so that they fit side by side; the script still produces the full-size aligned images. The average runtime for this section over all the images was 20.7 seconds (faster than the original implementation, since the offset windows happened to be smaller during the multi-scale search with this preprocessing).
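To illustrate why aligning on edges can help when channels differ in brightness and contrast, here is a small self-contained sketch. Note that it does not implement Canny: it uses a much simpler thresholded gradient magnitude as a stand-in, and the function name `edge_map` and its `threshold` parameter are hypothetical. In practice a library Canny implementation (for example `skimage.feature.canny`) would take its place; which library the project used is not stated here.

```python
import numpy as np

def edge_map(channel, threshold=0.1):
    """Binary edge image from gradient magnitude (simplified stand-in for Canny).

    Pixels whose gradient magnitude exceeds `threshold` times the strongest
    gradient in the image are marked 1.0, all others 0.0.
    """
    gy, gx = np.gradient(channel.astype(float))
    magnitude = np.hypot(gx, gy)
    peak = magnitude.max()
    if peak == 0:
        return np.zeros_like(magnitude)  # flat image: no edges
    return (magnitude / peak > threshold).astype(float)

# A bright square on a dark background, and the same scene with lower
# contrast and higher overall brightness (as another color filter might give).
scene = np.zeros((32, 32))
scene[8:24, 10:20] = 1.0
other_channel = 0.3 * scene + 0.5

# The raw pixel values differ everywhere, but the edge maps coincide.
same_edges = np.array_equal(edge_map(scene), edge_map(other_channel))
```

The edge maps can then be passed to the same NCC-based search in place of the raw channels, with the offsets found on the edge images applied to the original channels.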

Example of edges

Blue channel of Emir as Canny edges image:

New results

Emir

Offsets: r: (40, 107) g: (24, 49)

Icon

Offsets: r: (23, 89) g: (16, 39)

Three Generations

Offsets: r: (-1, 115) g: (0, 58)

Village

Offsets: r: (21, 137) g: (0, 64)

Lady

Offsets: r: (-1, 121) g: (-2, 58)

Onion Church

Offsets: r: (35, 107) g: (24, 52)

Workshop

Offsets: r: (-12, 105) g: (-1, 53)

Harvesters

Offsets: r: (0, 134) g: (0, 60)

Melons

Offsets: r: (14, 176) g: (0, 76)

Self Portrait

Offsets: r: (37, 175) g: (29, 77)

Train

Offsets: r: (29, 85) g: (0, 41)