Project 1: Images of the Russian Empire - Colorizing the Prokudin-Gorskii photo collection

I. Overview

In this project, I take digitized Prokudin-Gorskii glass plate images and, using image processing techniques, produce color images by extracting the three color channel images and overlaying them to form a single RGB image. The main problem the project tackles is aligning the color channel images as closely as possible so that the result is not distorted by misaligned colors. I first divide the original image, which contains the three color channel images, into three parts. Then, I implement an exhaustive search over a window of possible displacements, score each displacement, and pick the one with the best score. After that, I implement an image pyramid so that the search scales to larger images (more pixels). Finally, I implement edge detection to handle images whose color channels have different brightness values.
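As a rough illustration of the first and last steps of this pipeline, the sketch below splits a scanned plate into three channels and stacks them back into an RGB image. The function names `split_plate` and `compose` are hypothetical, and the top-to-bottom order blue, green, red is an assumption about how the plates are laid out in the scans, not something taken from the starter code.

```python
import numpy as np

def split_plate(plate):
    """Cut a tall grayscale plate into three equal-height channel images.

    Assumes the channels are stacked top to bottom as blue, green, red.
    Any leftover rows (height not divisible by 3) are dropped.
    """
    height = plate.shape[0] // 3
    b = plate[:height]
    g = plate[height:2 * height]
    r = plate[2 * height:3 * height]
    return r, g, b

def compose(r, g, b, r_shift, g_shift):
    """Shift R and G by (rows, cols) displacements and stack into H x W x 3."""
    r_aligned = np.roll(r, r_shift, axis=(0, 1))
    g_aligned = np.roll(g, g_shift, axis=(0, 1))
    return np.dstack([r_aligned, g_aligned, b])
```

Here `np.roll` wraps pixels around the image edge, which is harmless for small displacements because the wrapped rows and columns fall inside the plate border.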

II. Exhaustive Search

After dividing the original image containing the three color channel images into three equal parts (R, G, B) using the starter code, I crop the borders of each color channel image by 10% so that the plate border, and anything else that is not part of the picture, does not affect the scoring metric. I align the red and green color channel images to the blue color channel image, using the cropped images and a window of displacements (a nested for loop over displacements from -15 to 15 along each axis) to compute a score for each displacement.

The first scoring metric I implemented is the sum of squared differences (SSD), whose formula is sum(sum((image_a - image_b)**2)). The smaller the SSD, the more similar the images are. The second scoring metric is the normalized cross correlation (NCC), whose formula is the dot product image_1/||image_1|| . image_2/||image_2||. The greater the NCC, the more similar the images are. Since the images are stored as matrices, I flatten them to vectors before computing the NCC.

For SSD, I return the displacement with the minimum SSD. For NCC, I return the displacement with the maximum NCC (equivalently, the minimum -NCC). Finally, I shift the original R and G color channel images by their best displacements. For the images below, I arbitrarily use NCC as the scoring metric, since NCC and SSD did not produce noticeably different results.
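The search described in this section can be sketched roughly as follows. This is a minimal illustration under assumptions, not the project's actual code: the names `crop_border`, `ssd`, `ncc`, and `align` are hypothetical, images are assumed to be 2D float arrays, displacements are reported as (rows, cols), and the channel is shifted with `np.roll` before cropping.

```python
import numpy as np

def crop_border(img, frac=0.10):
    """Drop `frac` of the height and width from every side."""
    h, w = img.shape
    dh, dw = int(h * frac), int(w * frac)
    return img[dh:h - dh, dw:w - dw]

def ssd(a, b):
    """Sum of squared differences: smaller means more similar."""
    return np.sum((a - b) ** 2)

def ncc(a, b):
    """Dot product of the flattened, unit-normalized images: larger means more similar."""
    a, b = a.ravel(), b.ravel()
    return np.dot(a / np.linalg.norm(a), b / np.linalg.norm(b))

def align(channel, reference, radius=15, metric="ncc"):
    """Try every shift in [-radius, radius]^2 and return the best (dy, dx)."""
    ref = crop_border(reference)
    best_score, best_shift = np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = crop_border(np.roll(channel, (dy, dx), axis=(0, 1)))
            # Negate NCC so that both metrics are minimized.
            score = ssd(shifted, ref) if metric == "ssd" else -ncc(shifted, ref)
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

A call such as `align(red, blue)` would then give the displacement to apply to the red channel. Negating the NCC lets one loop serve both metrics, matching the "minimum -NCC" formulation above.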

III. Image Pyramid

While exhaustive search works for small images, it is inefficient for large images: there are many more pixels to score, and the true displacement can fall well outside a (-15, 15) window. The idea of an image pyramid is to scale the image down repeatedly by a constant factor, run exhaustive search on the smallest image to get an approximate displacement, and then use that displacement, scaled up to the next level's resolution, as the center of the window of displacements on the next level, refining the estimate as the image gets larger. I used a pyramid with 5 levels, a scale factor of 2, and a smaller window of displacements (-7, 7) to speed things up.
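One way to express this coarse-to-fine scheme is the recursive sketch below. It is an illustration under assumptions rather than the project's implementation: `best_shift`, `downscale`, and `pyramid_align` are hypothetical names, SSD is used as the score for brevity, and downscaling is done by averaging 2x2 blocks, which is only one of several possible resampling choices.

```python
import numpy as np

def best_shift(channel, reference, center, radius):
    """Exhaustive SSD search over shifts within `radius` of `center`."""
    h, w = reference.shape
    dh, dw = h // 10, w // 10  # ignore a 10% border when scoring
    ref = reference[dh:h - dh, dw:w - dw]
    best, best_score = center, np.inf
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            moved = np.roll(channel, (dy, dx), axis=(0, 1))[dh:h - dh, dw:w - dw]
            score = np.sum((moved - ref) ** 2)
            if score < best_score:
                best, best_score = (dy, dx), score
    return best

def downscale(img):
    """Halve each dimension by averaging 2x2 blocks (assumed resampling)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def pyramid_align(channel, reference, levels=5, radius=7):
    """Estimate the shift at the coarsest level, then refine level by level."""
    if levels == 1:
        return best_shift(channel, reference, (0, 0), radius)
    coarse = pyramid_align(downscale(channel), downscale(reference),
                           levels - 1, radius)
    # A shift of k pixels at half resolution is about 2k pixels here.
    center = (2 * coarse[0], 2 * coarse[1])
    return best_shift(channel, reference, center, radius)
```

With 5 levels and a radius of 7, the coarsest level alone covers roughly 7 x 16 = 112 pixels at full resolution, and each finer level can correct the running estimate by up to 7 more pixels at its own scale.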

Images

cathedral.jpg, R: (12, 3) G: (5, 2)

church.tif, R: (58, -4) G: (24, 4)

emir.tif, R: (92, -306) G: (48, 24)

harvesters.tif, R: (124, 14) G: (60, 16)

icon.tif, R: (90, 22) G: (40, 16)

lady.tif, R: (112, 12) G: (50, 8)

melons.tif, R: (178, 12) G: (82, 10)

monastery.jpg, R: (3, 2) G: (-3, 2)

onion_church.tif, R: (108, 36) G: (52, 26)

self_portrait.tif, R: (176, 36) G: (78, 28)

three_generations.tif, R: (112, 10) G: (54, 14)

tobolsk.jpg, R: (6, 3) G: (3, 3)

train.tif, R: (88, 32) G: (42, 6)

workshop.tif, R: (104, -12) G: (52, 0)

IV. Edge Detection

The image alignment techniques above work well on every image except 'emir.tif'. The scoring metrics compare raw pixel intensities, so they break down when the color channels have different brightness values, which appears to be the case for 'emir.tif'. I therefore needed a different feature, and I chose edge detection, which responds to changes in intensity rather than to the intensity itself. For 'emir.tif', I replaced the color channel images in the exhaustive search with their respective outputs from the Canny edge detection function, which fixed the issue, as you can see below.
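The sketch below illustrates why aligning on edges is less sensitive to brightness differences. To stay NumPy-only, it swaps in a plain finite-difference gradient magnitude in place of the Canny detector used in the project; with scikit-image installed, `skimage.feature.canny` would take the place of the hypothetical `edge_map` function. The name `align_ssd` is also hypothetical.

```python
import numpy as np

def edge_map(img):
    """Gradient magnitude from finite differences (a stand-in for Canny)."""
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy)

def align_ssd(channel, reference, radius=15):
    """Exhaustive SSD search on the central 80% of the image; returns (dy, dx)."""
    h, w = reference.shape
    dh, dw = h // 10, w // 10
    ref = reference[dh:h - dh, dw:w - dw]
    best, best_score = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            moved = np.roll(channel, (dy, dx), axis=(0, 1))[dh:h - dh, dw:w - dw]
            score = np.sum((moved - ref) ** 2)
            if score < best_score:
                best, best_score = (dy, dx), score
    return best

def align_on_edges(channel, reference, radius=15):
    """Align the edge maps instead of the raw intensities."""
    return align_ssd(edge_map(channel), edge_map(reference), radius)
```

In this sketch, a channel whose brightness is inverted relative to the reference has the same gradient magnitude as the reference, so the edge maps still match at the correct displacement even though the raw intensities do not.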

Without Edge Detection

emir.tif, R: (92, -306) G: (48, 24)