CS194-26 Project 1 - William Wang

Overview

The goal of this project was to colorize photographs from Prokudin-Gorskii's collection, begun in 1907. Each scene in the collection was photographed three times: once each through a red, a green, and a blue filter. To colorize a photo, we must overlay the three exposures and align them so that they form the channels of a single RGB color image.
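The channel extraction step can be sketched as follows. This is a minimal illustration, assuming the digitized glass plate stacks the three exposures vertically (blue on top, then green, then red, as in the scanned negatives); the function names are my own, not part of any library.

```python
import numpy as np

def split_plate(plate):
    """Split a vertically stacked glass-plate scan into its three channels.

    Assumes the plate stacks blue, green, red from top to bottom.
    """
    h = plate.shape[0] // 3
    b = plate[:h]
    g = plate[h:2 * h]
    r = plate[2 * h:3 * h]
    return b, g, r

def stack_rgb(r, g, b):
    """Stack aligned channels into an H x W x 3 RGB image."""
    return np.dstack([r, g, b])
```

After alignment, the shifted channels are passed to `stack_rgb` to produce the final color image.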

Part 1: Exhaustive Search

For smaller images, we can rely on the assumption that the offset between the three color-channel exposures is small. As a result, we can simply run an exhaustive search over every offset (at pixel granularity) within an n x n window. To determine which offset aligns the photos best, we can use a variety of metrics. I used the sum of squared differences (SSD) between corresponding pixels of the overlaid channels, choosing the offset in the n x n window that minimizes this sum. Once the best offsets are found, we shift the channels accordingly and stack them into an ordinary RGB photo. Below are the results of visualizing the overlaid photos and the optimal offsets that were calculated.
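The exhaustive search described above can be sketched as below. This is a minimal version under assumptions: `np.roll` wraps pixels around the border, which is tolerable for small shifts (cropping the borders before scoring is a common refinement), and the search radius is a stand-in for the n x n window size.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two equally sized channels."""
    return np.sum((a - b) ** 2)

def align_exhaustive(channel, reference, radius=15):
    """Try every (dy, dx) shift within a (2*radius+1)^2 window and return
    the shift that minimizes SSD against the reference channel."""
    best_offset, best_score = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ssd(shifted, reference)
            if score < best_score:
                best_score, best_offset = score, (dy, dx)
    return best_offset
```

With the green channel held fixed as the reference, this is run once for red and once for blue, which matches the G: [0 0] entries in the offsets below.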

R: [7 1] G: [0 0] B: [-5 -2]

R: [6 1] G: [0 0] B: [3 -2]

R: [4 1] G: [0 0] B: [-3 -3]

Part 2: Pyramid Search

Some of the photos in the collection were higher resolution, making it computationally expensive to continue performing an exhaustive search. To alleviate this issue, I employ a pyramid search of depth 3 with a scale factor of 0.25 (except for melons, which used a scale factor of 0.33 due to its larger offset). For the pyramid search, I first scale the photo down multiple times. Starting with the lowest-resolution copy, I find the optimal offset, multiply it by the inverse of the scale factor, and add it to the running offset estimate. I repeat this while working back up in resolution; however, instead of starting each search from an arbitrary offset, I use the result from the previous level by beginning the search at that offset and only searching pixel offsets that the coarser level's exhaustive search could not have distinguished. Because alignment quality was poor with SSD at this stage, I switched to a more robust metric: normalized cross-correlation (NCC).
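The coarse-to-fine procedure above can be sketched as a recursion. This is an illustrative version under assumptions: downscaling is done by naive subsampling (a real implementation would typically use proper image resizing, e.g. `skimage.transform.rescale`), and the refinement radius at each level is tied to the scale step so that only offsets the coarser level could not resolve are searched.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two mean-centered channels."""
    a = a - a.mean()
    b = b - b.mean()
    return np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b))

def downscale(img, factor):
    """Naive subsampling: factor 0.25 keeps every 4th pixel."""
    step = int(round(1 / factor))
    return img[::step, ::step]

def local_search(channel, reference, center, radius):
    """Search a (2*radius+1)^2 window of shifts around a center estimate,
    maximizing NCC against the reference."""
    best, best_score = center, -np.inf
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ncc(shifted, reference)
            if score > best_score:
                best_score, best = score, (dy, dx)
    return best

def align_pyramid(channel, reference, depth=3, factor=0.25, radius=15):
    """Coarse-to-fine alignment: recurse on downscaled copies, scale the
    coarse offset back up, then refine with a small local search."""
    if depth == 0:
        return local_search(channel, reference, (0, 0), radius)
    coarse = align_pyramid(downscale(channel, factor),
                           downscale(reference, factor),
                           depth - 1, factor, radius)
    step = int(round(1 / factor))
    estimate = (coarse[0] * step, coarse[1] * step)
    # Refine only the shifts the coarser level could not distinguish.
    return local_search(channel, reference, estimate, step)
```

Each level of recursion multiplies the reachable offset range by the inverse scale factor while keeping the per-level search window small, which is what makes the high-resolution plates tractable.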

R: [36 -10] G: [0 0] B: [-26 -6]

R: [60 20] G: [0 0] B: [-50 -26]

R: [66 -4] G: [0 0] B: [-62 -20]

R: [50 8] G: [0 0] B: [-44 -20]

R: [64 2] G: [0 0] B: [-56 -10]

R: [99 6] G: [0 0] B: [-87 -12]

R: [60 12] G: [0 0] B: [-54 -30]

R: [102 12] G: [0 0] B: [-81 -33]

R: [64 -3] G: [0 0] B: [-54 -18]

R: [48 30] G: [0 0] B: [-45 -9]

R: [54 -15] G: [0 0] B: [-57 0]

I also aligned three additional images: Art Piece, Church of Savior, and House. All three are high resolution, so I used pyramid search to keep the computational cost manageable.

R: [75 -27] G: [0 0] B: [-63 24]

R: [15 9] G: [0 0] B: [-21 -27]

R: [27 12] G: [0 0] B: [-24 -21]