CS 194-26: HW1

The purpose of this project is to create colorized versions of selected photographs taken by Sergey Prokudin-Gorskii. The red, green and blue channels are given as misaligned inputs, and the goal is to shift the channels so that they line up and can then be stacked into a single color picture.

Method

For smaller images (usually less than 400x400), alignment can be done naively because the total number of operations is small. This naive alignment method takes two images (image_one and image_two). It first selects an arbitrary 15x15 patch near the center of image_one and a 45x45 patch near the center of image_two. It then scans all 15x15 patches within image_two's 45x45 patch and identifies the one most similar to image_one's patch. Similarity was initially measured with SSD (sum of squared differences, where lower is better), but I eventually switched to NCC (normalized cross-correlation, where higher is better) as a more robust metric. Once the most similar patch is identified, its offset from the center is used to roll image_two so that it aligns with image_one. This process is repeated for two of the channels, aligning each to the same reference channel, and then the three channels are stacked.
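A minimal NumPy sketch of the naive method above, assuming the channels are same-shape 2-D float arrays at least 45 pixels on each side. The names `ncc`, `naive_align` and `colorize` are hypothetical (not taken from the original code), and the sign convention is an assumption: the returned `(dy, dx)` is the shift to pass to `np.roll` so that `image_two` lines up with `image_one`. `colorize` uses green as the reference channel, matching the choice described under Challenges.

```python
import numpy as np


def ncc(a, b):
    """Normalized cross-correlation: 1.0 means identical up to brightness/contrast."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def naive_align(image_one, image_two, patch=15, window=45):
    """Return (dy, dx) such that np.roll(image_two, (dy, dx), axis=(0, 1)) lines up with image_one.

    Assumes both channels have the same shape and are at least `window` pixels per side.
    """
    cy, cx = image_one.shape[0] // 2, image_one.shape[1] // 2
    h = patch // 2  # half-width of the template: 7 for a 15x15 patch
    template = image_one[cy - h:cy + h + 1, cx - h:cx + h + 1]

    r = (window - patch) // 2  # candidate centers stay within +-15 px for a 45x45 window
    best_score, best_shift = -np.inf, (0, 0)
    for oy in range(-r, r + 1):
        for ox in range(-r, r + 1):
            # 15x15 candidate patch in image_two, offset (oy, ox) from the center
            candidate = image_two[cy + oy - h:cy + oy + h + 1, cx + ox - h:cx + ox + h + 1]
            score = ncc(template, candidate)
            if score > best_score:  # NCC: higher is better, so keep the maximum
                best_score, best_shift = score, (-oy, -ox)
    # The template was found (oy, ox) away from the center, so roll image_two back by that amount.
    return best_shift


def colorize(r, g, b):
    """Align red and blue to green, then stack into an H x W x 3 RGB image."""
    r_aligned = np.roll(r, naive_align(g, r), axis=(0, 1))
    b_aligned = np.roll(b, naive_align(g, b), axis=(0, 1))
    return np.dstack([r_aligned, g, b_aligned])
```

For example, if `r` is `g` rolled by `(3, -2)`, then `naive_align(g, r)` returns `(-3, 2)`, and rolling `r` by that amount undoes the misalignment.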

For larger images (usually around 4000x4000), alignment must be done efficiently; the naive method would require too many operations and take hours to run. I used an image pyramid, which downsamples each image to a more manageable size (less than 350x350) by halving its resolution at each recursive step. At the lowest level of the pyramid, I use the naive alignment approach described for smaller images to find the downsampled offset that gives the best alignment. I then pass this offset back up the recursive chain, doubling it at each level to match the higher resolution and adjusting it by at most one pixel in each direction. As a result, each large image takes less than 10 seconds to align.
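The recursion can be sketched as follows, again assuming 2-D NumPy float arrays. The helper names, the 2x2 block-averaging downsampler, and the 10% border crop are assumptions for illustration. One deliberate simplification: this sketch scores whole (border-cropped) images at every level, including the coarsest one, instead of the 15x15/45x45 patch search of the naive method, so its base case is an exhaustive search over a ±15 px window of shifts.

```python
import numpy as np


def ncc(a, b):
    """Normalized cross-correlation: higher is better, 1.0 is a perfect match."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def downsample(img):
    """Halve the resolution by averaging 2x2 blocks (an odd last row/column is dropped)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2])


def interior(img, frac=0.1):
    """Crop a border of frac * size per side, since np.roll wraps pixels around the edges."""
    my, mx = int(img.shape[0] * frac), int(img.shape[1] * frac)
    return img[my:img.shape[0] - my, mx:img.shape[1] - mx]


def search(ref, mov, center, radius):
    """Try every shift within `radius` of `center`; keep the one with the highest NCC."""
    ref_c = interior(ref)
    best_score, best_shift = -np.inf, center
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            score = ncc(ref_c, interior(np.roll(mov, (dy, dx), axis=(0, 1))))
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift


def pyramid_align(ref, mov, min_size=350, base_radius=15):
    """Return (dy, dx) such that np.roll(mov, (dy, dx), axis=(0, 1)) lines up with ref."""
    if max(ref.shape) <= min_size:
        # Coarsest level: small enough for an exhaustive search.
        return search(ref, mov, (0, 0), base_radius)
    cy, cx = pyramid_align(downsample(ref), downsample(mov), min_size, base_radius)
    # A shift of d pixels at half resolution is 2*d pixels here; refine by at most
    # one pixel in each direction (a 3x3 window of candidate offsets).
    return search(ref, mov, (2 * cy, 2 * cx), 1)
```

With `min_size=350`, a roughly 4000x4000 channel is halved four times (to about 250x250) before the exhaustive search, and each of the four finer levels then tries only 9 candidate shifts, which is what keeps the large images fast.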

Challenges

The first major change was aligning the red and blue channels to the green channel instead of aligning the green and red channels to the blue channel. In some of the selected images, such as emir, the blue channel was oversaturated, which skewed the similarity heuristic.

Another major change was switching from minimizing SSD to maximizing NCC. When I first added NCC to the code, I mistakenly minimized it; since a higher NCC score indicates a better match, switching to maximization fixed the issues with the similarity score.

The last major change concerned how to find the best x and y offsets to roll the image by at the higher levels of the image pyramid. I initially searched a 15x15 patch of candidate offsets at every level, but a better approach is to search only the 3x3 patch around the offset passed up from the level below.
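The effect of the metric and of its sign can be checked on synthetic data. In this sketch, the helper names `ssd`, `ncc` and `pick_shift` are hypothetical, and the "blue" channel is an artificial brightened, lower-contrast copy of a random "green" channel rather than a real plate. NCC reaches its maximum of 1.0 at the true alignment despite the brightness difference, so it has to be maximized; minimizing it returns the least similar candidate instead.

```python
import numpy as np


def ssd(a, b):
    """Sum of squared differences: lower is better."""
    return float(((a - b) ** 2).sum())


def ncc(a, b):
    """Normalized cross-correlation: higher is better, at most 1.0."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b)))


def pick_shift(ref, mov, metric, choose, radius=3):
    """Score every roll of mov within +-radius px and return choose(...) over the scores."""
    scores = {
        (dy, dx): metric(ref, np.roll(mov, (dy, dx), axis=(0, 1)))
        for dy in range(-radius, radius + 1)
        for dx in range(-radius, radius + 1)
    }
    return choose(scores, key=scores.get)


rng = np.random.default_rng(0)
green = rng.random((50, 50))
# Hypothetical channel showing the same scene, but brighter and with lower contrast.
blue = 0.5 * green + 0.4

print(ncc(green, blue))  # ~1.0: NCC ignores the gain/offset difference
print(ssd(green, blue))  # > 0: SSD penalizes it even at the correct alignment
print(pick_shift(green, blue, ncc, max))  # maximizing NCC: (0, 0), the true alignment
print(pick_shift(green, blue, ncc, min))  # minimizing NCC: some wrong shift
```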