Programming Project #1 (proj1)

CS194-26: Image Manipulation and Computational Photography

Images of the Russian Empire: Colorizing the Prokudin-Gorskii photo collection

by Sara Wang

Overview

For this project, we are given a set of 3-channel images. For each image, our goal is to reconstruct the 3 pieces into one color photograph.

Originally, each image is a 2-dimensional array indexed only by height and width: each pixel is represented by a single number (as opposed to an array) that gives its intensity in blue, green or red. Once the 3 channels are combined, the numbers that represent each pixel are merged into a 3-element array, which fully specifies the actual color. Our task is to separate each image into 3 equal parts -- blue, green and red -- and align the green and the red to the blue. I had a great time working on project #1, colorizing the Prokudin-Gorskii photo collection.
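To make the 2-d versus 3-d distinction concrete, here is a minimal sketch of the split-and-stack step. The function names are hypothetical, and the top-to-bottom order (blue, green, red) is an assumption based on the order in which the parts are listed above.

```python
import numpy as np

def split_plate(plate):
    """Cut a 2-d scan into three equal-height parts.

    Assumes the parts are stacked top to bottom as blue, green, red.
    """
    height = plate.shape[0] // 3  # any leftover rows at the bottom are dropped
    blue = plate[:height]
    green = plate[height:2 * height]
    red = plate[2 * height:3 * height]
    return blue, green, red

def stack_rgb(red, green, blue):
    """Merge three 2-d channels into one (height, width, 3) color array."""
    return np.dstack([red, green, blue])

# Tiny synthetic "plate": 9 rows, 4 columns, one number per pixel.
plate = np.arange(36, dtype=float).reshape(9, 4)
blue, green, red = split_plate(plate)
color = stack_rgb(red, green, blue)
print(plate.shape)   # (9, 4)
print(color.shape)   # (3, 4, 3)
```

Each pixel of `color` is now a 3-element array rather than a single number. Without alignment, the three channels will not line up, which is what the rest of the project addresses.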

Approaches

After printing out the arrays of the 3-channel images in a Jupyter notebook, I had a good understanding of what I needed to do mathematically, and of what it means to align one image to another: I need to turn the 2-d arrays into a 3-d array. The hard part, however, is finding the displacement vector, that is, how far an image should be translated horizontally and vertically to align with another. I quickly thought of exhaustive search: as long as the image is not too large (the jpg ones), I can write a nested for loop that tries all candidate x and y offsets to see how far vertically and horizontally I should move the green and red images so that they best match the blue image. If the image is too large (tif), I scale it down to 1/4 of its size before doing the exhaustive search, and I keep scaling it down until the size and resolution are manageable. (This is the image pyramid method, given as a hint in the project specs.)
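The nested-loop search and the pyramid recursion can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code. The function names, the search radius, and the refinement window of 2 pixels are hypothetical. Plain subsampling (`[::2, ::2]`, half of each dimension) stands in for a proper rescale, and `np.roll` wraps pixels around the border instead of discarding them.

```python
import numpy as np

def exhaustive_search(moving, reference, radius=15):
    """Try every (dy, dx) within +/- radius; keep the lowest sum of squared differences."""
    best_shift, best_score = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            score = np.sum((shifted - reference) ** 2)
            if score < best_score:
                best_shift, best_score = (dy, dx), score
    return best_shift

def pyramid_align(moving, reference, threshold=500, radius=15):
    """Estimate the shift on a smaller copy first, then refine at full size."""
    if max(moving.shape) <= threshold:
        return exhaustive_search(moving, reference, radius)
    # Recurse on half-size copies (assumption: simple subsampling, no smoothing).
    dy, dx = pyramid_align(moving[::2, ::2], reference[::2, ::2], threshold, radius)
    dy, dx = 2 * dy, 2 * dx  # a shift at half size is worth twice as much here
    coarse = np.roll(moving, (dy, dx), axis=(0, 1))
    ddy, ddx = exhaustive_search(coarse, reference, radius=2)  # small local fix-up
    return dy + ddy, dx + ddx

# Synthetic check: displace a random image by a known amount and recover it.
rng = np.random.default_rng(0)
reference = rng.random((64, 64))
moving = np.roll(reference, (-4, 6), axis=(0, 1))
print(exhaustive_search(moving, reference, radius=8))            # (4, -6)
print(pyramid_align(moving, reference, threshold=32, radius=5))  # (4, -6)
```

The point of the recursion is cost: the wide search runs only on the smallest copy, and each larger level needs only a few candidate shifts around the doubled estimate.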

After that, only 2 questions remained -- which metric to use to measure how well two images match, and at what size I can stop scaling down the large tif images and run exhaustive search on them. I quickly decided to use the Sum of Squared Differences (SSD) as the metric, because it is simple and handy. As for the threshold for the image pyramid's rescaling, since the height and width of a jpg image subpart are 380-390, I decided that 500 is a good enough threshold. There were a lot of decisions to make, so I wrote plenty of helper functions so that I don’t have to go back and forth on the same issues.
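As a small worked example of the two decisions above, the sketch below computes SSD on tiny arrays and counts how many halvings bring a side length under the 500 threshold. The helper names are hypothetical, the 3200-pixel side length is a made-up example rather than a measured tif size, and halving each dimension per level is an assumption.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences; lower means a better match, 0 means identical."""
    # Cast to float first: subtracting uint8 arrays directly would wrap around.
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sum(diff ** 2))

def levels_needed(size, threshold=500):
    """Count how many times a side length must be halved to fall to the threshold or below."""
    levels = 0
    while size > threshold:
        size //= 2
        levels += 1
    return levels

a = np.array([[1, 2], [3, 4]], dtype=np.uint8)
b = np.array([[1, 2], [3, 6]], dtype=np.uint8)
print(ssd(a, a))            # 0.0
print(ssd(a, b))            # 4.0  (only one pixel differs, by 2)
print(levels_needed(390))   # 0    (already small enough for exhaustive search)
print(levels_needed(3200))  # 3    (3200 -> 1600 -> 800 -> 400)
```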

With the core algorithm established, I realized that the edges (borders) of each image were interfering with my results. Also, some of the images exhibit different degrees of brightness, which affects my SSD metric when determining how well the images match. Therefore, I implemented 2 extra helper functions: one gets rid of the edges of each image, and the other applies a skimage filter (skimage.filters.roberts) to each image before processing. With these two measures in place, I was able to minimize the effects of unwanted visual noise. After a couple of tries, I managed to create some amazing colorful images.
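The two helpers might look roughly like the sketch below. To keep it dependency-free, `roberts_edges` is a hand-written NumPy version of the Roberts cross operator standing in for skimage.filters.roberts; the library function may differ in normalization and border handling. The function names and the 10% crop fraction are assumptions.

```python
import numpy as np

def crop_border(image, fraction=0.1):
    """Drop a fixed fraction of rows and columns from every side."""
    h, w = image.shape[:2]
    dy, dx = int(h * fraction), int(w * fraction)
    return image[dy:h - dy, dx:w - dx]

def roberts_edges(image):
    """Roberts cross edge magnitude from the two diagonal differences.

    The output is one row and one column smaller than the input.
    """
    image = image.astype(np.float64)
    g1 = image[:-1, :-1] - image[1:, 1:]   # top-left minus bottom-right
    g2 = image[:-1, 1:] - image[1:, :-1]   # top-right minus bottom-left
    return np.sqrt(g1 ** 2 + g2 ** 2)

rng = np.random.default_rng(0)
channel = rng.random((50, 40))
brighter = channel + 0.3  # same content, uniformly brighter

print(crop_border(channel).shape)  # (40, 32)
# Raw pixel values differ, but the edge maps agree, so SSD on edges ignores the offset.
print(np.allclose(roberts_edges(channel), roberts_edges(brighter)))  # True
```

This illustrates why matching on edge maps helps when channels differ in brightness: a constant offset cancels in the pixel differences. A brightness change that scales the values would still alter the edge magnitudes.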