CS 194-26 Project 1: Colorizing Photos

Eric Li

Overview

The goal of this project was to colorize photos from the Prokudin-Gorskii collection. Before the invention of color photography, Prokudin-Gorskii captured scenes by exposing three plates through filters that let in only red, green, or blue light. These plates were acquired by the US Library of Congress, and scans of the color channels are available online. By combining the channels, we can recreate color photographs that Prokudin-Gorskii himself was never able to see.

Approach

Naive Implementation

I first split the image into three color channels: red, green, and blue. Then, using np.roll and both the NCC and SSD metrics outlined in the project specifications, I found the offset for each of the red and green channels that best matched the blue channel. NCC is maximized to find the best match, while SSD is minimized. The metrics are SSD = sum(sum((image1 - image2).^2)) and NCC = (image1/||image1||) dot (image2/||image2||). For both channels, I shift the image by up to 15 pixels in each direction to find the offset that gives the best value.
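The naive search described above can be sketched as follows. This is a minimal illustration rather than the project's actual code; the function names are my own.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences; lower is better."""
    return np.sum((a - b) ** 2)

def ncc(a, b):
    """Normalized cross-correlation; higher is better."""
    a = a.ravel() / np.linalg.norm(a)
    b = b.ravel() / np.linalg.norm(b)
    return a @ b

def align_naive(channel, reference, radius=15):
    """Exhaustively try shifts in [-radius, radius] on both axes and
    return the (dy, dx) that maximizes NCC against the reference."""
    best_shift, best_score = (0, 0), -np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ncc(shifted, reference)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

The same loop works with SSD by minimizing instead of maximizing.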

For this method to work well, I had to crop out the edges of the images, because the borders interfered with the meaningfulness of the metric calculations. I therefore cropped ten percent off each side of the image before computing the metrics, though when forming the composite images I was careful to always go back to the original, uncropped channels. I also had to zero-pad the edges of the cropped images, because rolling the color values wraps pixels around: during a horizontal roll, the left side of the image could end up correlated with the right side, or vice versa. Zero-padding the edges removed this problem.
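The cropping and the wrap-free shift could be written roughly like this (a sketch under my own naming, not the project code):

```python
import numpy as np

def crop_interior(img, frac=0.10):
    """Keep only the central region, dropping `frac` of each side,
    so the plate borders don't dominate the alignment metric."""
    h, w = img.shape
    dh, dw = int(h * frac), int(w * frac)
    return img[dh:h - dh, dw:w - dw]

def shift_no_wrap(img, dy, dx):
    """Like np.roll, but zero-fills the vacated region instead of
    wrapping, so opposite edges are never compared to each other."""
    out = np.zeros_like(img)
    h, w = img.shape
    ys_dst = slice(max(dy, 0), min(h, h + dy))
    xs_dst = slice(max(dx, 0), min(w, w + dx))
    ys_src = slice(max(-dy, 0), min(h, h - dy))
    xs_src = slice(max(-dx, 0), min(w, w - dx))
    out[ys_dst, xs_dst] = img[ys_src, xs_src]
    return out
```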

Image Pyramid

Since calculating displacements with the naive implementation would take far too long on large images, optimization was necessary, so I used an image pyramid. I first downsampled the image to 16x16 pixels and ran the naive offset algorithm to find the best offset at that scale. Then I downsampled the original image to 32x32 pixels and ran the naive algorithm over the small area around two times each of the 16x16 offset coordinates. I continued this process, doubling the resolution each time, until I reached the native resolution of the photo. Because we only need to search the range [-3, 2] at each layer, which gives us 6 values per axis, this method is significantly faster.
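A coarse-to-fine recursion along these lines might look like the sketch below. It uses simple strided downsampling and a plain NCC score; the exact search window and downsampling filter here are my own choices, not necessarily those used in the project.

```python
import numpy as np

def downsample(img, factor):
    """Naive downsampling by striding (a low-pass filter first would be smoother)."""
    return img[::factor, ::factor]

def align_pyramid(channel, reference, min_size=16, search=3):
    """Coarse-to-fine alignment: recurse on a half-resolution pair,
    double the coarse estimate, then refine in a small window."""
    if min(channel.shape) <= min_size:
        base_dy, base_dx = 0, 0
        radius = min_size // 2          # exhaustive search at the coarsest level
    else:
        dy, dx = align_pyramid(downsample(channel, 2), downsample(reference, 2),
                               min_size, search)
        base_dy, base_dx = 2 * dy, 2 * dx
        radius = search                 # small refinement window per level
    best, best_score = (base_dy, base_dx), -np.inf
    for ddy in range(-radius, radius + 1):
        for ddx in range(-radius, radius + 1):
            cand = np.roll(channel, (base_dy + ddy, base_dx + ddx), axis=(0, 1))
            score = np.sum(cand * reference) / (
                np.linalg.norm(cand) * np.linalg.norm(reference))
            if score > best_score:
                best_score, best = score, (base_dy + ddy, base_dx + ddx)
    return best
```

At each level only a handful of candidates are scored, instead of the full 31x31 window of the naive method.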

Aligning Emir and Tobolsk

These two photos were particularly difficult because of large differences in pixel magnitude across the color channels. I dealt with this by implementing a Prewitt filter, which approximates the gradient of the image at each point. I then ran the image pyramid with the NCC metric on these gradient images to determine the best offset. For these two images, using the Prewitt filter gave me much better alignment than the unprocessed images. The kernel Gx can be written as [1 1 1]' * [1 0 -1], and the kernel Gy as [1 0 -1]' * [1 1 1]. The final gradient approximation is sqrt(Gx^2 + Gy^2). The results of the Prewitt filter on the green channels of Emir and Tobolsk are shown below.

Emir Prewitt Filtered

Tobolsk Prewitt Filtered
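The gradient-magnitude computation described above can be sketched as follows, here using scipy.ndimage for the 2-D convolution (the project itself may have convolved differently):

```python
import numpy as np
from scipy.ndimage import convolve

def prewitt_magnitude(img):
    """Approximate the gradient magnitude with the Prewitt kernels
    Gx = [1 1 1]' * [1 0 -1] and Gy = [1 0 -1]' * [1 1 1]."""
    gx_kernel = np.outer([1, 1, 1], [1, 0, -1])
    gy_kernel = np.outer([1, 0, -1], [1, 1, 1])
    gx = convolve(img.astype(float), gx_kernel)
    gy = convolve(img.astype(float), gy_kernel)
    return np.sqrt(gx ** 2 + gy ** 2)
```

Because the gradient magnitude depends on local intensity changes rather than absolute brightness, it is insensitive to the per-channel magnitude differences that broke the raw-intensity alignment.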

Bells and Whistles

Automatic Cropping

To implement automatic cropping, I went back to using the raw images. For each column, I average the pixel values, flattening the image into a 1D row vector. I then calculate a gradient by convolving with [-1 0 1] and take the absolute value of this gradient. To make the results more robust, I apply a Gaussian blur before flattening and processing the image, which in experimental trials made edge detection more reliable. This graph is shown below. I make a similar graph for all three color channels, on both the vertical and horizontal axes. I then set a threshold of 5: I take the largest index on the left side of the image and the smallest index on the right side whose gradient values exceed this threshold. Taking the largest value for the lower bound and the smallest value for the upper bound across all three channels gives the boundaries I'm looking for; cutting at these pixel positions removes all the color artifacts.
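For one channel and one axis, the border search described above might be sketched like this. The helper name and the exact handling of the image halves are my own assumptions; scipy's gaussian_filter1d stands in for the Gaussian blur.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def find_borders(channel, threshold=5.0, blur_sigma=3.0):
    """Locate the left/right crop bounds of one channel: average each
    column into a 1-D profile, blur it, take the absolute gradient, and
    keep the outermost columns whose gradient exceeds the threshold."""
    profile = channel.mean(axis=0)                  # 1-D column means
    profile = gaussian_filter1d(profile, blur_sigma)
    grad = np.abs(np.convolve(profile, [-1, 0, 1], mode="same"))
    w = len(grad)
    above = np.nonzero(grad > threshold)[0]
    if len(above) == 0:
        return 0, w                                 # no border detected
    left = above[above < w // 2]
    right = above[above >= w // 2]
    lo = left.max() + 1 if len(left) else 0         # largest index on the left
    hi = right.min() if len(right) else w           # smallest index on the right
    return lo, hi
```

Running this on each channel and each axis, then taking the tightest bounds across channels, yields the final crop rectangle.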

However, this threshold value doesn't work as well on pictures with strong internal edges, such as St. Nicholas. Here is a graph of the gradient of St. Nicholas. There were similar problems with Tobolsk. I ended up fixing this by manually raising the gradient threshold for cropping to about 15, which seemed to work pretty well. You can see the reason for this in the gradient graph: the strong peak at the 3500-pixel mark corresponds to the edge of a window, not the edge of the picture, so my algorithm was falsely detecting it as a border. Thankfully, intelligent threshold choice got rid of all such problems.

Final Results

Cathedral

Green: (5, 2), Red: (12, 3)

Emir

Green: (40, -5), Red: (108, -32)

Harvesters

Green: (59, 16), Red: (124, 13)

Icon

Green: (41, 17), Red: (89, 23)

Lady

Green: (51, 9), Red: (112, 11)

Melons

Green: (81, 10), Red: (178, 13)

Monastery

Green: (-3, 2), Red: (3, 2)

Onion Church

Green: (51, 26), Red: (108, 36)

Self Portrait

Green: (78, 29), Red: (176, 37)

Three Generations

Green: (53, 14), Red: (112, 11)

Tobolsk

Green: (3, 3), Red: (6, 3)

Train

Green: (42, 5), Red: (87, 32)

Village

Green: (64, 12), Red: (137, 22)

Workshop

Green: (53, 0), Red: (105, -12)

Additional Pictures

Camping

Green: (40, -5), Red: (108, -32)

St. Nicholas

Green: (70, 24), Red: (154, 45)

River

Green: (69, 16), Red: (154, 11)