CS194-26: Project 1

Rachael Wu

Overview

The goal of this project was to colorize the Prokudin-Gorskii photo collection. Each photo in the collection consists of three black-and-white exposures, taken through a red, a green, and a blue filter. When aligned, these three negatives produce a color image. Assuming an x-and-y translation model (i.e., the images are only shifted up, down, left, and/or right), we implemented several methods that compute a two-dimensional displacement vector to align the three channels.

Methods

Approach 1: Exhaustive Search

The first approach we implemented was a naive exhaustive search over all possible displacements within a window of 30 pixels in each direction, keeping the alignment with the smallest SSD (sum of squared differences) in pixel values. Since the images have black and white borders, we computed the SSD only on the interior pixels of each image. This worked well for the smaller images.
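The search described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the 10% border crop, and the use of `np.roll` (which wraps pixels around the image edge) are all assumptions made for the example.

```python
import numpy as np

def crop_interior(img, frac=0.1):
    """Drop a border of `frac` of each dimension (10% is an assumed value)."""
    h, w = img.shape
    dh, dw = int(h * frac), int(w * frac)
    return img[dh:h - dh, dw:w - dw]

def ssd(a, b):
    """Sum of squared differences between two equally sized arrays."""
    return float(np.sum((a - b) ** 2))

def align_exhaustive(channel, base, radius=30):
    """Return the (dy, dx) shift of `channel` with the smallest interior SSD."""
    base_in = crop_interior(base)
    best_shift, best_score = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # np.roll wraps around; the interior crop hides most of that.
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ssd(crop_interior(shifted), base_in)
            if score < best_score:
                best_shift, best_score = (dy, dx), score
    return best_shift

# Usage on synthetic data: shift a random image and recover the shift.
rng = np.random.default_rng(0)
base = rng.random((60, 60))
channel = np.roll(base, (-3, 2), axis=(0, 1))
print(align_exhaustive(channel, base, radius=5))
```

The cost grows with the square of the search radius times the number of pixels, which is why this only stays practical for the smaller images.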

Approach 2: Image Pyramid

While exhaustive search works well for smaller images, it takes too long on larger images. Thus, for large images, we implemented an image pyramid:

Use scikit-image to rescale the images down to a smaller height.
Align the resized images using the exhaustive search method outlined in Approach 1 with some search window (-n, n) (we use n = 15). Calculate a displacement vector (x, y).
Resize the image again to two times the height of the previous image (e.g., 200 to 400). Realign the images with a smaller, adjusted search window centered on the doubled displacement (e.g., [2x - n/2, 2x + n/2] for the x values).
Continue to resize/realign the image until we reach the original size of the image.
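The steps above can be sketched as a coarse-to-fine loop. This is a simplified illustration under several assumptions: the names are hypothetical, downscaling uses plain 2x2 block averaging as a stand-in for the scikit-image rescaling mentioned above, the coarsest level is taken to be the first one no larger than `min_size`, and the search radius is halved at each finer level (one reading of the n/2 window in the text).

```python
import numpy as np

def interior(img, frac=0.1):
    """Drop an assumed 10% border on every side."""
    h, w = img.shape
    dh, dw = int(h * frac), int(w * frac)
    return img[dh:h - dh, dw:w - dw]

def search(channel, base, center, radius):
    """Exhaustive SSD search over shifts within `radius` of `center`."""
    base_in = interior(base)
    best, best_score = center, np.inf
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = np.sum((interior(shifted) - base_in) ** 2)
            if score < best_score:
                best, best_score = (dy, dx), score
    return best

def halve(img):
    """Downscale by 2 using 2x2 block averaging (not skimage's rescale)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def align_pyramid(channel, base, min_size=64, n=15):
    """Return the full-resolution (dy, dx) shift found coarse-to-fine."""
    levels = [(channel, base)]
    while min(levels[-1][1].shape) > min_size:
        c, b = levels[-1]
        levels.append((halve(c), halve(b)))
    # Coarsest level: full search window (-n, n) around zero.
    c, b = levels.pop()
    shift, radius = search(c, b, (0, 0), n), n
    # Finer levels: double the estimate, refine within a smaller window.
    while levels:
        c, b = levels.pop()
        radius = max(1, radius // 2)
        shift = search(c, b, (2 * shift[0], 2 * shift[1]), radius)
    return shift
```

Because each level only refines the doubled estimate from the level below, the expensive full-window search runs only on the smallest image.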

This method reduced runtime to a little over 3 seconds and worked well for most of the example images. For example:

However, this method failed for the picture of Emir. This is likely because the channels do not necessarily have the same brightness values, so comparing raw pixel values with SSD can favor the wrong alignment.

Bells and Whistles

Approach 3: Sobel Edge Detection

To align the image of Emir correctly, we used scikit-image to apply a Sobel filter to each channel in order to detect its edges. This filter approximates the horizontal and vertical gradients by convolving the image with 3x3 kernels. For example, for the image of Emir, we detect the following edges for each channel:

After detecting edges, we find the alignment with the smallest SSD between edge magnitudes to calculate a displacement vector. As before, we only considered interior edges, since we wanted to ignore the borders of the image. This worked well to align the image of Emir.
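As a rough sketch of the edge-based alignment, the example below computes a Sobel gradient magnitude by hand with NumPy and then runs the same SSD search on the edge maps. The hand-written filter is only a stand-in for scikit-image's `skimage.filters.sobel` (whose output scaling may differ), and the function names, the 10% border crop, and the small search radius are assumptions for illustration.

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude from the two 3x3 Sobel kernels (edge-padded)."""
    p = np.pad(img.astype(float), 1, mode="edge")
    # Horizontal gradient: right column minus left column, weights 1-2-1.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2])
    # Vertical gradient: bottom row minus top row, weights 1-2-1.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:])
    return np.hypot(gx, gy)

def interior(img, frac=0.1):
    """Drop an assumed 10% border so frame edges are ignored."""
    h, w = img.shape
    dh, dw = int(h * frac), int(w * frac)
    return img[dh:h - dh, dw:w - dw]

def align_on_edges(channel, base, radius=15):
    """Return the (dy, dx) shift minimizing SSD between edge magnitudes."""
    edges_c, edges_b = sobel_magnitude(channel), sobel_magnitude(base)
    base_in = interior(edges_b)
    best, best_score = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(edges_c, (dy, dx), axis=(0, 1))
            score = np.sum((interior(shifted) - base_in) ** 2)
            if score < best_score:
                best, best_score = (dy, dx), score
    return best
```

A constant brightness offset between two channels disappears in the gradient, which is one reason edge maps can be easier to match than raw pixel values.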

However, we can see that this method does not perform as well for other images, such as the image of the lady:

Left: Aligned image from color channel pixel values; Right: Aligned image from edge pixel values

This could be because the edges in this image are not as well defined as those in the image of Emir.

Automatic Contrasting and White Balance

In addition to the alignment methods mentioned above, we also implemented contrast stretching on each color channel in order to improve the perceived quality of the image. As mentioned on the course website, the safest method is to rescale the image such that the darkest pixel has a value of 0 and the lightest pixel has a value of 1. However, since we did not observe much of a change in image quality with this method, we instead rescaled each channel such that the darkest 1% of pixels became 0 and the lightest 1% of pixels became 1. We only considered interior pixels when calculating the 1st and 99th percentiles of pixel values.
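A minimal sketch of this percentile-based stretch, assuming channel values are floats in [0, 1]. The function name and the 10% border fraction are hypothetical choices for the example, not values taken from the project.

```python
import numpy as np

def stretch_contrast(channel, low_pct=1, high_pct=99, border_frac=0.1):
    """Map the interior 1st/99th percentiles to 0 and 1, clipping the rest."""
    h, w = channel.shape
    dh, dw = int(h * border_frac), int(w * border_frac)
    inner = channel[dh:h - dh, dw:w - dw]
    lo, hi = np.percentile(inner, [low_pct, high_pct])
    if hi <= lo:
        # Flat channel: nothing to stretch.
        return channel.copy()
    # Values below lo become 0 and values above hi become 1.
    return np.clip((channel - lo) / (hi - lo), 0.0, 1.0)
```

Using percentiles instead of the strict minimum and maximum keeps a handful of extreme pixels from determining the whole range.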

We also implemented automatic white balance using a gray-world assumption, in which we assume that the average color of the image is gray. To do so, we calculated the mean value of each color channel's interior pixels to get average values r, g, and b, found the maximum of those averages, and scaled up the other two channels so that the mean values of all the channels were equal. For example, if the green channel had the highest mean value, we would multiply the red channel by g/r and the blue channel by g/b.
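The gray-world scaling can be sketched as below, assuming an H x W x 3 float image with values in [0, 1]. The function name, the 10% border fraction, the small epsilon guard, and the final clipping step are assumptions added for the example.

```python
import numpy as np

def gray_world(rgb, border_frac=0.1):
    """Scale channels so their interior means match the largest mean."""
    h, w, _ = rgb.shape
    dh, dw = int(h * border_frac), int(w * border_frac)
    inner = rgb[dh:h - dh, dw:w - dw]
    # Per-channel means (r, g, b) over interior pixels only.
    means = inner.reshape(-1, 3).mean(axis=0)
    # The brightest channel gets gain 1; the others get e.g. g/r and g/b.
    gains = means.max() / np.maximum(means, 1e-12)
    # Clipping keeps scaled values displayable (an added assumption).
    return np.clip(rgb * gains, 0.0, 1.0)
```

Since only the two dimmer channels are scaled up, bright pixels in those channels can saturate, which is why the sketch clips the result.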

From left to right: Original example image, image with white balancing, image with contrasting, image with white balancing and contrasting

Final Results

Images before and after bells and whistles:

Extra images from the collection:

Displacements:

We set the blue channel as the base.

Image	Green	Red
Cathedral	[5, 2]	[12, 3]
Monastery	[-3, 2]	[3, 2]
Nativity	[3, 1]	[7, 1]
Settlers	[7, 0]	[14, -1]
Emir	[48, 24]	[106, 40]
Harvesters	[60, 16]	[124, 12]
Icon	[40, 16]	[88, 22]
Lady	[48, 8]	[112, 12]
Self Portrait	[78, 28]	[176, 36]
Three Generations	[52, 12]	[112, 10]
Train	[42, 4]	[86, 32]
Turkmen	[56, 20]	[116, 28]
Village	[64, 12]	[136, 20]
Cathedral (Inside)	[32, -6]	[94, -26]
Glass	[20, 16]	[60, 16]
River	[40, 4]	[152, 8]
Tower	[56, 12]	[106, 24]