CS 194-26 Project 1 Colorizing the Prokudin-Gorskii Photo Collection
by Quinn Tran

Introduction

Prokudin-Gorskii's images are grayscale images taken in Russia during the early 1900s. Each image has three frames, each captured through a different color filter (R, G, B). Within a frame, the intensity of that color is encoded by the grayscale value: high values (white) mean strong intensity and low values (black) mean weak intensity.


Stacking the frames on top of each other (no alignment) results in a poor colorized image because each frame was taken in a different shot, so the frames are slightly offset from one another.
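A minimal sketch of the unaligned stack, for reference. It assumes the scanned plate holds the three frames top to bottom in blue, green, red order (an assumption about the input layout, not stated above), and `stack_unaligned` is a hypothetical helper name.

```python
import numpy as np

def stack_unaligned(plate):
    """Split a grayscale plate into three equal-height frames and stack them as RGB.

    Assumes the frames are ordered blue, green, red from top to bottom.
    """
    h = plate.shape[0] // 3                      # height of one frame
    b, g, r = plate[:h], plate[h:2 * h], plate[2 * h:3 * h]
    return np.dstack([r, g, b])                  # shape (h, w, 3), no alignment applied
```

Any offset between the frames shows up as color fringes in the stacked result, which is what the alignment step is meant to remove.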



Algorithm

Naive

Used for smaller images; this is also the base case of the pyramid algorithm. Set blue as the fixed channel. Slide the red channel in the (x, y) directions to match the blue channel, then do the same for the green channel. With SSD as the metric, the "optimal" match is the roll within [-15, 15] that gives the minimum SSD between a channel and the blue channel (SSD measures difference, so lower is better). However, this produced strange shifts where a channel would roll only to match up borders, or to match bright (high-value) areas of one channel with dark (low-value) areas of the other. NCC was a better metric.
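The exhaustive search above, scored with NCC, might look roughly like the sketch below. The names `ncc` and `align_naive` and the `margin` parameter are hypothetical. Trimming the borders before scoring is an assumption of this sketch, added because borders were reported to mislead the match. Shifts are returned as (row, column) rolls, which may not match the (x, y) ordering used elsewhere in this write-up.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized images (higher is better)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def align_naive(channel, fixed, radius=15, margin=0.1):
    """Try every roll in [-radius, radius] on both axes; return the best (row, col) roll.

    `margin` is the fraction trimmed from each side before scoring (an assumption
    of this sketch) so borders and wrapped-around pixels do not dominate the score.
    """
    h, w = fixed.shape
    mh, mw = int(h * margin), int(w * margin)
    ref = fixed[mh:h - mh, mw:w - mw]
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            rolled = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ncc(rolled[mh:h - mh, mw:w - mw], ref)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

Because NCC subtracts the mean and normalizes, it is less sensitive than SSD to overall brightness differences between channels.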


Image Pyramid

Used for larger images, where the search space is too large to search at full resolution. The base case is a small image with length and width in [0, 400]. Each recursive call scales the image down by 0.5, halving its resolution. Each scaled image has a designated search space, over which the naive algorithm runs to get the current optimal [x shift, y shift].

Parameters: shift (number of pixels to roll) = 10, num_shifts (number of shifts per search space) = 10. To keep the run time under 1 minute per image, num_shifts was set to 4.
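As a rough illustration of the recursion, here is a generic coarse-to-fine sketch. It is not the exact parameterization above: `radius`, `refine`, `min_size`, and the 2x2 block averaging in `halve` are assumptions of this sketch, and it does not trim borders before scoring. Shifts are (row, column) rolls.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation (higher is better)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def search(channel, fixed, center, radius):
    """Score every roll within `radius` of `center`; return the best (row, col) roll."""
    best_score, best_shift = -np.inf, center
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            score = ncc(np.roll(channel, (dy, dx), axis=(0, 1)), fixed)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

def halve(img):
    """Downscale by 0.5 by averaging 2x2 blocks (one simple resampling choice)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def align_pyramid(channel, fixed, min_size=400, radius=15, refine=2):
    """Coarse-to-fine alignment; returns the (row, col) roll for `channel`."""
    if max(fixed.shape) <= min_size:
        # Base case: the image is small enough for the full exhaustive search.
        return search(channel, fixed, (0, 0), radius)
    coarse = align_pyramid(halve(channel), halve(fixed), min_size, radius, refine)
    # A shift found at half resolution is worth twice as many pixels at this level.
    center = (2 * coarse[0], 2 * coarse[1])
    return search(channel, fixed, center, refine)
```

Each level above the base case only searches a small window around the doubled coarse estimate, which is what keeps the large .tif images tractable.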


Challenges

It was hard to align the red channel because the x and y shifts came from a local NCC maximum rather than the global maximum. This is clearly visible in lady.tif.

There was no single set of global parameters that aligned all of the images. The solution depended heavily on the measurement function (NCC), and the effects of window size were magnified in the pyramid solution. See icon.tif.


Colorizing the images went reasonably well overall, but was especially tricky for large images.



Additionally, red and blue are far enough apart in the color spectrum that NCC did not produce a distinct enough optimum when aligning red directly to blue. A fix was to align green to blue, then red to green (with green already aligned), because red and green values correlate better.
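The chained alignment could be sketched as follows. The helper names (`ncc`, `best_roll`, `align_chained`) and the `radius` default are hypothetical, and the search is a plain exhaustive one rather than the full pyramid, to keep the example short.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation (higher is better)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def best_roll(channel, fixed, radius=15):
    """Exhaustive search for the (row, col) roll of `channel` that maximizes NCC."""
    candidates = [(dy, dx)
                  for dy in range(-radius, radius + 1)
                  for dx in range(-radius, radius + 1)]
    return max(candidates,
               key=lambda s: ncc(np.roll(channel, s, axis=(0, 1)), fixed))

def align_chained(r, g, b, radius=15):
    """Align green to blue, then red to the already-aligned green."""
    g_shift = best_roll(g, b, radius)
    g_aligned = np.roll(g, g_shift, axis=(0, 1))
    # g_aligned already sits in blue's frame, so this roll also aligns red to blue.
    r_shift = best_roll(r, g_aligned, radius)
    r_aligned = np.roll(r, r_shift, axis=(0, 1))
    return np.dstack([r_aligned, g_aligned, b]), r_shift, g_shift
```

Since green is rolled into blue's frame first, the red shift found against it is still expressed relative to the blue channel, as in the results below.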

Results

Example

Small (jpg)

Right: Output Cathedral
Red: (10, 0), Green: (0, 0), Blue: (0, 0)

Right: Output Monastery
Red: (10, 0), Green: (0, 0), Blue: (0, 0)

Right: Output Nativity
Red: (10, 0), Green: (10, 0), Blue: (0, 0). This failed to align well because of the high exposure.

Right: Output Settlers
Red: (20, 0), Green: (10, 0), Blue: (0, 0)

Large (tif)

Right: Output Emir
Red: (80, 10), Green: (30, -10), Blue: (0, 0)

Right: Output Harvesters
Red: (90, -10), Green: (60, 0), Blue: (0, 0). This didn't align because of the high exposure.

Right: Output Icon
Red: (90, 10), Green: (20, 20), Blue: (0, 0)

Right: Output Lady
Red: (90, -40), Green: (40, -10), Blue: (0, 0)

Right: Output Self Portrait
Shift: 15, Red: (135, 0), Green: (60, 0), Blue: (0, 0). This aligned, but not well. It seems that because there is a lot of green in the picture, the red and green channels align less well to the blue channel.

Right: Output Three Generations
Red: (90, 0), Green: (70, -10), Blue: (0, 0)

Right: Output Train
Red: (90, -20), Green: (20, 0), Blue: (0, 0)

Right: Output Turkmen
Red: (90, 10), Green: (50, 0), Blue: (0, 0)

Right: Output Village
Red: (90, 20), Green: (60, -10), Blue: (0, 0). This didn't align well because the high exposure from the sky and noticeably darker land could potentially bias NCC.

Chosen

Right: Output 00451u (tif)
Red: (90, -20), Green: (70, -10), Blue: (0, 0)

Right: Output 00998u (tif)
Red: (90, 0), Green: (60, 0), Blue: (0, 0). This didn't align well because it looks like the red channel didn't correlate with either the blue or green channel enough to shift in the right directions.

Right: Output 1520u (tif)
Red: (70, 0), Green: (20, 0), Blue: (0, 0)

Bells and Whistles

Automatic Cropping

Automatic Cropping with Canny Edge Detector

Used a Canny edge detector to find pixel positions that mark edges (a change between True/False values). To smooth out noise, a Gaussian blur (sigma=1) was applied to the image before it was cropped. Since color channel alignment resulted in white borders on the output images, cropping prioritized removing the white borders first. This can be tuned with an "aggressiveness" level by looking for the next edge (moving towards the center of the image).
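One way to turn an edge map into crop bounds is sketched below. It takes a boolean edge map as input, for example from scikit-image's `canny(image, sigma=1)` (the library choice is an assumption; none is named above). The `min_frac` threshold, the `level` parameter standing in for the "aggressiveness" level, and the function names are all hypothetical.

```python
import numpy as np

def edge_lines(edges, axis, min_frac=0.5):
    """Indices of rows (axis=0) or columns (axis=1) where at least `min_frac` of pixels are edges."""
    frac = edges.mean(axis=1 - axis)
    return np.flatnonzero(frac >= min_frac)

def crop_bounds(edges, level=0, min_frac=0.5):
    """Return (top, bottom, left, right) so that img[top:bottom, left:right] is the crop.

    `level` picks the n-th edge line counted from each border towards the center;
    if fewer lines exist on a side, the innermost available one is used.
    """
    bounds = []
    for axis in (0, 1):
        n = edges.shape[axis]
        lines = edge_lines(edges, axis, min_frac)
        near = lines[lines < n // 2]            # first half, ordered border -> center
        far = lines[lines >= n // 2][::-1]      # second half, ordered border -> center
        lo = near[min(level, len(near) - 1)] + 1 if len(near) else 0
        hi = far[min(level, len(far) - 1)] if len(far) else n
        bounds += [int(lo), int(hi)]
    return tuple(bounds)
```

With `level=0` this removes everything outside the first long edge on each side, which is where the white borders from alignment end up; higher levels crop further in.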

Aligned image with white borders.

Aligned image without white borders (the first set of edges).

Sobel featurization gave results pretty similar to Canny edge detection for cropping.

Automatic Cropping with Gradient

Get two signals by calculating the mean of all three channels along the x and y axes respectively. Calculate the gradient of each signal, then smooth the gradient to get a more defined boundary/edge.

Gradient (in blue), smoothed gradients (in red) across x axis.

Take the absolute value of the gradient signal for each axis. On each side of the image, the pixel with the largest magnitude along that axis marks the border edge. We can tune how aggressively we want to crop by picking the nth largest magnitude.
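The gradient-based crop described above might be sketched like this. The box-filter smoothing, its `window` size, and splitting each profile at its midpoint to find one edge per side are assumptions of this sketch; `gradient_bounds` is a hypothetical name, and `aggro` follows the caption's naming.

```python
import numpy as np

def gradient_bounds(img, aggro=0, window=5):
    """Crop bounds (top, bottom, left, right) from smoothed 1-D gradient profiles."""
    gray = img.mean(axis=2) if img.ndim == 3 else img
    kernel = np.ones(window) / window               # simple box filter for smoothing
    bounds = []
    for axis in (0, 1):
        profile = gray.mean(axis=1 - axis)          # one value per row (axis=0) or column (axis=1)
        grad = np.convolve(np.gradient(profile), kernel, mode="same")
        mag = np.abs(grad)
        half = len(mag) // 2
        # Rank positions by gradient magnitude separately for each side of the image.
        near = np.argsort(mag[:half])[::-1]
        far = np.argsort(mag[half:])[::-1] + half
        lo = near[min(aggro, len(near) - 1)]        # aggro=0 picks the largest magnitude
        hi = far[min(aggro, len(far) - 1)]
        bounds += [int(lo), int(hi)]
    return tuple(bounds)
```

Note that after smoothing, the positions next to a peak have nearly the same magnitude, so small `aggro` values tend to move the crop line by only a pixel or two.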

aggro=0 (pick the largest magnitude). Cropping the red channel as a dummy grayscale image.

Auto Contrast

I cropped, then averaged the aligned r, g, b channels into a grayscale image. The approach is to rescale the intensity (or brightness) of the grayscale image so that the darkest pixel is 0 (on the darkest color channel) and the brightest pixel is 1 (on the brightest color channel).
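A minimal sketch of the rescaling step, under one reading of the description: a single linear stretch shared by all channels, using the global minimum and maximum of the input. `auto_contrast` is a hypothetical name, and the input is assumed to be already cropped.

```python
import numpy as np

def auto_contrast(img):
    """Linearly rescale so the darkest value maps to 0 and the brightest to 1."""
    img = img.astype(np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros_like(img)    # flat image: nothing to stretch
    return (img - lo) / (hi - lo)
```

Cropping first matters here because leftover white or black borders would otherwise set the minimum and maximum and leave the picture itself unchanged.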

Left: Before, Right: After