Project 1 -- Colorizing the Prokudin-Gorskii Photo Collection

By Myles Domingo

Overview

In this project, we recreate and colorize Prokudin-Gorskii’s glass plate images using various image processing techniques. To create a color image, we divide the glass plate image into three equal sections, each representing one of the three color channels (R, G, B). We then align these sections by offsetting pixels so that the combined colors resemble the intended original image.
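As a rough illustration of the splitting step, here is a minimal sketch using NumPy. The function names split_plate and stack_rgb are hypothetical, and the sketch assumes the plate is stacked blue, green, red from top to bottom, which is not stated above.

```python
import numpy as np

def split_plate(plate):
    """Split a vertically stacked plate scan into three equal channel images."""
    height = plate.shape[0] // 3  # drop leftover rows so all thirds match
    # Assumption: channels are stacked blue, green, red from top to bottom.
    blue = plate[:height]
    green = plate[height:2 * height]
    red = plate[2 * height:3 * height]
    return red, green, blue

def stack_rgb(red, green, blue):
    """Combine three (already aligned) channels into one H x W x 3 image."""
    return np.dstack([red, green, blue])
```

Calling stack_rgb directly on the output of split_plate gives the unaligned color image, which is a useful baseline to compare against the aligned result.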

Alignment

To align the images, we first choose a window of candidate displacements (offsets) in pixels; in this case, we use [-15, 15]. We pick one channel image as the reference (the base image) and offset the other channel images (referred to as alignment images) against it. In the first implementation, we exhaustively search every displacement in the window by circularly shifting the image matrix and scoring the result against a metric. The metric is the L2 norm, computed as the sum of squared differences between the alignment image and the base image, and we pick the displacement that minimizes this value. The search covers both the x and y axes, so each result is a pair of offsets.
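The exhaustive search described above could look roughly like the following sketch. The names ssd and align_exhaustive are hypothetical, and returning the offset as (dx, dy) is an assumption about the coordinate order.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two images of the same shape."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return np.sum(diff ** 2)

def align_exhaustive(image, base, window=15):
    """Try every shift in [-window, window] on both axes; keep the lowest SSD."""
    best_score = np.inf
    best_shift = (0, 0)
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            # np.roll shifts circularly: pixels pushed off one edge wrap around.
            shifted = np.roll(image, (dy, dx), axis=(0, 1))
            score = ssd(shifted, base)
            if score < best_score:
                best_score = score
                best_shift = (dx, dy)  # assumed order: x offset, then y offset
    return best_shift
```

The aligned channel is then np.roll(image, (dy, dx), axis=(0, 1)) with the returned offsets.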

Cropping

Borders and other inconsequential artifacts interfere with the displacement calculations. To reduce their effect, we perform a center crop by slicing 10% off each side of the image before scoring.
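A center crop of this kind is a single slice in NumPy. The sketch below uses a hypothetical center_crop helper and assumes the 10% is taken from each of the four sides.

```python
import numpy as np

def center_crop(image, fraction=0.10):
    """Remove `fraction` of the height and width from each side of the image."""
    rows, cols = image.shape[:2]
    dr = int(rows * fraction)  # rows to drop at the top and at the bottom
    dc = int(cols * fraction)  # columns to drop at the left and at the right
    return image[dr:rows - dr, dc:cols - dc]
```

The crop only needs to be applied to the copies used for scoring; the displacement found this way can still be applied to the full-size channel.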

Image Pyramids

This first implementation, while great for smaller, lower-resolution images like the .jpgs, doesn’t work well with the larger .tif images. The needed displacement may fall outside our displacement window, which is small in comparison: [-15, 15] in a 3000x3000 image. We cannot simply search a larger window, because the number of candidate displacements grows with the square of the window width and each candidate is scored over the full image, which takes too long.

Therefore, to calculate displacements for larger images efficiently, we use an image pyramid. The image pyramid represents our alignment image at several scales, each smaller than the last by a factor of alpha. Lower-resolution images need smaller displacement windows, so we start by finding the displacement of the lowest-resolution image with our first implementation. This displacement, d, gives us an estimate of where the displacement window should sit in the next-lowest-resolution image.

We compute (d - 1) * scale and (d + 1) * scale to set the lower and upper bounds of the window at the next level. This keeps the window small by eliminating incorrect displacements as we scale up, which reduces calculation time immensely. We recursively repeat these alignment calculations on higher-resolution images until we reach the original alignment image. This coarse-to-fine pyramid speedup allows us to process large .tif files in 10-20 seconds.
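The recursion can be sketched as follows. This is an illustration under assumptions rather than the project's actual code: the helper names, the scale factor of 2, the block-averaging downscale, and the min_size stopping rule are all hypothetical choices.

```python
import numpy as np

def best_shift(image, base, dx_range, dy_range):
    """Exhaustively score every (dx, dy) inside the given inclusive ranges."""
    best_score, best = np.inf, (0, 0)
    for dy in range(dy_range[0], dy_range[1] + 1):
        for dx in range(dx_range[0], dx_range[1] + 1):
            shifted = np.roll(image, (dy, dx), axis=(0, 1))
            score = np.sum((shifted - base) ** 2)  # sum of squared differences
            if score < best_score:
                best_score, best = score, (dx, dy)
    return best

def downscale(image, scale=2):
    """Shrink an image by averaging non-overlapping scale x scale blocks."""
    rows = image.shape[0] // scale * scale
    cols = image.shape[1] // scale * scale
    blocks = image[:rows, :cols].reshape(rows // scale, scale, cols // scale, scale)
    return blocks.mean(axis=(1, 3))

def align_pyramid(image, base, window=15, scale=2, min_size=64):
    """Coarse-to-fine alignment: solve the small image first, then refine."""
    image = image.astype(np.float64)
    base = base.astype(np.float64)
    if min(image.shape) <= min_size:
        # Coarsest level: full search over the original window.
        return best_shift(image, base, (-window, window), (-window, window))
    dx, dy = align_pyramid(downscale(image, scale), downscale(base, scale),
                           window, scale, min_size)
    # Window bounds for this level: (d - 1) * scale to (d + 1) * scale.
    dx_range = ((dx - 1) * scale, (dx + 1) * scale)
    dy_range = ((dy - 1) * scale, (dy + 1) * scale)
    return best_shift(image, base, dx_range, dy_range)
```

With a scale of 2, each finer level scores only a 5 x 5 grid of candidates, so the full search over [-15, 15] happens only once, on the smallest image.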

Challenges

The main challenge was understanding how an image pyramid works at a high level and implementing it using Python features. Speedups were relatively straightforward with the numpy library. Images like lady.tif and emir.tif are still not entirely correctly aligned. This is due to the nature of the glass plate images: the same region can have very different brightness in each channel, which does not fare well under plain L2 calculations on raw pixels. If this were to be improved, I’d either normalize the brightness levels or perform calculations using gradients instead of raw pixels.
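Neither improvement was implemented here, but as a hedged sketch of what they might look like, the hypothetical helpers below preprocess a channel before scoring. Using zero-mean, unit-variance normalization and the gradient magnitude from np.gradient are assumptions about how the two ideas could be realized.

```python
import numpy as np

def normalize(image):
    """Rescale a channel to zero mean and unit standard deviation."""
    image = image.astype(np.float64)
    return (image - image.mean()) / (image.std() + 1e-12)  # avoid divide-by-zero

def gradient_magnitude(image):
    """Per-pixel gradient strength, which ignores a constant brightness offset."""
    gy, gx = np.gradient(image.astype(np.float64))  # derivatives along rows, cols
    return np.hypot(gx, gy)
```

Either function would be applied to both the alignment image and the base image before computing the L2 score, so that the search compares structure rather than absolute brightness.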



Image Results

The following section features images from the data/ collection. Coordinates refer to displacement of green-blue alignment (GB) and red-blue alignment (RB), respectively.


Cathedral, (0, 3) GB (1, 0) RB

Monastery, (2, -3) GB (3, 3) RB

Emir, (19, 43) GB (36, 43) RB

Harvesters, (16, 58) GB (16, 123) RB

Icon, (18, 42) GB (23, 90) RB

Lady, (6, 53) GB (-6, 116) RB

Melons, (0, 82) GB (0, 178) RB

Onion Church, (26, 51) GB (39, 108) RB

Castle, (2, 35) GB (0, 98) RB

Self Portrait, (0, 75) GB (0, 171) RB

Three Generations, (0, 54) GB (0, 112) RB

Tobolsk, (3, 3) GB (3, 6) RB

Train, (3, 42) GB (0, 91) RB

Workshop, (1, 52) GB (-11, 104) RB



Other Images

The following section features other images from the Prokudin-Gorskii Photo Collection. Coordinates refer to displacement of green-blue alignment (GB) and red-blue alignment (RB), respectively. Siren is my favorite one :^)


Detali sobora v Milanic, (10, 57) GB (21, 129) RB

Trans-Siberian Railway, (-11, 16) GB (0, 67) RB

Siren, (-17, 50) GB (-34, 90) RB

Pole makov, (0, 12) GB (-1, 96) RB