Images of the Russian Empire:
Colorizing the Prokudin-Gorskii photo collection

Xingyu Jin < xingyu.jin21@berkeley.edu >

1. Background

In the early 1900s, Sergei Mikhailovich Prokudin-Gorskii traveled across the Russian Empire and took numerous color pictures with a simple but ingenious idea: he recorded three exposures of every scene onto a glass plate using a red, a green, and a blue filter. However, the technology of his time could not reconstruct true color pictures from these plates. Since his RGB glass plate negatives captured the last years of the Russian Empire, it would be a pity not to generate colored versions of them.

In this project, we take several of Prokudin-Gorskii's digitized RGB glass plate works and attempt to automatically generate color images through image processing techniques with as few visual artifacts as possible. Splitting the stacked plate scan into R, G, and B plates is relatively straightforward: evenly split the image into thirds along the vertical axis; we omit the details in this report. The project consists of three main parts: aligning the color channel images to their proper positions with an <x, y> displacement, cropping the images to focus on the main content, and adjusting colors through several different techniques.

2. Aligning Channels

2.0 Direct Stacking without Aligning

Before we jump into alignment algorithms, let us first convince ourselves that alignment is necessary. Choosing three of the provided plate files at random and stacking their channels directly, without any alignment, produces the images below.
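For concreteness, here is a minimal sketch of this direct stacking, assuming the scan is a single tall grayscale image with the blue, green, and red exposures stacked from top to bottom; the file name and the skimage-based I/O are illustrative assumptions.

    import numpy as np
    import skimage.io as skio

    plate = skio.imread("cathedral.jpg")   # one tall grayscale scan (assumed name)
    height = plate.shape[0] // 3           # each exposure occupies one third
    b = plate[:height]                     # top third: blue exposure
    g = plate[height:2 * height]           # middle third: green exposure
    r = plate[2 * height:3 * height]       # bottom third: red exposure

    naive = np.dstack([r, g, b])           # stack as RGB with no alignment
    skio.imsave("naive_stack.jpg", naive)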

As we can see, the result is extremely unsatisfying and painful to look at. The main problem is that the red, green, and blue channels are placed in improper positions, which results in weird colors and ghosting. Therefore, an alignment step is definitely necessary before stacking the channels.

2.1 Naive Aligning (Brute Force Iteration)

Among the provided digitized plate images, there are low-resolution JPG files and high-resolution TIF files. We first examine how to align channels for the smaller pictures. Since the number of pixels along each axis is not exceedingly large, we can try all possible displacements through iteration and pick the best one according to some pre-defined criterion. I used a search window of (-20, 20) on both the x and y axes, and my criterion is to maximize (NCC - SSD), since NCC represents the correlation between the image being aligned and the target, while SSD represents how different they are from each other. A sketch of this search and the resulting images are shown below.
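The sketch below is minimal and assumes float images in [0, 1]; the search window and the (NCC - SSD) criterion follow the description above, with the two terms combined at their raw scales as the criterion states.

    import numpy as np

    def ssd(a, b):
        # Sum of squared differences: lower means more similar.
        return np.sum((a - b) ** 2)

    def ncc(a, b):
        # Normalized cross-correlation: higher means more similar.
        a = a - a.mean()
        b = b - b.mean()
        return np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b))

    def align_naive(channel, reference, shift_range=20):
        # Try every displacement in the window and keep the best score.
        best_score, best_shift = -np.inf, (0, 0)
        for dx in range(-shift_range, shift_range + 1):
            for dy in range(-shift_range, shift_range + 1):
                shifted = np.roll(channel, (dy, dx), axis=(0, 1))
                score = ncc(shifted, reference) - ssd(shifted, reference)
                if score > best_score:
                    best_score, best_shift = score, (dx, dy)
        return best_shift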

As we can see, the resulting images are much better than those shown in Section 2.0. However, there is still room for improvement. For example, the borders of the resulting images look abnormal, and the pictures themselves can be improved further. We discuss other improvement techniques in Section 3.

2.2 Image Pyramid Aligning

The naive approach introduced above solves many of the problems, but it becomes very time-consuming when an image has a large number of pixels (the TIFF files). Therefore, a more efficient algorithm is necessary. Here, I used an image pyramid to represent the image at multiple scales and update the estimate of the best displacement accordingly. In this process, I scale by a factor of 2 at each level and stop when one of the axes has fewer than 300 pixels. I also increase the size of the search window as the pyramid goes up (the smaller an image is, the bigger its search window is); a sketch of the recursion appears after the list below. Additionally, since the TIFF files have more pixels, I made the following two improvements:

  1. Crop the image by a fixed ratio before aligning, in order to minimize the influence of inessential content.
  2. Use each of the R, B, and G plates as the base reference in turn, and choose the one that yields the best score (NCC - SSD). The log is shown below (scroll down to view all).
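A minimal sketch of the pyramid recursion, reusing align_naive from Section 2.1; the factor-of-2 rescale, the 300-pixel stopping size, and the larger window at the coarsest level follow the description above, while the 2-pixel refinement window at finer levels is an illustrative assumption.

    import numpy as np
    from skimage.transform import rescale

    def align_pyramid(channel, reference, window=20, min_size=300):
        # Coarsest level: the image is small, so a wide exhaustive search is cheap.
        if min(channel.shape) < min_size:
            return align_naive(channel, reference, shift_range=window)
        # Recurse on half-resolution copies to get a coarse estimate.
        coarse = align_pyramid(rescale(channel, 0.5, anti_aliasing=True),
                               rescale(reference, 0.5, anti_aliasing=True),
                               window=window)
        dx, dy = 2 * coarse[0], 2 * coarse[1]   # scale the estimate back up
        shifted = np.roll(channel, (dy, dx), axis=(0, 1))
        # Refine with a small local search around the upscaled estimate.
        rdx, rdy = align_naive(shifted, reference, shift_range=2)
        return (dx + rdx, dy + rdy)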

Through the techniques above, I succeeded in aligning the TIFF images and obtained the resulting pictures below. The displacement is shown as < x_shift, y_shift >.

In the following sections, we apply several improvements to make our images better.

3. Bells & Whistles

3.1 Automatic Cropping

Looking at the raw pictures, it is easy to see that the original plate scans have black and white borders, which noticeably affect both alignment and the generated pictures. Previously, my approach was to crop the image by a fixed ratio. This is better than nothing, but an automatic cropping function would be far handier. For this part, I propose two algorithms:

  1. AutoCrop: Scan row by row and column by column; if the mean of the current row/column is less than BLACK_THRESH or greater than WHITE_THRESH, discard it. The process stops as soon as the algorithm finds a row/column that does not satisfy either condition. However, as shown in the pictures below, the problem with this approach is that it tends to stop too early and leaves a certain amount of border behind.
  2. MaskCrop: Along axis 0, the algorithm keeps only those columns whose median pixel lies within [BLACK_THRESH, WHITE_THRESH]. I chose the median for a good reason: among the common statistics, min/max will not work since an array of pixels always contains outliers, and the mean will not work because a column straddling the white border and the black border can average out to a normal-looking value (the black and white pixels cancel out). The median is therefore the most representative choice. The set of columns satisfying this range is the region I keep. Importantly, I also added an outlier-rejection step to prevent the algorithm from stopping too early: I look for numeric jumps between adjacent column indexes that satisfy the constraint above. For example, if the surviving column indexes are [10, 11, 12, 100, 101, ..., 200, 201, 1000, 1130], my algorithm rejects the outliers and keeps only columns 100 to 201. Along axis 1, I still crop by a fixed ratio, since the border along axis 1 is proportionally fixed, and it is genuinely hard for an automatic algorithm to distinguish "border" white from "sky" white. A sketch of MaskCrop appears after this list.
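The sketch below covers the axis-0 part of MaskCrop, assuming a single-channel float plate in [0, 1]; the threshold values and the jump size used for outlier rejection are illustrative assumptions.

    import numpy as np

    def mask_crop_columns(img, black_thresh=0.1, white_thresh=0.9, max_jump=30):
        medians = np.median(img, axis=0)   # one median per column
        keep = np.where((medians > black_thresh) & (medians < white_thresh))[0]
        # Outlier rejection: split the surviving indexes wherever two
        # neighbors jump apart, then keep the longest contiguous run.
        runs = np.split(keep, np.where(np.diff(keep) > max_jump)[0] + 1)
        best = max(runs, key=len)
        return img[:, best[0]:best[-1] + 1]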

The following images show how my Bells & Whistles changed the image quality. Only a selection of the outputs is shown here; the complete output is uploaded to bCourse:

As we can see from the examples above, MaskCrop does a really good job of cropping the edges, which results in better quality images with fewer black strips. However, there is still another problem: the color strips. We solve this problem in the next section.

3.2 Align Shift Padding

The colorful strips near the borders of the images are caused by aligning the plates. When we apply a displacement, we use np.roll, which wraps arrays around when they go out of bounds (e.g., the last row becomes the first row after np.roll(img, 1, axis=0)). This wrap-around behavior interferes with our cropping criteria and results in weird color strips. To solve this problem, I designed another algorithm: after all three plates are aligned, it detects the biggest shift that occurred and pads those areas, so that the cropper developed above can precisely crop away the regions that used to be the color strips. A sketch of the padding step and a comparison of the results follow:
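The sketch below is minimal and operates per channel after np.roll with displacement (dx, dy); painting the wrapped-around margins white (1.0), so the cropper treats them as border, is an illustrative choice.

    import numpy as np

    def pad_rolled_edges(img, dx, dy, fill=1.0):
        out = img.copy()
        if dy > 0:                    # rows that wrapped in from the bottom
            out[:dy, :] = fill
        elif dy < 0:                  # rows that wrapped in from the top
            out[dy:, :] = fill
        if dx > 0:                    # columns that wrapped in from the right
            out[:, :dx] = fill
        elif dx < 0:                  # columns that wrapped in from the left
            out[:, dx:] = fill
        return out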

As we can see, the result is very pleasing: we managed to get rid of most of the colored edges.

3.3 Automatic Contrasting

For automatic contrasting, I first set thresholds for black-like and white-like values: every value greater than WHITE_THRESH is set to 1, and every value smaller than BLACK_THRESH is set to 0. This approach is naive, and I call it autoFixContrast; a sketch is shown below.
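The sketch assumes a float image in [0, 1]; the threshold values are illustrative assumptions.

    import numpy as np

    def auto_fix_contrast(img, black_thresh=0.15, white_thresh=0.85):
        out = img.copy()
        out[out < black_thresh] = 0.0   # crush near-black values to black
        out[out > white_thresh] = 1.0   # push near-white values to white
        return out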

As an improvement, I implemented a smarter automatic contrasting algorithm. Instead of manually setting thresholds, it examines the histograms of all three channels: the tail pixels (the top and bottom XXX%) are the ones changed to 0s and 1s. The process works on the histogram distribution of all pixels in one channel, and the corresponding Percent Point Function (PPF) is used to find the tails. I call this autoPropContrast; a sketch is shown below.
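The sketch assumes a float RGB image in [0, 1]. The tail fraction is left as a parameter since the exact percentage is not fixed above, and np.percentile plays the role of the PPF (inverse CDF) of each channel's histogram.

    import numpy as np

    def auto_prop_contrast(img, tail=2.0):
        out = img.astype(float).copy()
        for c in range(out.shape[2]):                     # per channel
            lo = np.percentile(out[:, :, c], tail)        # bottom-tail cutoff
            hi = np.percentile(out[:, :, c], 100 - tail)  # top-tail cutoff
            if hi > lo:
                # Clip the tails to lo/hi, then stretch to [0, 1], sending
                # the tail pixels to exactly 0 and 1.
                out[:, :, c] = (np.clip(out[:, :, c], lo, hi) - lo) / (hi - lo)
        return out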

Below are comparisons of the automatic contrasting process. All images have already been processed by the algorithms from Sections 3.1 and 3.2:

As we can see, autoFixContrast does increase the contrast of the image, but its effect varies from picture to picture, because a manually set threshold may not suit all cases. On the other hand, autoPropContrast performs contrasting proportionally and thus produces much better results. Through the comparison, we can see that the images produced by autoPropContrast are very colorful and pretty.

3.4 Automatic White Balancing

The human visual system is sensitive to color and can perceive the colors of objects consistently by adapting to the lighting environment. However, machines such as cameras do not have this ability. As a result, the lighting environment has a significant impact on photos taken by cameras: the colors in an image may differ from the actual colors of the scene. Therefore, a white balancing algorithm is needed to eliminate the influence of lighting on color.

Here, I implemented the Gray World algorithm for automatic white balancing. The essence of the algorithm is that, for a colorful image, the average values of the R, G, and B layers tend to approach some common gray value. I took the mean of all pixels across all layers as the reference gray value; we can then compute the weight to apply to each layer based on that layer's individual mean. Finally, following von Kries' color adaptation model, each layer is scaled by its weight to perform the white balancing; a sketch is shown below.
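The sketch assumes a float RGB image in [0, 1]; each channel is scaled by a single gain (a diagonal von Kries adaptation) so that its mean matches the global mean used as the reference gray.

    import numpy as np

    def gray_world(img):
        gray = img.mean()                        # reference gray value
        means = img.reshape(-1, 3).mean(axis=0)  # per-channel means
        balanced = img * (gray / means)          # per-channel gains, broadcast
        return np.clip(balanced, 0.0, 1.0)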

The images below show the results before and after applying white balancing; the difference in color is obvious.

3.5 Align Through Edge Detection

For all pictures above, we directly used the R, G, and B pixel values for shifting and for computing the correlation scores. However, since the brightness of the three layers may differ greatly, directly comparing raw pixels seems unreasonable. Therefore, in this section I used the Sobel filter from the skimage module, which detects the edges in a picture, and then used the filtered images to obtain the alignment displacement; a sketch is shown below, followed by a comparison. You may need to brighten your screen to see the lines in the Sobel pictures.
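The sketch below reuses align_pyramid from Section 2.2: each channel is replaced by its Sobel edge magnitude before the displacement search, so brightness differences between the plates no longer drive the score.

    from skimage.filters import sobel

    def align_on_edges(channel, reference):
        # Align on edge maps; the returned displacement is then applied
        # to the original intensity channels.
        return align_pyramid(sobel(channel), sobel(reference))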

As we can see, using the Sobel filter does make a difference in the generated images. This technique is especially important and helpful when the edges in a picture are prominent and the lighting conditions of the three layers differ greatly.

4. Other Pictures from the Web

The following pictures were generated using all of the techniques mentioned above.