Colorizing the Prokudin-Gorskii photo collection

Project 1, CS 194-26, Spring 2020

by Suraj Rampure (suraj.rampure@berkeley.edu, cs194-26-adz)


The goal of this assignment is to colorize the Prokudin-Gorskii collection. Each image in the collection consists of three separate grayscale exposures – one each through a red, green, and blue filter – and the task is to align the three channels based on pixel intensity so that the final image renders properly in color.


The plates are stacked vertically in [B, G, R] order, and my approach was to align the green and red plates to the blue one.
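Roughly, the splitting step amounts to slicing the digitized plate into vertical thirds. The sketch below shows one way to do this; the helper name split_plate and the use of skimage.io for reading the file are illustrative choices, not necessarily what my final code looks like.

    import skimage.io as skio

    def split_plate(path):
        """Read a stacked Prokudin-Gorskii plate and return its B, G, R thirds."""
        im = skio.imread(path)          # one tall grayscale image, exposures stacked vertically
        height = im.shape[0] // 3       # each exposure occupies one third of the plate
        b = im[:height]                 # blue plate is on top
        g = im[height:2 * height]
        r = im[2 * height:3 * height]
        return b, g, r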


Small-scale images

A naive implementation only required searching over a window of possible displacements and finding the displacement that made the target channel (green or red) most similar to the blue channel. I arrived at a range of [-15, 15] pixels, which seemed to work well for all of the small images. np.roll was very useful for shifting matrices left/right and up/down.

In order to compute the similarity of two color channels, I implemented two different distance metrics. In both of these, I flattened each channel and treated it as a long numpy.ndarray.

The first metric, the sum of squared differences (SSD), is the most natural to consider: SSD(a, b) = Σᵢ (aᵢ − bᵢ)². The second metric, normalized cross-correlation (NCC), is the dot product of the two mean-centered, normalized vectors: NCC(a, b) = ((a − μa)/‖a − μa‖) · ((b − μb)/‖b − μb‖), where μa and μb are the means of the elements of a and b, respectively. Both metrics gave very similar results, so I ended up sticking with SSD in my final renderings (for no reason other than that it was the first one I implemented). Note: the SSD is minimized when two vectors are most similar, whereas the NCC is maximized, so in code I took the negative of the NCC to keep the implementation the same as for SSD.
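In numpy, the two metrics amount to something like the following sketch (the function names ssd and neg_ncc are just for illustration):

    import numpy as np

    def ssd(a, b):
        """Sum of squared differences; smaller means more similar."""
        a, b = a.ravel().astype(float), b.ravel().astype(float)
        return np.sum((a - b) ** 2)

    def neg_ncc(a, b):
        """Negative normalized cross-correlation, so that (like SSD) smaller is better."""
        a, b = a.ravel().astype(float), b.ravel().astype(float)
        a_c, b_c = a - a.mean(), b - b.mean()
        return -np.dot(a_c, b_c) / (np.linalg.norm(a_c) * np.linalg.norm(b_c))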

I also cropped away 20% of the border of each image before trying to align the channels, since the largely black borders interfered with the similarity metrics. Interestingly, Cathedral gave me trouble for quite a while; the other two images aligned just fine, but I had to increase the cropping amount from 15% to 20% for Cathedral to align correctly. (Conveniently, most of the other images converged to the same result with both 15% and 20%.)
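Putting these pieces together, the naive alignment looks roughly like the sketch below. The crop fraction and window size match the values described above; the ssd helper (or neg_ncc) from the previous sketch can be passed in as metric, and SSD is the default here.

    import numpy as np

    def align_naive(channel, reference, window=15, crop_frac=0.20, metric=None):
        """Try every (dy, dx) in [-window, window]^2 and return the shift of
        `channel` that best matches `reference` under `metric`."""
        if metric is None:
            metric = lambda a, b: np.sum((a - b) ** 2)   # SSD by default
        h, w = reference.shape
        ch, cw = int(crop_frac * h), int(crop_frac * w)  # ignore the plate borders when scoring
        best_score, best_shift = np.inf, (0, 0)
        for dy in range(-window, window + 1):
            for dx in range(-window, window + 1):
                shifted = np.roll(channel, (dy, dx), axis=(0, 1))
                score = metric(shifted[ch:h - ch, cw:w - cw],
                               reference[ch:h - ch, cw:w - cw])
                if score < best_score:
                    best_score, best_shift = score, (dy, dx)
        return best_shift

The green and red channels are then shifted by their best offsets (again with np.roll) and stacked with the blue channel to produce the final color image.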

Offsets are reported below as Color[y, x], since array indexing (rows, columns) is the reverse of the standard (x, y) convention. Runtimes are also provided; all processing was done in a Jupyter notebook on a 2016 MacBook Pro with a 2.9 GHz Intel Core i7.


Image       Offset               Runtime
Cathedral   G[5, 2], R[12, 3]    0.61 s
Monastery   G[-3, 2], R[3, 2]    0.56 s
Tobolsk     G[3, 2], R[6, 3]     0.46 s



Large-scale images

The above brute-force implementation worked for the smaller images, but for larger ones (roughly 4000 × 4000 pixels) it quickly became too resource-intensive. To speed things up, I implemented an image pyramid: I scaled the image down to a coarse resolution, estimated the alignment there, and then repeatedly refined that estimate at progressively higher resolutions, narrowing the search range at each step to keep computation time reasonable.

After experimenting with different fixed values (4, 8, 16), I set the initial scale to log(max(width, height)) of the image, where a larger scale value corresponds to a lower-resolution image. After obtaining an estimate at scale s, I multiplied best_x and best_y by 2 and updated the scale to s / 2. The base case for this recursion was s < 1, at which point I returned the current values of best_x and best_y.

Instead of fixing the search range at [-15, 15], I made it dynamic, so that my implementation didn't need to evaluate the full 31 × 31 grid of offsets when it was already close to converging. At scale s, the window of guesses was ±2s around the previous scale's guess, meaning that as we moved further down the pyramid (decreasing scale), we searched fewer values.
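Below is a sketch of this coarse-to-fine recursion. It simplifies my scale bookkeeping: it uses skimage.transform.rescale for downsampling, falls back to the brute-force align_naive from the earlier sketch once the image is small, and refines with a fixed ±2 window rather than the exact ±2s schedule described above.

    import numpy as np
    import skimage.transform as sktr

    def align_pyramid(channel, reference, metric=None):
        """Coarse-to-fine alignment: recurse on half-resolution copies, double the
        coarse estimate, then refine it over a small window at this resolution."""
        if metric is None:
            metric = lambda a, b: np.sum((a - b) ** 2)   # SSD by default
        # Base case: once the image is small enough, brute force is cheap.
        if max(channel.shape) <= 400:
            return align_naive(channel, reference, window=15, metric=metric)
        # Recurse on half-resolution copies and double the resulting estimate.
        half_c = sktr.rescale(channel, 0.5, anti_aliasing=True)
        half_r = sktr.rescale(reference, 0.5, anti_aliasing=True)
        dy, dx = align_pyramid(half_c, half_r, metric)
        dy, dx = 2 * dy, 2 * dx
        # Refine around the doubled estimate, cropping borders before scoring.
        h, w = reference.shape
        ch, cw = int(0.20 * h), int(0.20 * w)
        best_score, best = np.inf, (dy, dx)
        for ddy in range(-2, 3):
            for ddx in range(-2, 3):
                shifted = np.roll(channel, (dy + ddy, dx + ddx), axis=(0, 1))
                score = metric(shifted[ch:h - ch, cw:w - cw],
                               reference[ch:h - ch, cw:w - cw])
                if score < best_score:
                    best_score, best = score, (dy + ddy, dx + ddx)
        return best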

Again, both SSD and NCC yielded nearly identical results. The final reported offsets are generated using SSD.

Note: I kept the cropping percentage at 20%. This worked well for all images except for Emir* and Village*, whose percentages had to be changed to 15% and 30%, respectively.

Image               Offset                   Runtime
Emir*               G[48, 24], R[108, 57]    17.31 s
Harvesters          G[60, 16], R[124, 13]    12.05 s
Icon                G[41, 17], R[90, 22]     12.58 s
Lady                G[52, 8], R[119, 11]     12.57 s
Melons              G[83, 10], R[170, 11]    13.91 s
Onion Church        G[51, 26], R[108, 36]    13.82 s
Self Portrait       G[79, 29], R[170, 35]    13.63 s
Three Generations   G[54, 14], R[112, 11]    12.45 s
Train               G[43, 6], R[88, 32]      12.63 s
Village*            G[65, 12], R[137, 22]    7.66 s
Workshop            G[53, -1], R[105, -12]   12.52 s



Other images

I also chose three other images from the collection (available on the Library of Congress website) to align. All three were large .tif files, so they were processed using the image-pyramid procedure outlined above. They turned out quite well.

Image           Offset                  Runtime
Dog             G[62, 9], R[139, 8]     13.95 s
Entrance        G[12, 17], R[35, 18]    13.41 s
Peasant Girls   G[-15, 10], R[12, 17]   13.44 s