images of the russian empire

Tony Zhao, CS 194-26 Spring 2020

overview

So apparently this Russian dude named Sergei Mikhailovich Prokudin-Gorskii (1863-1944) had the bright idea that photos didn't have to stay black and white forever. So he went ahead and ran around Russia taking a bunch of photos through red, green, and blue filters, in the hope that someone might eventually combine them into a color image.

The Library of Congress bought the Prokudin-Gorskii collection and made it publicly available. Each scan is a single glass plate with the blue, green, and red exposures stacked top to bottom, so we now have a collection of photos in three color channels that we need to align. If we just cut each plate into thirds and naively stack the pieces on top of each other, we get a pretty bad image.
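Here's a minimal sketch of the naive version in Python/NumPy, assuming the scan has already been loaded as a 2D grayscale array. The function names `split_plate` and `naive_stack` are made up for illustration. It just chops the plate into thirds (blue, green, red from top to bottom) and stacks them with zero alignment:

```python
import numpy as np

def split_plate(plate):
    """Split a glass-plate scan into its three channels.

    Assumes the exposures are stacked top to bottom as blue, green, red.
    Any leftover rows (height not divisible by 3) are dropped.
    """
    h = plate.shape[0] // 3
    b = plate[:h]
    g = plate[h:2 * h]
    r = plate[2 * h:3 * h]
    return b, g, r

def naive_stack(plate):
    """Stack the three thirds with no alignment at all (the 'bad' baseline)."""
    b, g, r = split_plate(plate)
    return np.dstack([r, g, b])  # RGB order, ready for display
```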

I guess we're gonna have to be a little smarter and figure out an algorithm to automatically align these images...

the approach

ALRIGHTY. It seemed like the best way to attack the problem was to pick a distance metric that measures how similar two images are. Then we simply wiggle the red (or green) image around, trying every offset within a small window, until we find the position where it is closest to the blue image. I noticed that the borders of the images were often straight up impossible to align because they differed so much between channels. I figured they might add noise to the distance metric, so I cropped 10% off each side of the image when calculating alignment. It didn't look like it did anything LMAO, but I kept it anyway because I don't think it hurts.
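A rough sketch of the wiggle-and-crop search, assuming NumPy arrays and using a circular shift (`np.roll`) to do the wiggling. The names `crop_border` and `exhaustive_align` are hypothetical, the ±15 pixel window is an assumed default, and SSD stands in for whichever metric you plug in:

```python
import numpy as np

def crop_border(img, frac=0.1):
    """Trim `frac` of the height/width from every side of a 2D image."""
    h, w = img.shape
    dy, dx = int(h * frac), int(w * frac)
    return img[dy:h - dy, dx:w - dx]

def exhaustive_align(moving, fixed, radius=15, frac=0.1):
    """Return the (dx, dy) shift of `moving` that best matches `fixed`.

    Tries every shift in [-radius, radius]^2 with a circular shift,
    compares only the cropped interiors, and scores with SSD
    (lower is better).
    """
    fixed_c = crop_border(fixed, frac)
    best, best_score = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            score = np.sum((crop_border(shifted, frac) - fixed_c) ** 2)
            if score < best_score:
                best, best_score = (dx, dy), score
    return best
```

Using `np.roll` means pixels that fall off one edge wrap around to the other, which is one more reason cropping the borders before scoring seems sensible.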

Okay! Now that we've got a game plan, let's actually find some distance metrics. I consulted my friend Google and found a paper on template matching. Convenient! It gives us two metrics for comparing two images: sum of squared differences (SSD) and normalized cross-correlation (NCC).

$$SSD(p, q) = \sum_i \sum_j (p_{i,j} - q_{i,j})^2$$

$$NCC(p, q) = \frac{\sum_i \sum_j p_{i,j} \times q_{i,j}}{\left(\sum_i \sum_j p_{i,j}^2 \sum_i \sum_j q_{i,j}^2\right)^{0.5}}$$
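The two formulas translate almost directly into NumPy. This is a sketch assuming `p` and `q` are same-shaped float arrays; note that SSD wants to be as small as possible while NCC wants to be as large as possible:

```python
import numpy as np

def ssd(p, q):
    """Sum of squared differences: lower means more similar."""
    return np.sum((p - q) ** 2)

def ncc(p, q):
    """Normalized cross-correlation as written above: higher means more similar.

    There is no mean subtraction here, matching the formula; many
    template-matching references subtract each image's mean first.
    """
    return np.sum(p * q) / np.sqrt(np.sum(p ** 2) * np.sum(q ** 2))
```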

I ended up dropping the denominator of NCC because it barely changes as we wiggle the image: shifting the red or green image only moves its pixels around without changing their values, and the blue image stays put, so the denominator has essentially no say in which offset wins. I also found that there really wasn't much difference in visual quality between the two metrics. Let's try NCC on the cathedral image...
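Here's a quick sanity check of the dropped-denominator argument, assuming the wiggle is a circular shift (`np.roll`) over the whole image with no cropping. Under that assumption the shifted image's sum of squares never changes, so plain cross-correlation and full NCC should pick the same offset. `best_shift` and `cross_corr` are hypothetical names:

```python
import numpy as np

def best_shift(moving, fixed, radius, score):
    """Return the (dx, dy) in [-radius, radius]^2 maximizing score(shifted, fixed)."""
    shifts = [(dx, dy) for dy in range(-radius, radius + 1)
              for dx in range(-radius, radius + 1)]
    return max(shifts, key=lambda s: score(
        np.roll(moving, (s[1], s[0]), axis=(0, 1)), fixed))

def ncc(p, q):
    """Full NCC, denominator included."""
    return np.sum(p * q) / np.sqrt(np.sum(p ** 2) * np.sum(q ** 2))

def cross_corr(p, q):
    """NCC numerator only: the denominator is dropped."""
    return np.sum(p * q)
```

With the 10% border crop, a few pixels enter and leave the scored region at each offset, so the denominator is only approximately constant rather than exactly.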

It... uh... looks good...? Well, at least it's better than the naive alignment. It's kinda hard to tell since the resolution of the image isn't very high. Let's try a higher-resolution image. Maybe one of a train. This is where we hit our second obstacle: a really large image takes a really long time to process, because every candidate offset means comparing every pixel, and a bigger image needs a bigger window of offsets. I'm a pretty impatient dude, so in these situations I have to resort to either faster implementations or heroin. Thankfully I don't have to resort to the latter quite yet.
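A back-of-the-envelope count shows why. Exhaustive search over every offset in \([-w, w]^2\) on an \(N \times N\) channel compares roughly this many pixels:

$$cost(N, w) \approx (2w + 1)^2 N^2$$

Taking \(N = 3200\) as an assumed round number for the TIFF channel size (not a measured value), and \(w = 180\) since some offsets in the results reach 178 pixels:

$$(2 \times 180 + 1)^2 \times 3200^2 = 130321 \times 10240000 \approx 1.3 \times 10^{12}$$

That's over a trillion pixel comparisons for a single channel.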

the pyramid scheme

Smarter people than me already figured out a solution to my problem: a pyramid scheme! Thankfully, this one doesn't disenfranchise and entrap vulnerable people. Instead, this pyramid disenfranchises and entraps vulnerable images of the Russian Empire.

The method is relatively straightforward. First we calculate how many pyramid levels it takes to shrink the image down to a minimum resolution. I picked 32 by 32 pixels because I am sexually attracted to powers of two and the Mudgala Purana also describes the Hindu god Ganesha as taking on 32 forms.

$$pyramid\_levels(n\_pixels) = \log_2(\sqrt{n\_pixels}) - \log_2(\sqrt{32 \times 32})$$
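Here's the same formula in Python as a sketch. Rounding down to a whole number of levels and clamping at zero are my assumptions, since the formula itself can produce fractions:

```python
import math

def pyramid_levels(n_pixels, min_side=32):
    """Number of halvings needed to shrink a square-ish image to about min_side.

    Implements log2(sqrt(n_pixels)) - log2(min_side), rounded down and
    clamped at zero (both assumptions, not part of the formula above).
    """
    side = math.sqrt(n_pixels)
    return max(0, math.floor(math.log2(side) - math.log2(min_side)))
```

For a hypothetical 3200 × 3200 channel, \(\log_2 3200 - 5 \approx 6.64\), which rounds down to 6 levels, so the coarsest level is 3200 / 64 = 50 pixels on a side.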

Now that we have this number, we work our way up from the lowest resolution to the original resolution, with each level twice the size of the one before. For example, if we had three levels in our pyramid, the lowest-resolution level would be \((1/2)^3 = 1/8\) of the original resolution. We calculate the optimal x and y offset at that level, then multiply that offset by two when we advance to the next level (since each level is twice as large) and refine it with a small search around that estimate. Cool! Let's try this on an image of a train...
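A minimal coarse-to-fine sketch, assuming NumPy arrays, 2×2 block averaging for the downscaling, the NCC numerator as the score, a ±15 pixel search at the coarsest level, and a ±2 pixel refinement at every finer level. All of those choices (and the names `downsample`, `search`, `pyramid_align`) are hypothetical rather than what this project necessarily used:

```python
import numpy as np

def downsample(img):
    """Halve resolution by averaging 2x2 blocks (an odd last row/col is dropped)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2]
                   + img[0::2, 1::2] + img[1::2, 1::2])

def search(moving, fixed, center, radius):
    """Best (dx, dy) within `radius` of `center`, scored by the NCC numerator."""
    cx, cy = center
    best, best_score = center, -np.inf
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            score = np.sum(np.roll(moving, (dy, dx), axis=(0, 1)) * fixed)
            if score > best_score:
                best, best_score = (dx, dy), score
    return best

def pyramid_align(moving, fixed, levels, coarse_radius=15, fine_radius=2):
    """Coarse-to-fine: solve at 1/2**levels scale, then double and refine."""
    if levels == 0:
        # Coarsest level: full search around zero offset.
        return search(moving, fixed, (0, 0), coarse_radius)
    dx, dy = pyramid_align(downsample(moving), downsample(fixed), levels - 1,
                           coarse_radius, fine_radius)
    # Double the coarse estimate and only search a small window around it.
    return search(moving, fixed, (2 * dx, 2 * dy), fine_radius)
```

One gotcha with `np.roll`: the coarse search radius needs to stay under half the coarsest image's size, otherwise two different offsets wrap around to the same shift. The refinement radius is what keeps this fast, since every finer level only checks 5 × 5 = 25 candidates instead of the full window.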

Looks pretty good! It's just ever so slightly blurry, but eh, it'll pass for now. Okay, I'll do this on the rest of the images... oh no... emir.tif? What's this?

Oh lordy this ain't right, I did my boy Emir dirty! He straight up looks like his soul was fed up with my crappy code and decided to leave Emir's body.

Stay with me, Emir! Don't walk towards the light! I'll save you, even if it means putting in extra effort for a CS project!

being edgy

(bells and whistles)

I found out that if I tune the hyperparameters (minimum resolution, amount of wiggle, border crop percentage, etc.) just right, I get to be lazy and fix Emir without writing any more code. I could've just gone ahead and finished my project then and there. But that's what a dirty Stanford student would do, and Professor Daddy DeNero told me that I bleed Berkeley blue and gold.

The issue becomes clear when we look at Emir's original image: the three color channels differ significantly in relative brightness. That means metrics that compare raw pixel brightness, like SSD and NCC, aren't a great stand-in for alignment here. The most logical idea I had was to find the outline of Emir in each channel and then align those outlines instead. It's time to get edgy!

By searching online for things that can turn pictures into cartoons, I found the Sobel filter, which highlights edges by measuring how quickly brightness changes around each pixel. Before applying the filter, I did a little Gaussian smoothing so that we don't pick up too much random noise. I used a 5x5 Gaussian kernel because I like high-fiving people. Let's see what this does to our boy Emir. I'll throw in a train for free.
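Here's a NumPy-only sketch of blur-then-Sobel, assuming a 2D float array as input. The σ = 1.0 for the 5x5 Gaussian is an assumed value, and `convolve2d`, `gaussian_kernel`, and `sobel_edges` are hypothetical helpers; in practice `scipy.ndimage.gaussian_filter` and `skimage.filters.sobel` do the same jobs:

```python
import numpy as np

def convolve2d(img, kernel):
    """'Same'-size 2D convolution with edge padding, using only NumPy."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode="edge")
    flipped = kernel[::-1, ::-1]  # true convolution flips the kernel
    out = np.zeros(img.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += flipped[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def gaussian_kernel(size=5, sigma=1.0):
    """Normalized size x size Gaussian kernel (sigma=1.0 is an assumed value)."""
    ax = np.arange(size) - (size - 1) / 2
    g = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def sobel_edges(img, blur_size=5, sigma=1.0):
    """Blur with a Gaussian, then return the Sobel gradient magnitude."""
    smooth = convolve2d(img, gaussian_kernel(blur_size, sigma))
    gx = convolve2d(smooth, SOBEL_X)
    gy = convolve2d(smooth, SOBEL_Y)
    return np.hypot(gx, gy)
```

The idea is to run each channel through `sobel_edges` and then hand the edge maps, rather than the raw channels, to the alignment search.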

I know Emir looks slightly terrifying right now, but it looks like the filter managed to detect the edges. Now our algorithm should have an easier time lining up the channels, since the edges mostly show up in all three channels even when their brightness doesn't match. After adding the Sobel improvement, every image became slightly but noticeably better aligned.

I present to you, in all his THICC glory... EMIR OF BUKHARA!

Okay, after some more Googling I realized Emir is just a title, and his actual name is Said Mir Mohammed Alim Khan. Apparently our boy here was forcefully deposed by the Bolsheviks and also happened to be the last direct descendant of Genghis Khan to rule a country. Also his daughter worked for Voice of America.

My man Emir Said Mir Mohammed Alim Khan, a.k.a. Emir, a.k.a. Your Chonky Majesty, has been through a lot, and the worst of it might just be my CS project. But at least now he looks great!

results

After applying the Sobel improvement to the rest of our images, they look a bit better than with just the NCC algorithm. A future improvement could try using the Sobel filter to intelligently trim the ugly borders. There are also slight distortions in the images, which would require some kind of transform that warps the images to fit better. And there are plenty of other image alignment methods out there that I am far too lazy to implement.

There are also some things I think are just outright impossible to fix without cheating and redrawing, such as this instance of an unfortunate harvester whose face got mutilated.

The final results are below, with the red-to-blue offset, green-to-blue offset (both as (x, y) in pixels), and runtime for each image.

cathedral.jpg
R offset: (3, 11)
G offset: (2, 5)
Runtime: 0.163 seconds

emir.tif
R offset: (39, 107)
G offset: (24, 49)
Runtime: 29.825 seconds

harvesters.tif
R offset: (14, 124)
G offset: (18, 60)
Runtime: 30.137 seconds

icon.tif
R offset: (22, 90)
G offset: (17, 42)
Runtime: 30.286 seconds

lady.tif
R offset: (12, 117)
G offset: (8, 54)
Runtime: 28.982 seconds

melons.tif
R offset: (13, 178)
G offset: (10, 82)
Runtime: 30.408 seconds

monastery.jpg
R offset: (3, 2)
G offset: (2, -4)
Runtime: 0.160 seconds

onion_church.tif
R offset: (37, 108)
G offset: (26, 52)
Runtime: 30.206 seconds

self_portrait.tif
R offset: (36, 176)
G offset: (29, 79)
Runtime: 31.161 seconds

three_generations.tif
R offset: (11, 111)
G offset: (13, 52)
Runtime: 28.658 seconds

tobolsk.jpg
R offset: (4, 6)
G offset: (2, 2)
Runtime: 0.150 seconds

train.tif
R offset: (32, 86)
G offset: (6, 42)
Runtime: 29.149 seconds

village.tif
R offset: (22, 137)
G offset: (12, 64)
Runtime: 31.233 seconds

workshop.tif
R offset: (-12, 106)
G offset: (0, 53)
Runtime: 29.427 seconds

Some other photos from the Prokudin-Gorskii collection.

glass.tif
R offset: (18, 59)
G offset: (16, 20)
Runtime: 30.282 seconds

lumber.tif
R offset: (15, 88)
G offset: (9, 19)
Runtime: 31.686 seconds

siren.tif
R offset: (-25, 96)
G offset: (-6, 49)
Runtime: 31.309 seconds

ukraine.tif
R offset: (4, 120)
G offset: (3, 26)
Runtime: 29.404 seconds