CS194-26: Image Manipulation and Computational Photography

Spring 2020

Project 1: Images of the Russian Empire

Kamyar Salahi

In this project, we use image processing techniques to rapidly generate color images from glass plate photographs in the Library of Congress’ Prokudin-Gorskii collection. This is achieved by aligning the three color channel photographs and then combining them into a single RGB image.

Overview

Channel Alignment and Image Pyramids

The naive approach is to split the glass plate into three sections, one per color channel, and align them with an exhaustive search over every shift from -15 to +15 pixels along both the vertical and horizontal axes. Each candidate shift is scored with either the sum of squared differences (SSD) or normalized cross-correlation (NCC). Two of the channels are scored against a single base channel, to which both are aligned. In this implementation, the red and green channels are aligned to the blue channel.

For SSD, the shift with the lowest score is taken as the best alignment, because in a perfectly aligned image the difference in pixel values across channels at a given point is minimized. This of course assumes that pixel values are strongly correlated across channels. For NCC, the shift with the highest score is taken as the best alignment: NCC effectively measures the similarity between pixel values at each point, so the shift with the greatest similarity is the best aligned.
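The exhaustive search described above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the function names, the use of `np.roll` (which wraps pixels around the border), and the zero-mean form of NCC are all assumptions.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences: lower means better aligned."""
    return float(np.sum((a - b) ** 2))

def ncc(a, b):
    """Normalized cross-correlation: higher means better aligned.
    Assumption: the zero-mean variant is used here."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0

def exhaustive_align(channel, base, radius=15, metric="ncc"):
    """Return the (row, col) shift of `channel` that best matches `base`.
    Hypothetical helper; tries every shift in [-radius, radius] on both axes."""
    best_shift, best_score = (0, 0), -np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # np.roll wraps around the border (a simplification).
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            # Negate SSD so that "larger is better" holds for both metrics.
            score = ncc(shifted, base) if metric == "ncc" else -ssd(shifted, base)
            if score > best_score:
                best_shift, best_score = (dy, dx), score
    return best_shift
```

Under these assumptions, the red and green shifts would be obtained by calling `exhaustive_align(red, blue)` and `exhaustive_align(green, blue)`, where `red`, `green`, and `blue` are 2D float arrays of equal shape.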


This implementation uses NCC on the inner 1/9th of pixels. However, for higher-resolution images, an exhaustive NCC or SSD search over a window of -15 to +15 (or likely far larger, since the required shift grows with image size) would be prohibitively slow. Scoring a single shift on an n×m image costs O(n*m), and the number of candidate shifts grows with the square of the window width, so the time required to align an image climbs steeply with its dimensions. However, most images have large features that remain clearly visible at lower resolutions, so alignment can be performed more intelligently: we first align at a coarse resolution and then refine, which drastically reduces runtime. This approach is called an image pyramid.


In this implementation, we build an image pyramid by recursively scaling the channels down by a factor of 2. Once the channels reach a base case of 64 pixels, the standard -15 to +15 search is performed to align the channels according to the features visible at that coarse resolution. At each step back up, the estimate from the level below is carried to the finer resolution (where one coarse pixel spans two fine pixels) and adjusted by NCC over shifts between -1 and +1 of that estimate. In this way, we continually update the estimate and make fine adjustments for features that only appear at higher resolutions. The pyramid has O(log(min(n, m))) levels and each level above the base tests only 9 candidate shifts, so the total work is dominated by the full-resolution level and is far less than that of a naive exhaustive search over a window scaled to the image.
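The recursion can be sketched as below. This is an illustration under stated assumptions, not the project's code: `downscale` averages 2x2 blocks as a simple stand-in for a proper resize, the base case triggers when the shorter side is at most `min_size`, shifts wrap around via `np.roll`, and all helper names are hypothetical.

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalized cross-correlation (assumed variant)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0

def inner(img):
    """Central third along each axis, i.e. the inner 1/9th of pixels."""
    h, w = img.shape
    return img[h // 3: 2 * h // 3, w // 3: 2 * w // 3]

def search(channel, base, center, radius):
    """Best (row, col) shift within `radius` of `center`, scored by NCC."""
    best_shift, best_score = center, -np.inf
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ncc(inner(shifted), inner(base))
            if score > best_score:
                best_shift, best_score = (dy, dx), score
    return best_shift

def downscale(img):
    """Halve each dimension by averaging 2x2 blocks (assumed resize method)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def pyramid_align(channel, base, min_size=64, radius=15):
    """Coarse-to-fine alignment of `channel` to `base`."""
    # Base case: the image is small enough for the full window search.
    if min(base.shape) <= min_size:
        return search(channel, base, (0, 0), radius)
    # Recurse on half-resolution copies to get a coarse estimate.
    coarse = pyramid_align(downscale(channel), downscale(base), min_size, radius)
    # One coarse pixel spans two fine pixels, so double the estimate,
    # then refine it within -1..+1 at the current resolution.
    center = (2 * coarse[0], 2 * coarse[1])
    return search(channel, base, center, 1)
```

Only the base case runs the full window search; every finer level evaluates just nine candidate shifts, which is where the speedup over the exhaustive search comes from.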

Small JPGs

Monastery

Time: 0.18403124809265137 seconds

Green Shift: [-10, 2]

Red Shift: [-11, 2]

Cathedral

Time: 0.14661407470703125 seconds

Green Shift: [-1, 2]

Red Shift: [0, 3]

Tobolsk

Time: 0.16361117362976074 seconds

Green Shift: [-1, 3]

Red Shift: [-1, 3]

Large TIFs

Train

Time: 1.7170259952545166 seconds

Green Shift: [-17, 6]

Red Shift: [-31, 33]

Workshop

Time: 1.7589898109436035 seconds

Green Shift: [4, -1]

Red Shift: [7, -12]

Village

Time: 2.5576157569885254 seconds

Green Shift: [1, 12]

Red Shift: [11, 22]

Onion Church

Time: 2.0170321464538574 seconds

Green Shift: [-20, 26]

Red Shift: [-34, 37]

Three Generations

Time: 1.65513277053833 seconds

Green Shift: [-17, 14]

Red Shift: [-28, -11]

Self-Portrait

Time: 1.7474160194396973 seconds

Green Shift: [-12, 29]

Red Shift: [-5, 37]

Icon

Time: 1.7620818614959717 seconds

Green Shift: [-3, 18]

Red Shift: [2, 23]

Melons

Time: 1.820728063583374 seconds

Green Shift: [0, 9]

Red Shift: [9, 12]

Lady

Time: 2.204118013381958 seconds

Green Shift: [-17, 6]

Red Shift: [-28, 10]

Harvesters

Time: 1.770792007446289 seconds

Green Shift: [-1, 17]

Red Shift: [3, 14]

Emir

Time: 1.640261173248291 seconds

Green Shift: [-24, 24]

Red Shift: [-331, -36]

Sampled Images

Swords

Time: 2.0732460021972656 seconds

Green Shift: [-5, 17]

Red Shift: [-9, 13]

River

Time: 1.8596317768096924 seconds

Green Shift: [-20, 7]

Red Shift: [-27, 13]

Water

Time: 2.021552085876465 seconds

Green Shift: [-25, 4]

Red Shift: [-40, 5]

Statue

Time: 1.7589969635009766 seconds

Green Shift: [-39, 11]

Red Shift: [-4, 27]

Teacher

Time: 1.791341781616211 seconds

Green Shift: [-11, 39]

Red Shift: [-16, 62]

Lake

Time: 1.7403779029846191 seconds

Green Shift: [-15, -15]

Red Shift: [-19, -29]

Issues with RGB Alignment

As is evident from the Emir photo, alignment on raw RGB pixel values has its flaws. Although it is rather fast, it fails when pixel values vary significantly across the red, green, and blue channels. The emir's cloak is a dark blue, so it appears bright in the blue channel, grey in the green channel, and black in the red channel. Since our algorithm optimizes for similar pixel values across the channels, this discrepancy causes the alignment to fail. The original tif is shown below to demonstrate this pixel value discrepancy.

The image above depicts how the variation in pixel values across color channels results in misalignment

Solution: Alignment Through Edges

Our previous implementation of NCC on raw pixel values assumed that pixel values are highly correlated across color channels. Since this assumption does not always hold, as the example of the emir shows, we will align on a different feature. We will leverage the fact that edges are generally consistent across the color channels of a photograph.

For edge detection, skimage’s sobel filter function was employed. This function convolves two kernels over the image matrix to approximate the horizontal and vertical derivatives across the image.

We can then find the gradient magnitude as G = sqrt(Gx^2 + Gy^2), where Gx and Gy are the horizontal and vertical derivative approximations.

By performing normalized cross-correlation over this map of edges, we can align the channels based on the similarity of their edges. In this way, we can find better shift values for the emir.
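A minimal sketch of edge-based alignment is given below. To keep it self-contained, `sobel_magnitude` is a NumPy stand-in for skimage's sobel function and may differ from it in scaling and border handling; the helper names, the wrap-around shifting, and the inner-region scoring are assumptions for illustration.

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude sqrt(Gx^2 + Gy^2) from 3x3 Sobel kernels.
    NumPy stand-in for a library Sobel filter; borders are edge-padded."""
    p = np.pad(img.astype(float), 1, mode="edge")
    # Horizontal derivative: right column minus left column, weighted 1-2-1.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2])
    # Vertical derivative: bottom row minus top row, weighted 1-2-1.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:])
    return np.hypot(gx, gy)

def ncc(a, b):
    """Zero-mean normalized cross-correlation (assumed variant)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0

def inner(img):
    """Central third along each axis, to ignore border effects."""
    h, w = img.shape
    return img[h // 3: 2 * h // 3, w // 3: 2 * w // 3]

def align(channel, base, radius=15):
    """Exhaustive NCC search; works on raw pixels or on edge maps."""
    best_shift, best_score = (0, 0), -np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ncc(inner(shifted), inner(base))
            if score > best_score:
                best_shift, best_score = (dy, dx), score
    return best_shift
```

Calling `align(sobel_magnitude(red), sobel_magnitude(blue))` scores shifts by edge similarity instead of brightness. Two channels whose intensities are inverted relative to each other, much like the emir's cloak, still produce matching edge maps, which is why this variant can succeed where raw-pixel NCC fails.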

Emir

Time: 4.2626519203186035 seconds

Green Shift: [-23, 24]

Red Shift: [-27, 40]

Solution: Automatic Contrasting

The colors in our previous images are a bit dull, so it may be beneficial to boost the contrast. This was done with skimage’s built-in histogram equalizer, which essentially spreads out the most frequent intensity values, increasing the range of intensities the image uses.
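The idea behind histogram equalization can be sketched in plain NumPy as follows. This is an illustrative stand-in for the library routine rather than its exact implementation: the function name `equalize`, the 256-bin histogram, and the assumption that intensities lie in [0, 1] are all choices made here.

```python
import numpy as np

def equalize(channel, nbins=256):
    """Map intensities in [0, 1] through their normalized cumulative histogram.
    Hypothetical helper operating on a single 2D channel."""
    hist, bin_edges = np.histogram(channel.ravel(), bins=nbins, range=(0.0, 1.0))
    # The cumulative distribution rises fastest where intensities are most
    # frequent, so those values get spread over a wider output range.
    cdf = hist.cumsum() / hist.sum()
    centers = (bin_edges[:-1] + bin_edges[1:]) / 2
    out = np.interp(channel.ravel(), centers, cdf)
    return out.reshape(channel.shape)
```

Whether the equalizer is applied to each color channel separately or to a single luminance channel is not stated in this write-up, so the sketch handles one 2D array at a time.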

After

Before

Solution: Auto-Cropping

In addition to needing contrast adjustment, the borders of the image often contain artifacts: colored fringes from shifting the color channels, black borders, and various other defects along the edges. This was resolved by once again using normalized cross-correlation. However, instead of shifting the image, we adjust the borders until the NCC is maximized.
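One possible reading of this border search is sketched below. The write-up does not specify how the borders are adjusted, so this sketch makes several assumptions: a single uniform margin is trimmed from all four sides, the score is the mean pairwise NCC of the three aligned channels, and the smallest margin that comes within a tolerance of the best score is chosen. The function name and parameters are hypothetical.

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalized cross-correlation (assumed variant)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0

def auto_crop_margin(r, g, b, max_margin=20, tol=1e-6):
    """Return the uniform margin (in pixels) to trim from every side.
    Assumes r, g, b are already aligned 2D arrays of equal shape."""
    # Never trim so much that fewer than 2x2 pixels remain.
    limit = min(max_margin, (min(r.shape) - 2) // 2)
    scores = []
    for m in range(limit + 1):
        window = (slice(m, r.shape[0] - m), slice(m, r.shape[1] - m))
        pairwise = (ncc(r[window], g[window])
                    + ncc(g[window], b[window])
                    + ncc(r[window], b[window]))
        scores.append(pairwise / 3)
    best = max(scores)
    # Prefer the smallest margin that is (nearly) as good as the best,
    # so that no more of the image is discarded than necessary.
    return next(m for m, s in enumerate(scores) if s >= best - tol)
```

The crop would then be applied as `image[m:-m, m:-m]` when the returned margin `m` is greater than zero. Searching each of the four borders independently is a natural extension of the same idea.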

After

Before

Bells and Whistles