CS194-26: Image Manipulation and Computational Photography
Spring 2020
Project 1: Images of the Russian Empire
Kamyar Salahi
In this project, we use image processing techniques to quickly generate color images from glass plate photographs in the Library of Congress’ Prokudin-Gorskii collection. This is achieved by
aligning the three color channel photographs and then combining them into a single RGB image.
Overview
Channel Alignment and Image Pyramids
The naive approach is to simply split the glass plate scan into three equal sections, one per color channel. The channels can then be aligned to one another through an exhaustive search over a
window of shifts from -15 to +15 pixels along the vertical and horizontal axes simultaneously. The best shift can be scored with either the sum of squared differences (SSD) or normalized
cross-correlation (NCC). The metric is computed for two of the channels against a single base channel, to which both are aligned. In this implementation, the red and green channels are aligned to the blue channel.
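The split and the exhaustive search described above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the function names, the (dy, dx) ordering of the shift, the blue, green, red top-to-bottom order of the plate, and the use of wrap-around shifting via `np.roll` are all assumptions.

```python
import numpy as np

def split_plate(plate):
    """Split a vertically stacked plate scan into three equal-height channels.

    Assumption: the top-to-bottom order on the plate is blue, green, red.
    """
    h = plate.shape[0] // 3  # any leftover rows at the bottom are dropped
    return plate[:h], plate[h:2 * h], plate[2 * h:3 * h]

def align_exhaustive(channel, base, score, window=15):
    """Return the (dy, dx) shift of `channel` that maximizes score(shifted, base).

    Every shift in [-window, window] on both axes is tried. np.roll wraps
    pixels around the border, which is a simplification.
    """
    best_shift, best_score = (0, 0), -np.inf
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            s = score(shifted, base)
            if s > best_score:
                best_shift, best_score = (dy, dx), s
    return best_shift

def neg_ssd(a, b):
    """Negated SSD, so that a higher score means a better match."""
    return -np.sum((a - b) ** 2)

# Synthetic check: displace a random image, then recover the shift that undoes it.
rng = np.random.default_rng(0)
base = rng.random((40, 40))
moved = np.roll(base, (3, -2), axis=(0, 1))
print(align_exhaustive(moved, base, neg_ssd, window=5))  # (-3, 2)
```

Passing the metric in as `score` keeps the search loop identical for SSD and NCC; only the sign convention (higher is better) has to be respected.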
For the SSD implementation, the shift that achieves the lowest SSD is the optimal alignment. This is the case because, in a perfectly aligned image, the variation in pixel values
across channels at a given point is minimized. This of course assumes that there is a strong correlation between the pixel values across channels. For the NCC implementation, the shift that
achieves the greatest NCC is the optimal alignment. Since normalized cross-correlation can be thought of as measuring the similarity between pixel values at a given point, the shift that yields the
greatest pixel similarity is the best aligned.
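To make the two metrics concrete, here is a small sketch of both. The function names are hypothetical, and the NCC shown is the zero-mean variant (each channel has its mean subtracted before normalizing), which is an assumption; some definitions skip the mean subtraction.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences: 0 for identical images, lower is better."""
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    return float(np.sum((a - b) ** 2))

def ncc(a, b):
    """Zero-mean normalized cross-correlation in [-1, 1], higher is better."""
    a = a.astype(np.float64).ravel()
    b = b.astype(np.float64).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    # A constant image has zero norm after mean subtraction; treat it as uncorrelated.
    return float(a @ b / denom) if denom > 0 else 0.0
```

One practical difference is that NCC is unaffected by a change in brightness or contrast of one channel (scaling and offsetting the pixel values), whereas SSD is not.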
This implementation utilized NCC on the inner 1/9th of pixels. However, for higher resolution images, NCC or SSD across a window of -15 to +15 (or likely far larger, since the displacements grow with image size)
would be prohibitively slow. Scoring a single candidate shift takes O(n*m) time for an n x m image, so the cost of each comparison grows quadratically with the linear dimensions of the photograph,
and the number of candidate shifts grows with the square of the window size. However, because most images have large features that remain clearly visible at lower resolutions, alignment can be
performed more intelligently: we first align at a coarse resolution and then refine, which drastically reduces runtime. This approach is called an image pyramid.
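As a side note on the "inner 1/9th of pixels" mentioned above, one plausible reading is the central third of the image along each axis, which keeps the plate borders out of the score. The helper below is a hypothetical sketch of that reading, not the project's code.

```python
import numpy as np

def center_crop(img, divisor=3):
    """Return the central 1/divisor of the image along each axis.

    With divisor=3 this keeps roughly 1/9th of the pixels.
    """
    h, w = img.shape[:2]
    ch, cw = h // divisor, w // divisor
    top, left = (h - ch) // 2, (w - cw) // 2
    return img[top:top + ch, left:left + cw]
```

The metric would then be evaluated on `center_crop(shifted)` and `center_crop(base)` rather than on the full channels.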
In this implementation, we employ an image pyramid by recursively scaling down the channels by a factor of 2. Once the channels reach a base case of 64 pixels, the standard -15 to +15 alignment
search is performed in order to align the channels according to features visible at coarser resolutions. At each step back up, the estimate from the coarser level is carried over to the finer
resolution and refined by testing NCC for shifts between -1 and +1 around it. In this way, we continually update the estimate and make small adjustments for features that only appear at higher
resolutions. Because each level above the base case tests only nine candidate shifts and the pyramid has O(log(min(n, m))) levels, the total work is dominated by the full-resolution comparisons,
i.e. O(n*m) with a small constant. This is significantly faster than the naive exhaustive search, whose cost is O(n*m) multiplied by the square of a much larger window.
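The recursion described above can be sketched as follows. Several details are assumptions made for illustration: the base case triggers when the smaller image dimension is at most 64, downscaling is done by averaging 2x2 blocks, the coarse estimate is doubled when moving to the next finer level, and shifts wrap around via `np.roll`. All function names are hypothetical.

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equally sized images."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def search(channel, base, center, radius):
    """Best (dy, dx) within `radius` of `center`, scored by NCC."""
    best, best_score = center, -np.inf
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            s = ncc(np.roll(channel, (dy, dx), axis=(0, 1)), base)
            if s > best_score:
                best, best_score = (dy, dx), s
    return best

def downscale2(img):
    """Halve the resolution by averaging 2x2 blocks (odd edges are trimmed)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def align_pyramid(channel, base, min_size=64, window=15):
    """Coarse-to-fine alignment of `channel` to `base`; returns (dy, dx)."""
    if min(base.shape) <= min_size:
        # Base case: full exhaustive search at the coarsest level.
        return search(channel, base, (0, 0), window)
    coarse = align_pyramid(downscale2(channel), downscale2(base), min_size, window)
    # A shift of k pixels at half resolution corresponds to 2k pixels here,
    # so double the estimate and refine it within +/-1 pixel.
    return search(channel, base, (2 * coarse[0], 2 * coarse[1]), 1)
```

With a window of 15 at the base case and two doublings above it, this sketch can recover displacements of up to roughly 60 pixels on a 256-pixel image while only ever testing nine shifts at the finer levels.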
Small JPGs
Monastery
Time: 0.18403124809265137 seconds
Green Shift: [-10, 2]
Red Shift: [-11, 2]
Cathedral
Time: 0.14661407470703125 seconds
Green Shift: [-1, 2]
Red Shift: [0, 3]
Tobolsk
Time: 0.16361117362976074 seconds
Green Shift: [-1, 3]
Red Shift: [-1, 3]
Large TIFs
Train
Time: 1.7170259952545166 seconds
Green Shift: [-17, 6]
Red Shift: [-31, 33]
Workshop
Time: 1.7589898109436035 seconds
Green Shift: [4, -1]
Red Shift: [7, -12]
Village
Time: 2.5576157569885254 seconds
Green Shift: [1, 12]
Red Shift: [11, 22]
Onion Church
Time: 2.0170321464538574 seconds
Green Shift: [-20, 26]
Red Shift: [-34, 37]
Three Generations
Time: 1.65513277053833 seconds
Green Shift: [-17, 14]
Red Shift: [-28, -11]
Self-Portrait
Time: 1.7474160194396973 seconds
Green Shift: [-12, 29]
Red Shift: [-5, 37]
Icon
Time: 1.7620818614959717 seconds
Green Shift: [-3, 18]
Red Shift: [2, 23]
Melons
Time: 1.820728063583374 seconds
Green Shift: [0, 9]
Red Shift: [9, 12]
Lady
Time: 2.204118013381958 seconds
Green Shift: [-17, 6]
Red Shift: [-28, 10]
Harvesters
Time: 1.770792007446289 seconds
Green Shift: [-1, 17]
Red Shift: [3, 14]
Emir
Time: 1.640261173248291 seconds
Green Shift: [-24, 24]
Red Shift: [-331, -36]
Sampled Images
Swords
Time: 2.0732460021972656 seconds
Green Shift: [-5, 17]
Red Shift: [-9, 13]
River
Time: 1.8596317768096924 seconds
Green Shift: [-20, 7]
Red Shift: [-27, 13]
Water
Time: 2.021552085876465 seconds
Green Shift: [-25, 4]
Red Shift: [-40, 5]
Statue
Time: 1.7589969635009766 seconds
Green Shift: [-39, 11]
Red Shift: [-4, 27]
Teacher
Time: 1.791341781616211 seconds
Green Shift: [-11, 39]
Red Shift: [-16, 62]
Lake
Time: 1.7403779029846191 seconds
Green Shift: [-15, -15]
Red Shift: [-19, -29]
Issues with RGB Alignment
As is evident with the Emir photo, RGB alignment has its flaws. Although it is rather fast, it fails when there is significant variation in the pixel values across the red, green, and blue channels. In the case of the
Emir, his cloak is a dark blue. For the color channels, this entails a bright cloak in the blue channel, a grey cloak in the green channel, and a black cloak in the red channel. Since our algorithm optimizes for
similar pixel values across the channels, this discrepancy results in a failure. The original tif is shown below to demonstrate this pixel value discrepancy.
The image above depicts how the variation in pixel values across color channels results in misalignment.
Solution: Alignment Through Edges
Our previous implementation of NCC on pixel values assumed that the pixel values are highly correlated across color channels. Since this assumption does not necessarily hold, as evidenced by the
example of the Emir, we will align on a different feature. We will leverage the fact that edges are generally consistent across the color channels of a photograph.
For edge detection, skimage’s sobel filter function was employed. This function convolves two kernels over the image matrix to approximate the horizontal and vertical derivatives across an image.
We can then find the gradient magnitude as $G = \sqrt{G_x^2 + G_y^2}$, where $G_x$ and $G_y$ are the horizontal and vertical derivative estimates.
By performing normalized cross-correlation over this map of edges, we can align the channels based on the similarity of their edges. In this way, we can find better shift values for the Emir.
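To illustrate why edges are a more robust feature, the sketch below computes a Sobel gradient magnitude directly in NumPy and compares NCC on raw pixels against NCC on edge maps for a channel and its intensity-inverted copy, which is an extreme version of the Emir's cloak. The project itself used skimage's sobel function; this hand-written version is a stand-in that we would expect to differ from it only in scaling and border handling, and its function names are hypothetical.

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude from the two 3x3 Sobel kernels (edge-padded borders)."""
    p = np.pad(img.astype(np.float64), 1, mode="edge")
    # Horizontal derivative: right column minus left column, weighted 1-2-1.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    # Vertical derivative: bottom row minus top row, weighted 1-2-1.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return np.hypot(gx, gy)

def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equally sized images."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

rng = np.random.default_rng(0)
blue = rng.random((32, 32))
red = 1.0 - blue  # same structure, opposite brightness

print(ncc(blue, red))                                     # close to -1
print(ncc(sobel_magnitude(blue), sobel_magnitude(red)))   # close to +1
```

Even though the two channels are perfectly aligned, raw-pixel NCC reports them as maximally dissimilar, while the edge maps agree. Aligning on `sobel_magnitude(channel)` instead of `channel` sidesteps that failure mode.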
Emir
Time: 4.2626519203186035 seconds
Green Shift: [-23, 24]
Red Shift: [-27, 40]
Solution: Automatic Contrasting
The colors in our previous images are a bit dull, so it may be beneficial to boost the contrast. This was performed with skimage’s built-in histogram equalization, which essentially spreads out
the most frequent intensity values, increasing the range of intensities used by the image.
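The idea behind histogram equalization can be sketched in a few lines of NumPy: each pixel is mapped through the cumulative distribution of intensities, so that heavily populated intensity ranges are stretched apart. This is an illustrative stand-in for the skimage routine the project used, assuming a single channel with values in [0, 1]; the function name and the choice of 256 bins are assumptions.

```python
import numpy as np

def equalize(channel, nbins=256):
    """Histogram-equalize a float image with values in [0, 1]."""
    hist, edges = np.histogram(channel.ravel(), bins=nbins, range=(0.0, 1.0))
    cdf = hist.cumsum() / hist.sum()          # fraction of pixels at or below each bin
    centers = (edges[:-1] + edges[1:]) / 2
    # Map every pixel to its (interpolated) CDF value.
    return np.interp(channel.ravel(), centers, cdf).reshape(channel.shape)

# A dull image confined to [0.4, 0.6] is stretched over nearly the full range.
rng = np.random.default_rng(0)
dull = 0.4 + 0.2 * rng.random((64, 64))
bright = equalize(dull)
print(dull.max() - dull.min(), bright.max() - bright.min())
```

Whether to equalize each color channel separately or a single luminance channel is a design choice the text does not specify; equalizing channels independently can shift the color balance.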
After
Before
Solution: Auto-Cropping
The borders of the image often contain artifacts caused by the shifting of the color channels, as well as black borders and various other defects along the edges. This was resolved
by once again using normalized cross-correlation. However, instead of shifting the image, we adjust the borders until the NCC is maximized.
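The text does not spell out how the borders are adjusted, so the following is only one possible reading, sketched under explicit assumptions: each side is cropped in turn (top, bottom, left, right), the candidate crop amounts are capped at a hypothetical `max_crop`, and the score is the mean pairwise NCC between the three channels of the remaining region.

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equally sized images."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def channel_agreement(img):
    """Mean pairwise NCC between the three channels of an H x W x 3 image."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return (ncc(r, g) + ncc(g, b) + ncc(r, b)) / 3

def auto_crop(img, max_crop=20):
    """Greedy border crop; returns (cropped image, (top, bottom, left, right))."""
    h, w = img.shape[:2]

    def region(t, b, l, r):
        return img[t:h - b, l:w - r]

    amounts = range(max_crop + 1)
    # Each side keeps the crop amount that maximizes channel agreement,
    # given the sides already chosen (ties go to the smallest crop).
    top = max(amounts, key=lambda k: channel_agreement(region(k, 0, 0, 0)))
    bottom = max(amounts, key=lambda k: channel_agreement(region(top, k, 0, 0)))
    left = max(amounts, key=lambda k: channel_agreement(region(top, bottom, k, 0)))
    right = max(amounts, key=lambda k: channel_agreement(region(top, bottom, left, k)))
    return region(top, bottom, left, right), (top, bottom, left, right)
```

The cap matters in practice: maximizing NCC alone can keep trimming into well-aligned content, so `max_crop` bounds how much of the photograph may be discarded.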
After
Before
Bells and Whistles