CS 194-26: Image Manipulation and Computational Photography, Fall 2017

Project 1: Images of the Russian Empire

Chris Correa, CS194-26-aab



Overview

Back before color photography, Sergei Mikhailovich Prokudin-Gorskii took three photos of each scene, one each through a red, a green, and a blue filter.

In this project, we take the three channel images and combine them into one color image. This involves shifting the images so that they line up exactly; otherwise the combined photo would be blurry.

Part 1: Single Scale Implementation

The naive implementation of the algorithm shifts one of the color channel images in the x and y directions until the two images line up. The two images are considered aligned when some metric is minimized. I used the Sum of Squared Differences (SSD) as my metric, which is the square of the Frobenius norm of the difference between the two channels:

SSD(A,B)=||A-B||_{F}^{2}, \quad ||A||_{F}=\sqrt{\sum_{i=1}^{m}{\sum_{j=1}^{n}{|a_{ij}|^{2}}}}

Originally I did not crop the borders when calculating channel similarity. This was problematic for two reasons: first, the images have black borders around them, which skew the norm calculation; second, some of the images are slightly damaged around the edges. By cutting these regions out, I was able to line up the images more accurately. I chose to cut out 10px on all edges after shifting.
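Below is a minimal sketch of this single-scale search in Python with NumPy. The helper names, the use of np.roll for shifting, and the ±15 px search window are illustrative assumptions, not necessarily the exact implementation used here:

    import numpy as np

    def ssd(a, b):
        # Sum of squared differences: the square of the Frobenius norm of a - b.
        return np.sum((a - b) ** 2)

    def crop_border(img, pad=10):
        # Drop `pad` pixels on every edge so borders don't dominate the metric.
        return img[pad:-pad, pad:-pad]

    def align_single_scale(channel, reference, max_shift=15, pad=10):
        # Exhaustively try every (dx, dy) shift and keep the one with lowest SSD.
        # max_shift=15 is an illustrative window size.
        best_score, best_shift = np.inf, (0, 0)
        for dy in range(-max_shift, max_shift + 1):
            for dx in range(-max_shift, max_shift + 1):
                shifted = np.roll(channel, (dy, dx), axis=(0, 1))
                score = ssd(crop_border(shifted, pad), crop_border(reference, pad))
                if score < best_score:
                    best_score, best_shift = score, (dx, dy)
        return best_shift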


Image Notes:

Smaller JPG Images
Image        Green shift (x, y)    Red shift (x, y)
Cathedral    (2, 5)                (3, 12)
Monastery    (2, -3)               (2, 3)
Settlers     (0, 7)                (-1, 14)
Nativity     (1, 3)                (0, 8)

Part 2: Image Pyramid

The naive implementation works on small images, on the order of 300x300 pixels. However, for higher-resolution images you must search a much larger range of shifts, and each norm computation is more expensive, so the search becomes prohibitively slow.

To fix this, we build an image pyramid. We first downsample the image and search a pre-defined range of shifts at that coarse resolution. We then double the resolution and search a narrow range of shifts centered on the optimal shift from the previous level. This is similar in spirit to binary search, in that we halve the effective search range at every level.

With the increased size of the images, cutting 10px off every edge was no longer enough to fully remove the borders. Instead of a fixed amount, I cropped a proportional amount, 5% of the pixels on every edge, so that the border cropping works on both the small and the large images.
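A sketch of the pyramid recursion, reusing ssd, crop_border, and align_single_scale from the single-scale sketch above; the 300px base case, plain 2x subsampling, and the ±2 px refinement window are assumptions for illustration:

    def align_pyramid(channel, reference, min_size=300):
        # Base case: the image is small enough for the exhaustive search.
        if max(channel.shape) <= min_size:
            return align_single_scale(channel, reference)
        # Recurse on a half-resolution copy (simple 2x subsampling).
        cx, cy = align_pyramid(channel[::2, ::2], reference[::2, ::2], min_size)
        dx, dy = 2 * cx, 2 * cy
        # Refine the doubled coarse estimate within a small window,
        # cropping a proportional 5% border at this resolution.
        pad = int(0.05 * min(channel.shape))
        best_score, best_shift = np.inf, (dx, dy)
        for ddy in range(-2, 3):
            for ddx in range(-2, 3):
                shifted = np.roll(channel, (dy + ddy, dx + ddx), axis=(0, 1))
                score = ssd(crop_border(shifted, pad), crop_border(reference, pad))
                if score < best_score:
                    best_score, best_shift = score, (dx + ddx, dy + ddy)
        return best_shift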


Large TIF Images
Image                Green shift (x, y)    Red shift (x, y)
Emir                 (23, 49)              (40, 107)
Harvesters           (17, 60)              (13, 123)
Icon                 (17, 42)              (23, 90)
Lady                 (9, 56)               (13, 120)
Self Portrait        (29, 79)              (37, 175)
Three Generations    (12, 54)              (9, 111)
Train                (8, 43)               (29, 85)
Turkmen              (22, 57)              (28, 117)
Village              (11, 64)              (21, 137)

Other Prokudin-Gorskii Images

Image                    Green shift (x, y)    Red shift (x, y)
Shakh-i Zindeh Mosque    (-3, 55)              (-1, 124)
Stork                    (12, 43)              (22, 104)
Suzdal                   (9, 56)               (4, 119)
Dagestani                (4, 8)                (5, 89)

Bells and Whistles: Edge Detection

To increase the accuracy of the optimal shift, I pre-processed the channels with an edge detector before aligning them.

This improves the alignment because the three channels can record very different brightnesses for the same scene, while edges fall in the same places in all three; our eyes are likewise more sensitive to edges than to smooth gradients. The next figure shows an example image's edges:

Edge Detection: without edge detection (left), with edge detection (right)

Edge detection works by convolving the image with two Sobel filters, one horizontal and one vertical, and then combining the two filtered images. These are the two filters you convolve the image with:

G_{x}=\begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix} G_{y}=\begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix}

G=\sqrt{G_{x}^{2}+G_{y}^{2}}
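A sketch of this pre-processing step using SciPy's n-dimensional convolution; each channel would be replaced by edge_map(channel) before running the alignment search, and the helper names are hypothetical:

    import numpy as np
    from scipy.ndimage import convolve

    SOBEL_X = np.array([[1, 0, -1],
                        [2, 0, -2],
                        [1, 0, -1]], dtype=float)
    SOBEL_Y = SOBEL_X.T  # the vertical filter is the transpose of the horizontal one

    def edge_map(img):
        # Convolve with both Sobel filters, then combine the two responses
        # into the gradient magnitude G = sqrt(Gx^2 + Gy^2).
        gx = convolve(img.astype(float), SOBEL_X)
        gy = convolve(img.astype(float), SOBEL_Y)
        return np.sqrt(gx ** 2 + gy ** 2)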

In the next two images, you can see the quality increase when comparing the alignment with and without edge detection. The difference is especially visible around the beard and head, as well as in the scratches on the cabinet in the background.

Edge Detection: alignment without edge detection (left), with edge detection (right)

Bells and Whistles: Automatic White Balance (AWB)

To make the image look more natural, I added automatic white balancing. I implemented two methods of white balancing: Gray World and White World.

Gray World

In Gray World, we compute the average brightness of all of the pixels and scale the entire image so that the average brightness becomes 0.5. The specific scaling factor is:

\frac{0.5}{\text{avg brightness}}

You can see the difference in the following examples:

Automatic White Balance (Gray World): two examples, each without AWB (left) and with AWB (right)

However, some pixels can have a large value in an individual channel while still having a low average value, so the scaling can push individual channels past their maximum. Clipping each channel to the valid range prevents this overflow; without clipping, the overflowed pixels become purple.
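A minimal sketch of Gray World with clipping, assuming a floating-point image with values in [0, 1]; the function name is hypothetical:

    import numpy as np

    def gray_world(img):
        # Scale so the mean brightness becomes 0.5, then clip every channel
        # to [0, 1] so that bright pixels cannot overflow and discolor.
        scale = 0.5 / img.mean()
        return np.clip(img * scale, 0.0, 1.0)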

White World

White World AWB is similar to Gray World, except that it takes the brightest pixel and scales the entire image so that this pixel has a brightness of 1:

Automatic White Balance (White World): two examples, each without AWB (left) and with AWB (right)
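A matching sketch of White World under the same assumptions, taking a pixel's brightness to be the average of its three channels:

    import numpy as np

    def white_world(img):
        # Per-pixel brightness: the average of the R, G, B channels
        # (assumes an H x W x 3 float image in [0, 1]).
        brightness = img.mean(axis=2)
        # Scale so the brightest pixel reaches 1, clipping any channel overflow.
        return np.clip(img / brightness.max(), 0.0, 1.0)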

Comparing the two methods, Gray World works better on the village picture, while White World works much better on the settlers picture.