Back before color photography, Sergei Mikhailovich Prokudin-Gorskii took three photographs of each scene, one each through a red, a green, and a blue filter, like so:
In this project, we take those three images and combine them into one color image. This involves shifting the images so that they line up exactly; otherwise the photo would be blurry.
The naive implementation of the algorithm involved repeatedly shifting one of the color channel images in the x and y directions until the two images lined up. The two images were considered aligned when some metric was minimized. I used the Sum of Squared Distances (SSD) as my metric, which is the same as the square of the Frobenius norm of the difference between the two images:
Originally I had not cropped the borders when calculating the channel similarity. This was problematic for two reasons: 1. the images had black borders around them, which skewed the norm calculation, and 2. some of the images were slightly damaged around the edges. By cropping these regions out, I was able to line up the images more accurately. I chose to cut out 10px on each edge after shifting.
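The naive search and the border crop can be sketched as below. This is a minimal version, assuming the channels are NumPy float arrays of the same shape; the function names are my own, not the project's actual code:

```python
import numpy as np

def ssd(a, b):
    # Sum of Squared Distances == squared Frobenius norm of (a - b)
    return np.sum((a - b) ** 2)

def align_naive(channel, reference, max_shift=15, border=10):
    """Exhaustively shift `channel` in x and y, returning the (dy, dx)
    that minimizes SSD against `reference` over the cropped interior."""
    best, best_score = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            # crop 10px on every edge so borders and damage don't skew the metric
            score = ssd(shifted[border:-border, border:-border],
                        reference[border:-border, border:-border])
            if score < best_score:
                best_score, best = score, (dy, dx)
    return best
```

Note that `np.roll` wraps pixels around the image edges; the border crop also hides most of that wrap-around from the metric.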
Image Notes: [images omitted]
The naive implementation works on small images on the order of 300×300 pixels. However, for higher-quality images you must search a much larger range of shifts, and each norm computation is more expensive, so the naive approach becomes prohibitively slow.
To fix this, we create an image pyramid. We first downscale the image and search a pre-defined range of shifts. Then we double the resolution and search a narrow range centered around the optimal shift from the previous level. This is similar to a binary search, in that we cut the space of possible shifts roughly in half at every level.
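This coarse-to-fine search can be sketched as follows. It is a minimal version under my own assumptions: `exhaustive_search` is a hypothetical helper (a centered variant of the naive search), downscaling is simple 2×2 block averaging, and the crop takes at least 10px (roughly 5% on larger images) from each edge:

```python
import numpy as np

def downscale(img):
    # Halve the resolution by averaging 2x2 blocks (trim odd rows/cols first)
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2]
            + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def exhaustive_search(channel, reference, radius, center):
    """Try every shift within `radius` of `center`, minimizing cropped SSD."""
    cy, cx = center
    best, best_score = center, np.inf
    b = max(10, channel.shape[0] // 20)  # crop at least 10px (~5%) per edge
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = np.sum((shifted[b:-b, b:-b] - reference[b:-b, b:-b]) ** 2)
            if score < best_score:
                best_score, best = score, (dy, dx)
    return best

def align_pyramid(channel, reference, levels=4, radius=15):
    """Coarse-to-fine: find the shift at low resolution, then refine it
    over a narrow window at each doubling of resolution."""
    if levels == 0 or min(channel.shape) < 64:
        return exhaustive_search(channel, reference, radius, center=(0, 0))
    # Recurse on half-resolution copies, then refine the doubled estimate
    dy, dx = align_pyramid(downscale(channel), downscale(reference),
                           levels - 1, radius)
    return exhaustive_search(channel, reference, radius=2,
                             center=(2 * dy, 2 * dx))
```

At each level only a 5×5 window is searched, so the total work is dominated by a handful of SSD evaluations per resolution rather than a full-range search at full resolution.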
With the increased size of the images, cutting out 10px on every edge no longer worked: I had to cut out more to fully crop the borders on the larger images. I decided to cut out a variable amount, so that the border cropping would work on both the small and the large images: 5% of the pixels on every edge.
To increase the accuracy of the optimal shift, I pre-processed the images to find their edges:
This improves alignment because edges are a more reliable signal than raw brightness: our eyes are more sensitive to edges than to smooth gradients, and the three channels can record very different brightnesses for the same object while their edges stay in the same places. In the next two images you can see an example image's edges:
[Images: example edge-detection output]
This works by convolving the image with two Sobel filters, one for the horizontal direction and one for the vertical, and then combining the resulting gradient images. These are the two filters the image is convolved with:
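The two kernels below are the standard Sobel filters; the convolution and combination step can be sketched like this, using only NumPy (the helper `convolve3x3` is my own, written out explicitly rather than using a library routine):

```python
import numpy as np

# The two standard Sobel kernels: horizontal (x) and vertical (y) gradients
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def convolve3x3(img, kernel):
    """'Valid' convolution of a 2-D image with a 3x3 kernel."""
    k = np.flipud(np.fliplr(kernel))  # convolution flips the kernel
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            out += k[i, j] * img[i:i + h - 2, j:j + w - 2]
    return out

def edges(img):
    """Combine the horizontal and vertical responses into a gradient magnitude."""
    gx = convolve3x3(img, SOBEL_X)
    gy = convolve3x3(img, SOBEL_Y)
    return np.hypot(gx, gy)
```

Running the alignment on `edges(channel)` instead of the raw channel makes the metric compare edge positions rather than absolute brightness.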
In the next two images, you can see the quality increase when comparing the alignment with and without edge detection. The difference is especially visible around the beard and head, as well as in the scratches on the cabinet in the background.
[Images: alignment without and with edge detection]
To make the images look more natural, I added automatic white balancing (AWB). There are two common methods: Gray World and White World.
In Gray World, we find the average brightness of all of the pixels and scale the entire image so that the average brightness becomes 0.5. The specific scaling factor is 0.5 divided by the measured average brightness.
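A minimal sketch of Gray World, assuming an (H, W, 3) float image with values in [0, 1] (the clipping step is explained further below):

```python
import numpy as np

def gray_world(img):
    """Scale an (H, W, 3) float image so its mean brightness becomes 0.5,
    then clip to [0, 1] so no individual channel overflows."""
    scale = 0.5 / img.mean()
    return np.clip(img * scale, 0.0, 1.0)
```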
You can see the difference in the next two images:
[Images: before and after Gray World white balancing]
However, it is possible for a pixel to have a large value in an individual channel but a lower average brightness; after scaling, that channel's value can exceed the maximum. Clipping the scaled values to the valid range prevents this overflow in the individual channels. You can see what happens when you don't clip the pixels here: the overflowed pixels become purple.
White World AWB is similar to Gray World, except it takes the brightest pixel and scales the entire image so that this pixel's brightness becomes 1:
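A matching sketch of White World, under the same assumptions as above; I take a pixel's brightness to be the average of its three channels, as in the Gray World description:

```python
import numpy as np

def white_world(img):
    """Scale an (H, W, 3) float image so the brightest pixel's
    brightness (its channel average) becomes 1, then clip to [0, 1]."""
    brightness = img.mean(axis=2)
    scale = 1.0 / brightness.max()
    return np.clip(img * scale, 0.0, 1.0)
```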
[Images: before and after White World white balancing]
Comparing Gray World and White World, Gray World works better on the village picture, while White World works much better on the settlers picture.