By Adam Chang
This project takes photos from the Prokudin-Gorskii collection, each stored as three separate images (one per R, G, and B channel), and combines them into a single aligned RGB image.
The process I implemented to combine the three different channels into one colored RGB image is as follows:
One issue with the original images from the collection is a thick black-and-white border at the edges of each image. This is a problem for two reasons. First, it looks bad and leaves thick bands at the edges of the output RGB image. Second, it leads to worse results during the alignment phase, especially when aligning on raw pixel intensities: the border is large and differs between the three channels, so it can pull the alignment away from the actual image content. To mitigate this, I wrote a simple auto-cropper that crops the image inward from each edge until the border has been reduced or completely removed. The auto-cropper steps inward from each edge of the image and stops only when the current row or column contains fewer than a predefined threshold of pure black or pure white pixels.
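The cropping rule described above can be sketched roughly as follows. This is a minimal illustration, not necessarily the implementation used for these results: the function name `auto_crop`, the tolerances `black` and `white` (used in place of exact pure-black and pure-white tests), the threshold `max_border_frac`, and the safety cap `max_crop_frac` are all assumptions made for the example.

```python
import numpy as np

def auto_crop(channel, max_border_frac=0.3, black=0.1, white=0.9, max_crop_frac=0.2):
    """Trim rows and columns from each edge while they still look like border.

    channel: 2-D float array with intensities in [0, 1].
    A row or column is treated as border when at least `max_border_frac`
    of its pixels are near-black (<= black) or near-white (>= white).
    All parameter values here are illustrative assumptions.
    """
    extreme = (channel <= black) | (channel >= white)
    row_frac = extreme.mean(axis=1)  # fraction of extreme pixels in each row
    col_frac = extreme.mean(axis=0)  # fraction of extreme pixels in each column

    def first_clean(fracs):
        # Step inward from index 0 until a line falls below the threshold,
        # giving up after `max_crop_frac` of the dimension (assumed safety cap).
        limit = int(len(fracs) * max_crop_frac)
        for i in range(limit):
            if fracs[i] < max_border_frac:
                return i
        return limit

    top = first_clean(row_frac)
    bottom = channel.shape[0] - first_clean(row_frac[::-1])
    left = first_clean(col_frac)
    right = channel.shape[1] - first_clean(col_frac[::-1])
    return channel[top:bottom, left:right]
```

Running the same function on each of the three channel images before alignment would remove most of the border from each one independently, at the cost of the channels ending up with slightly different sizes that then need to be reconciled.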
Below are results of cropping on cathedral.jpg:
Before I started preprocessing the channel images with a Sobel filter to extract edges, I aligned the images using the sum of squared differences (SSD) on the raw intensity values of each channel. This led to failures when large parts of the image were dominated by a single color. For example, in the monastery image, the dirt section at the bottom is much more red than blue. The difference in pixel intensities in that section dominated the SSD score, which resulted in an alignment that pushed the red channel away from the blue channel. Aligning on edge images instead avoids this, because edges appear in the same places in every channel regardless of the absolute intensity.
Image alignment using raw pixel intensities | Image alignment using Sobel filter | Edge-extracted image |
---|---|---|
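
As a rough sketch of the edge-based alignment described above, the code below computes a Sobel gradient magnitude for each channel and then searches a window of translations for the one with the lowest SSD. The function names, the exhaustive search, the `max_shift` window, and the use of `np.roll` with a cropped margin are assumptions for illustration; the write-up does not specify these details.

```python
import numpy as np

def sobel_edges(channel):
    """Gradient magnitude from 3x3 Sobel kernels, using edge-replicated padding."""
    p = np.pad(channel.astype(float), 1, mode="edge")
    # Horizontal gradient: right column minus left column, weighted 1-2-1.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2])
    # Vertical gradient: bottom row minus top row, weighted 1-2-1.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:])
    return np.hypot(gx, gy)

def align_ssd(moving, reference, max_shift=15):
    """Return the (down, right) shift of `moving` that minimizes SSD against `reference`.

    Both inputs are 2-D arrays of the same shape. The search window
    `max_shift` is an assumed parameter, not a value from the write-up.
    """
    height, width = reference.shape
    m = max_shift
    best_shift, best_score = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            # Score only the interior so wrapped-around pixels are ignored.
            diff = shifted[m:height - m, m:width - m] - reference[m:height - m, m:width - m]
            score = np.sum(diff ** 2)
            if score < best_score:
                best_shift, best_score = (dy, dx), score
    return best_shift
```

With this sketch, aligning green to blue would look like `align_ssd(sobel_edges(g), sobel_edges(b))`. Because the score is computed on edge strength rather than raw intensity, a uniformly red region contributes little to the difference.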
As a further improvement after aligning the images, I perform white balancing using the gray world assumption. This entails taking the average intensity over all pixels in all channels, and then scaling each channel individually so that its average pixel intensity equals this overall average. It is called the gray world assumption because it rests on the hypothesis that, on average, the world is gray (the activations of the short, medium, and long cones are all equal when averaged over the entire image). The results of the white balancing are displayed to the right of each image below.
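A minimal sketch of this normalization, assuming the aligned image is a float array of shape `(H, W, 3)` with values in `[0, 1]` and that no channel is entirely zero. The function name `gray_world` and the final clipping step are assumptions for the example.

```python
import numpy as np

def gray_world(rgb):
    """Scale each channel so its mean matches the mean over all channels.

    rgb: float array of shape (H, W, 3) with values in [0, 1].
    Assumes every channel has a nonzero mean.
    """
    channel_means = rgb.mean(axis=(0, 1))  # one mean per channel
    overall_mean = rgb.mean()              # the "gray" target intensity
    balanced = rgb * (overall_mean / channel_means)
    # Clipping keeps values displayable; it can shift the means slightly
    # when bright pixels saturate.
    return np.clip(balanced, 0.0, 1.0)
```

For an image whose channel means are 0.2, 0.4, and 0.6, the overall mean is 0.4, so the channels are scaled by 2, 1, and 2/3 respectively.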
Image | Aligned Image | Aligned Image White Balanced | Translation from G to B | Translation from R to B |
---|---|---|---|---|
Cathedral | | | 5px down, 2px right | 12px down, 3px right |
Church | | | 25px down, 3px right | 58px down, 4px left |
Emir | | | 49px down, 23px right | 107px down, 40px right |
Harvesters | | | 60px down, 18px right | 123px down, 11px right |
Icon | | | 42px down, 16px right | 90px down, 22px right |
Lady | | | 57px down, 9px right | 120px down, 13px right |
Melons | | | 80px down, 10px right | 177px down, 13px right |
Monastery | | | 3px up, 2px right | 3px down, 2px left |
Mosque | | | 3px down, 0px right | 8px down, 1px right |
Onion Church | | | 52px down, 25px right | 108px down, 35px right |
Self Portrait | | | 78px down, 29px right | 176px down, 37px right |
Three Generations | | | 56px down, 12px right | 112px down, 8px right |
Tobolsk | | | 3px down, 2px right | 7px down, 3px right |
Train | | | 41px down, 0px right | 85px down, 29px right |
Trees | | | 3px up, 1px right | 4px up, 1px right |
Workshop | | | 52px down, 1px left | 102px down, 12px left |
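
To illustrate how translations like those in the table could be applied, here is a small sketch that shifts the G and R channels by a `(down, right)` offset and stacks them with B as the reference. The function name `compose_rgb` and the use of `np.roll` (which wraps pixels around the image edges rather than padding or cropping) are assumptions; the write-up does not state how the shifts were applied.

```python
import numpy as np

def compose_rgb(r, g, b, g_shift, r_shift):
    """Shift G and R by (down, right) pixel offsets and stack them with B.

    r, g, b: 2-D arrays of the same shape. Negative offsets mean up or left.
    Pixels shifted past one edge wrap around to the opposite edge.
    """
    g_aligned = np.roll(g, g_shift, axis=(0, 1))
    r_aligned = np.roll(r, r_shift, axis=(0, 1))
    return np.dstack([r_aligned, g_aligned, b])
```

Under this convention, the Cathedral row would correspond to `g_shift=(5, 2)` and `r_shift=(12, 3)`, and the Monastery row to `g_shift=(-3, 2)` and `r_shift=(3, -2)`.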