In this project we used the digitized Prokudin-Gorskii glass plate images and implemented an algorithm that leverages several image processing techniques to render a single color image.

After loading each plate we split it vertically into three equal sections to obtain the pictures taken with the blue, green, and red filters. We then pick the blue-filtered image as a baseline and align the green- and red-filtered images to it.
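Assuming the scanned plate is loaded as a NumPy array with the three exposures stacked vertically (blue on top, then green, then red), the split can be sketched as follows (the function name is my own):

```python
import numpy as np

def split_plate(plate):
    """Split a stacked glass-plate scan into its B, G, R exposures.

    The digitized plates stack the three exposures vertically:
    blue on top, then green, then red.
    """
    height = plate.shape[0] // 3          # each exposure gets a third of the rows
    b = plate[:height]
    g = plate[height:2 * height]
    r = plate[2 * height:3 * height]
    return b, g, r
```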

To align the images we carry out a simple exhaustive search over a range of possible displacements in both the x and y directions, e.g. [-15, 15] pixels. We choose the best displacement by computing a distance score for each candidate displacement and picking the candidate that minimizes the score.
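A minimal sketch of this search, using a circular shift for the displacement and taking the scoring function as a parameter (the function and parameter names are my own):

```python
import numpy as np

def exhaustive_align(ref, img, score_fn, window=15):
    """Try every (dy, dx) displacement in [-window, window]^2 and
    return the one that minimizes score_fn(ref, shifted img)."""
    best, best_score = (0, 0), np.inf
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            shifted = np.roll(img, (dy, dx), axis=(0, 1))
            score = score_fn(ref, shifted)
            if score < best_score:
                best_score, best = score, (dy, dx)
    return best
```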

I implemented two different scoring functions, which seemed to work equally well. The first is an L2 score: it computes the difference between the raw pixel values and squares it, so the value is always non-negative. The second is a negated NCC (normalized cross-correlation), so that two similar images receive a more negative score than two dissimilar ones. NCC works well because we are essentially flattening the images, normalizing them, and then computing the Euclidean inner product between them. Since they are unit-norm vectors, this value is maximized, and thus the score function is minimized, when the two images are "parallel" to each other, which can happen only when the entries are the same.

One important thing to note is that I used a circular shift to displace the images; as a consequence I had to ignore the wrapped-around sections when computing the score for a candidate, since those wrapped edges would otherwise corrupt the score. As a result both scoring functions only consider a subset of interior pixels, which resulted in better alignments.
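The two scoring functions with the interior crop might look like this (the 20-pixel margin is illustrative, not the value actually used):

```python
import numpy as np

def interior(img, margin=20):
    """Drop the border so pixels wrapped around by the circular
    shift cannot affect the score."""
    return img[margin:-margin, margin:-margin]

def l2_score(a, b):
    """Sum of squared differences over interior pixels (lower = better)."""
    return float(np.sum((interior(a) - interior(b)) ** 2))

def neg_ncc_score(a, b):
    """Negated normalized cross-correlation: flatten, normalize to unit
    norm, take the inner product, negate (lower = more similar)."""
    a = interior(a).ravel()
    b = interior(b).ravel()
    return -float(np.dot(a / np.linalg.norm(a), b / np.linalg.norm(b)))
```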

The naive algorithm with an exhaustive search worked surprisingly well on the low-resolution images. However, the higher-resolution images, which often measured ~3000x3000 pixels, made it impossible to efficiently carry out an exhaustive search: the search range grows from [-15, 15] (about 1,000 candidates) to something like [-100, 100], giving over 40,000 possible combinations.

So how do we fix this issue? One idea, suggested in the project spec, was to implement a simpler version of an image pyramid in which we simply rescale the image at each level, without blurring it. I chose a scaling factor of 2, so we recursively call the align function, each time scaling the image by 0.5 and halving the search window as well.

Once we reach the base case, the image is usually about the size of the low-resolution images, which lets us estimate the displacement cheaply. We then make our way back down the pyramid, at each step using the previous (doubled) estimate to refine the displacement. Ultimately we reach the original image size with a good estimate of the true displacement. I found that carrying out an exhaustive search in a range of [-5, 5] around this estimate often resulted in an even better alignment.
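The whole coarse-to-fine scheme can be sketched as a self-contained toy version; I use an L2 score here, and 2x2 block averaging stands in for whatever rescaling routine was actually used (all names are my own):

```python
import numpy as np

def l2(a, b):
    return float(np.sum((a - b) ** 2))

def search(ref, img, window, center=(0, 0)):
    """Exhaustive search over shifts within `window` of `center`."""
    best, best_score = center, np.inf
    for dy in range(center[0] - window, center[0] + window + 1):
        for dx in range(center[1] - window, center[1] + window + 1):
            score = l2(ref, np.roll(img, (dy, dx), axis=(0, 1)))
            if score < best_score:
                best_score, best = score, (dy, dx)
    return best

def downscale(img):
    """Halve the resolution by averaging 2x2 blocks (no blurring)."""
    h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2]
            + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def pyramid_align(ref, img, window=15, min_size=64):
    """Recurse on half-size images, then refine the doubled estimate
    with a small [-5, 5] local search at the current resolution."""
    if min(ref.shape) <= min_size:                 # base case: small enough
        return search(ref, img, window)
    dy, dx = pyramid_align(downscale(ref), downscale(img), window, min_size)
    return search(ref, img, 5, center=(2 * dy, 2 * dx))
```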

Overall I think the alignment was pretty successful and most of the pictures look good. On some examples, such as emir.tif (the man in the blue dress), the alignment is noticeably bad. Looking at the original plates, we see a big difference in pixel values between the three exposures: the blue plate has much lower values than the others, so although the L2 score is minimized by the computed alignment, it is not the best alignment. A method to improve this is discussed in the following section.

To solve the issue explained above with emir.tif, I implemented a function that returns the edges of the picture and carried out the alignment in that feature space. The function applies the Sobel filter to the picture. This involves smoothing the image first (I used scipy's Gaussian blur) and then convolving it with two different kernels: one that computes the gradient in the x direction and one that computes the gradient in the y direction. I then return the magnitude of the gradient, computed analogously to the L2 norm except over matrices (the same size as the input image) instead of vectors. The reason we compute the magnitude is that we don't care about the direction of the change (i.e. whether the pixels went from dark to light or vice versa); all we're interested in is that there was a significant change in pixel values. I then multiply the magnitude by 255/max_magnitude to scale everything appropriately. After running the alignment on the edges, the results were amazing. For more information on this I consulted Wikipedia's entry on the Sobel operator, from which I copied the kernel matrices.
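A sketch of that edge extractor, assuming scipy.ndimage for both the Gaussian smoothing and the convolutions (the kernel values are the standard Sobel kernels from the Wikipedia article; sigma is an assumption):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, convolve

def sobel_edges(img, sigma=1.0):
    """Gradient-magnitude 'edge image' used as the alignment feature space."""
    img = gaussian_filter(img.astype(float), sigma)   # smooth first
    kx = np.array([[1, 0, -1],
                   [2, 0, -2],
                   [1, 0, -1]])    # x-gradient kernel (standard Sobel)
    ky = kx.T                      # y-gradient kernel
    gx = convolve(img, kx)
    gy = convolve(img, ky)
    mag = np.sqrt(gx ** 2 + gy ** 2)   # direction-agnostic edge strength
    return mag * (255.0 / mag.max())   # rescale so the max is 255
```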

Here I am abusing terminology in a sense, since there isn't really automatic cropping. The cropping algorithm is not very smart: I simply crop 9% off the picture every time. I chose 9% because it worked well for emir.tif, but otherwise it's an arbitrary value. I attempted to carry out border detection by converting the image to grayscale, selecting the middle row and middle column vectors, and convolving them with [1, -1], which gives a 1-D edge detector. However, this was a bit buggy on some pictures, so I ended up giving up on the idea and sticking with the naive fixed-percentage cropping. Here is a sample result in which this improved the output:
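The naive crop amounts to slicing a fixed fraction off every side (9% here, per the text; the function name is my own):

```python
import numpy as np

def naive_crop(img, frac=0.09):
    """Crop a fixed fraction of the height/width off every side."""
    h, w = img.shape[:2]
    dh, dw = int(h * frac), int(w * frac)
    return img[dh:h - dh, dw:w - dw]
```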

Another extra feature I implemented was contrast balancing. The idea is to spread out the pixel intensities in a better manner so that really bright things are not too bright. This is quite simple to implement. First I converted the RGB image to HSV color space, so that I could balance the contrast using only the V channel. The goal is a change of coordinates in which the minimum value maps to 0 and the maximum maps to 1. I applied this transformation by defining a matrix T whose first column is the flattened V channel and whose second column is all ones, then multiplying T by [1/(max-min), -min/(max-min)]^T.

The idea behind automatic white balancing is that I assume the brightest pixel in the image should be white and the darkest should be black, so I rescale all the pixel values according to the transformation y' = y·[1/(max-min)] - min/(max-min), i.e. y' = (y - min)/(max - min). This can also be done with a single matrix multiplication.
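Both the contrast stretch on the V channel and the white balancing reduce to the same min-max rescale; here is a sketch of it, including the single-matrix-multiplication form described above (function names are my own):

```python
import numpy as np

def minmax_rescale(channel):
    """Affine rescale sending the darkest value to 0 and the brightest
    to 1: y' = (y - min) / (max - min)."""
    lo, hi = channel.min(), channel.max()
    return (channel - lo) / (hi - lo)

def minmax_rescale_matmul(channel):
    """The same map as one matrix multiplication: T = [flattened values | 1s],
    coefficients [1/(max-min), -min/(max-min)]."""
    lo, hi = channel.min(), channel.max()
    T = np.column_stack([channel.ravel(), np.ones(channel.size)])
    coeffs = np.array([1.0 / (hi - lo), -lo / (hi - lo)])
    return (T @ coeffs).reshape(channel.shape)
```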

The difference is subtle, but it's evident when you look at the left tail of the histogram. In the image, the blacks are more black (e.g. the man's outfit, the dark areas in the foliage and rocks) and the whites are crisper (the mountain in the back). Here are the before and after:

Here is another example: the image on the left has automatic white balancing and contrast adjustment; the one on the right is the cropped baseline (aligned using the edge feature space).