Project

Project Description

General Info: The purpose of this project is to determine whether three images, each containing one of the three color channels (R, G, and B), can be stacked and aligned to produce a single, fully colorized image. The idea is to keep one image fixed and displace each of the other two by a displacement vector chosen to minimize the total color mismatch.
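Once a displacement vector is known for each of the other two channels, producing the color image amounts to shifting those channels and stacking all three. The sketch below is illustrative, not the original project code. It assumes blue is the fixed reference and that each vector is a `(dy, dx)` tuple of row and column shifts. The name `compose` is hypothetical.

```python
import numpy as np

def compose(b, g, r, g_shift, r_shift):
    # Keep blue fixed as the reference; move green and red by their (dy, dx) vectors.
    # np.roll wraps pixels pushed past one edge around to the opposite edge.
    g_aligned = np.roll(g, g_shift, axis=(0, 1))
    r_aligned = np.roll(r, r_shift, axis=(0, 1))
    # Stack in R, G, B order to get an (H, W, 3) color image
    return np.dstack([r_aligned, g_aligned, b])
```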

The Process: I first calculated the overall height of the scanned image and divided it by three. This gave three smaller images, one per channel, in the order blue, green, and red. I implemented both the sum of squared differences (SSD) metric and the normalized cross-correlation (NCC) metric to see which would perform better.

To start, I displaced the green (or red) image using NumPy's roll function. It takes a shift amount and an axis and shifts all the rows or columns accordingly. Values pushed past an edge wrap around to the opposite edge. I then subtracted the displaced green/red image from the blue image, squared the differences, and summed them. Because the images were small, I searched a window of -15 to +15 pixels in both the x and y directions. The goal was the displacement vector that yields the lowest sum of squared differences.
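The splitting and exhaustive SSD search described above can be sketched roughly as follows. This is a minimal sketch, not the original project code. It assumes the scan is loaded as a 2-D grayscale NumPy array with the blue, green, and red plates stacked top to bottom. The function names `split_channels`, `ssd`, and `align_ssd` are hypothetical.

```python
import numpy as np

def split_channels(plate):
    # Plates are stacked B, G, R; leftover rows (height not divisible by 3) are dropped
    h = plate.shape[0] // 3
    return plate[:h], plate[h:2 * h], plate[2 * h:3 * h]

def ssd(a, b):
    # Cast to float first so uint8 pixel differences cannot wrap around
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sum(diff ** 2))

def align_ssd(moving, reference, window=15):
    # Try every (dy, dx) in [-window, window]^2 and keep the lowest SSD
    best_score, best_shift = np.inf, (0, 0)
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            # np.roll wraps pixels shifted past one edge to the opposite edge
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            score = ssd(reference, shifted)
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

For example, on synthetic data `align_ssd(np.roll(blue, (3, -5), axis=(0, 1)), blue)` should return `(-3, 5)`. That is the shift which undoes the displacement.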

I then implemented normalized cross-correlation. It treats each image as a vector, divides each vector by its magnitude, and takes the dot product of the two resulting unit vectors. Here I searched for the displacement of the green/red image that yields the highest normalized cross-correlation value.
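A minimal sketch of the NCC score and the corresponding search follows, under the same assumptions as the SSD version (2-D NumPy arrays, a ±15 pixel window). The names `ncc` and `align_ncc` are hypothetical, and the code assumes neither image is all zeros, since that would make a magnitude zero.

```python
import numpy as np

def ncc(a, b):
    # Flatten each image to a vector, scale it to unit length, then take the dot product
    a = a.astype(np.float64).ravel()
    b = b.astype(np.float64).ravel()
    return float(np.dot(a / np.linalg.norm(a), b / np.linalg.norm(b)))

def align_ncc(moving, reference, window=15):
    # Same exhaustive search as for SSD, but keep the HIGHEST score
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            score = ncc(reference, shifted)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

This version divides by the magnitude without first subtracting each image's mean, matching the description above. Because of that division, scaling a channel by a constant brightness factor leaves the score unchanged. Many NCC formulations also subtract the mean, which additionally makes the score insensitive to a constant brightness offset.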

I compared the two methods on the smaller JPG images, and normalized cross-correlation worked slightly better. This is likely because dividing by the magnitudes makes it less sensitive to differences in overall pixel brightness between the color channels.

Lastly, to make the algorithm efficient enough for the larger images, I implemented an image pyramid. It repeatedly shrinks the large image by a factor of 4 and finds the displacement vector at the coarsest level. It then multiplies that vector by 4 and uses it as the starting point for the search at the next, larger level. With this method the run time is roughly 30 to 45 seconds per image.
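The coarse-to-fine procedure can be sketched recursively. This is a hedged sketch, not the original implementation, and several details are assumptions. Downscaling averages non-overlapping 4×4 tiles. Recursion stops once the smaller side is at most 64 pixels, where a full ±15 search is run. Each finer level refines within ±4 pixels (one coarse pixel's worth) of the scaled-up vector. NCC is used as the score. The names `downscale`, `search`, and `align_pyramid` are hypothetical.

```python
import numpy as np

def ncc(a, b):
    # Unit-normalize both flattened images, then take their dot product
    a = a.astype(np.float64).ravel()
    b = b.astype(np.float64).ravel()
    return float(np.dot(a / np.linalg.norm(a), b / np.linalg.norm(b)))

def downscale(img, factor=4):
    # Average non-overlapping factor x factor tiles (crop so both dimensions divide evenly)
    h = img.shape[0] // factor * factor
    w = img.shape[1] // factor * factor
    tiles = img[:h, :w].astype(np.float64).reshape(h // factor, factor, w // factor, factor)
    return tiles.mean(axis=(1, 3))

def search(moving, reference, center, window):
    # Exhaustive NCC search over shifts within +/- window of center
    best_score, best_shift = -np.inf, center
    for dy in range(center[0] - window, center[0] + window + 1):
        for dx in range(center[1] - window, center[1] + window + 1):
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            score = ncc(reference, shifted)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

def align_pyramid(moving, reference, factor=4, min_size=64, window=15):
    # Base case: image is small enough for a full +/- window search
    if min(reference.shape) <= min_size:
        return search(moving, reference, (0, 0), window)
    # Recurse on shrunken copies, then scale the coarse vector up to this level
    dy, dx = align_pyramid(downscale(moving, factor), downscale(reference, factor),
                           factor, min_size, window)
    start = (dy * factor, dx * factor)
    # Refine within +/- factor pixels to absorb the coarse level's rounding error
    return search(moving, reference, start, factor)
```

The savings come from the small per-level windows. Only the coarsest level pays for the full 31×31 search. Each finer level checks just 9×9 candidate shifts.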

Some images lined up noticeably better than others; the ones that had trouble were village, emir, and monastery. This could be due to factors such as the plate borders, as well as the differences in brightness values across channels that the project description mentions. Most of the images, however, aligned well. The poorly aligned ones could probably be improved with edge detection and border detection. A better way of accounting for brightness differences would also help, such as mapping each channel's darkest pixel to black and its brightest pixel to white so that all channels share a common range.
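The improvements suggested above could be prototyped as simple preprocessing applied to each channel before computing the SSD or NCC score. The project did not implement these ideas, so this is only a hedged sketch. `stretch_contrast` performs the per-channel darkest-to-black, brightest-to-white mapping. `crop_interior` discards a fixed margin (an assumed 10% per side) as a crude stand-in for real border detection. `edge_map` uses NumPy's finite-difference gradient magnitude as a simple edge detector rather than a dedicated one such as Canny. All of these names are hypothetical.

```python
import numpy as np

def stretch_contrast(channel):
    # Map the darkest pixel to 0 and the brightest to 1 (per-channel min-max normalization)
    channel = channel.astype(np.float64)
    lo, hi = channel.min(), channel.max()
    if hi == lo:
        return np.zeros_like(channel)
    return (channel - lo) / (hi - lo)

def crop_interior(channel, frac=0.1):
    # Drop a fixed fraction from each side to ignore the scanned plate borders
    h, w = channel.shape
    dy, dx = int(h * frac), int(w * frac)
    return channel[dy:h - dy, dx:w - dx]

def edge_map(channel):
    # Gradient magnitude from finite differences; edges tend to agree across
    # channels more consistently than raw brightness does
    gy, gx = np.gradient(channel.astype(np.float64))
    return np.hypot(gy, gx)

def preprocess(channel, frac=0.1):
    # Crop, normalize contrast, then compute edges
    return edge_map(stretch_contrast(crop_interior(channel, frac)))
```

In a search loop, one would roll the full channel first and preprocess afterward. That way, wrapped-around pixels fall inside the discarded margin as long as the shift is smaller than the margin.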

I also chose two additional pictures: a wooden house and a statue.

Here are the results for each image: