Computational Photography Project 1

Project overview

The objective of Project 1 was to recreate color images from black-and-white starter images. Each black-and-white image actually contains three separate exposures, corresponding from top to bottom to the blue, green, and red channels. The technical problem we needed to solve was determining the optimal offsets when stacking the channels on top of one another in order to produce the clearest, best-aligned color photo.
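As a rough illustration of the input format, the scan can be cut into thirds along its height. This sketch assumes the scan is loaded as a 2D NumPy array; the helper name split_channels is mine, not necessarily what my script uses:

    import numpy as np

    def split_channels(plate):
        """Split a vertically stacked glass-plate scan into its B, G, R thirds.

        `plate` is assumed to be a 2D grayscale array with the blue exposure
        on top, green in the middle, and red on the bottom.
        """
        third = plate.shape[0] // 3
        blue = plate[0:third]
        green = plate[third:2 * third]
        red = plate[2 * third:3 * third]
        return blue, green, red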

My approach

Naive

My naive solution was to take the three channels, use the blue channel as an anchor, and then determine the offsets for the red and green channels that resulted in the "best" image. I measured "best" by taking the L2 norm of the difference between a window in each image, and finding the smallest norm over every combination of x and y offsets ranging from -15 to +15 pixels. I chose a window with width and height equal to 40% of the image's width and height, specifically to avoid the edges, which contain corruption as well as large regions of solid black and white. A simplified sketch of this search follows below.
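This is an illustrative NumPy version of the exhaustive search, not my exact code; the function name naive_align and the use of np.roll for shifting are my choices here:

    import numpy as np

    def naive_align(reference, moving, max_shift=15, window_frac=0.4):
        """Find the (dx, dy) shift of `moving` that best matches `reference`.

        The score is the L2 norm of the pixel difference inside a centered
        window covering `window_frac` of each dimension, which keeps the
        corrupted borders out of the comparison. All shifts in
        [-max_shift, max_shift] x [-max_shift, max_shift] are tried.
        """
        reference = np.asarray(reference, dtype=np.float64)
        moving = np.asarray(moving, dtype=np.float64)

        h, w = reference.shape
        wh, ww = int(h * window_frac), int(w * window_frac)
        top, left = (h - wh) // 2, (w - ww) // 2
        ref_window = reference[top:top + wh, left:left + ww]

        best_shift, best_score = (0, 0), np.inf
        for dy in range(-max_shift, max_shift + 1):
            for dx in range(-max_shift, max_shift + 1):
                shifted = np.roll(moving, (dy, dx), axis=(0, 1))
                window = shifted[top:top + wh, left:left + ww]
                score = np.linalg.norm(ref_window - window)
                if score < best_score:
                    best_score, best_shift = score, (dx, dy)
        return best_shift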

Image Pyramids

The naive solution worked well for the smaller JPEG files, but it broke down on the much larger .tif files: the optimal alignment was more than 15 pixels away in some direction, and searching a wider range of offsets was prohibitively expensive. The solution was an image pyramid. Given a large input image (on the order of ~3000 pixels in each dimension), create copies progressively shrunk by 50% until the dimensions fall below 500 by 500 pixels. Then iteratively run the naive alignment on each copy from smallest to largest, keeping track of the offsets at each level. This lets the naive search work in a way similar to binary search: it first finds a coarse range in which the channels align, and then performs increasingly fine-grained alignments on the larger copies.
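Reusing the naive_align sketch above, a coarse-to-fine version might look like the following. The recursive structure, the 2x2 subsampling, and the small +/-2 refinement radius at each level are illustrative choices, not necessarily exactly what my script does:

    import numpy as np

    def pyramid_align(reference, moving, min_size=500, max_shift=15):
        """Coarse-to-fine alignment using an image pyramid.

        The images are halved until both dimensions drop below `min_size`,
        the full naive search is run on the coarsest level, and the resulting
        shift is doubled and refined at each finer level.
        """
        if max(reference.shape) <= min_size:
            return naive_align(reference, moving, max_shift)

        # Recurse on half-resolution copies (simple 2x2 subsampling here;
        # a smoothed downscale would also work).
        coarse_dx, coarse_dy = pyramid_align(reference[::2, ::2],
                                             moving[::2, ::2],
                                             min_size, max_shift)
        dx, dy = 2 * coarse_dx, 2 * coarse_dy

        # Apply the doubled coarse estimate, then refine with a small
        # local search at this resolution.
        moved = np.roll(moving, (dy, dx), axis=(0, 1))
        fine_dx, fine_dy = naive_align(reference, moved, max_shift=2)
        return dx + fine_dx, dy + fine_dy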

Image outputs

Below are the result images from running my script on each of the input images, as well as three additional images (castle, church, and house). Each image is labeled with the (x, y) offsets used when stacking the red and blue channels.

castle (Additional)
Red offset: (0, 9)
Blue offset: (0, -3)


cathedral
Red offset: (1, 7)
Blue offset: (-2, -5)


church (Additional)
Red offset: (1, 6)
Blue offset: (-3, -5)


emir
Red offset: (18, 57)
Blue offset: (-23, -48)


harvesters
Red offset: (-2, 64)
Blue offset: (-15, -59)


house (Additional)
Red offset: (21, 60)
Blue offset: (-21, -45)


icon
Red offset: (6, 49)
Blue offset: (-17, -41)


lady
Red offset: (4, 62)
Blue offset: (-7, -55)


monastery
Red offset: (1, 6)
Blue offset: (-2, 3)


nativity
Red offset: (-1, 4)
Blue offset: (-1, -3)


self_portrait
Red offset: (8, 97)
Blue offset: (-28, -77)


settlers
Red offset: (-1, 8)
Blue offset: (0, -7)


three_generations
Red offset: (-1, 59)
Blue offset: (-13, -52)


train
Red offset: (27, 44)
Blue offset: (-4, -42)


turkmen
Red offset: (7, 60)
Blue offset: (-21, -56)


village
Red offset: (10, 73)
Blue offset: (-11, -65)