Project 1 - Saurav Shroff

In this project, I implemented a simple image-alignment algorithm. The core functionality comes from an align() function that takes two images, searches over a (-15, 15) window of shifts on the x and y axes, and chooses the best offset by minimizing the sum of squared differences (SSD). The algorithm is further optimized for large images using an image pyramid.
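The exhaustive search described above can be sketched as follows. This is a minimal NumPy sketch, not the actual project code; the border-cropping detail (ignoring wrapped-around pixels when scoring) is my assumption about how the SSD is computed.

```python
import numpy as np

def align(img, ref, window=15):
    """Search shifts in [-window, window] on both axes and return the
    (dy, dx) shift of `img` that minimizes SSD against `ref`."""
    best_score = np.inf
    best_shift = (0, 0)
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            shifted = np.roll(img, (dy, dx), axis=(0, 1))
            # Crop a border so pixels wrapped around by np.roll
            # don't skew the score (an assumed detail).
            b = window
            diff = shifted[b:-b, b:-b] - ref[b:-b, b:-b]
            score = float(np.sum(diff ** 2))
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

The returned shift can then be applied with `np.roll(img, best_shift, axis=(0, 1))` to produce the aligned channel.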

The algorithm takes roughly 1 minute to process images of the size shown here, but it also works well and much faster with a smaller (-10, 10) window, and almost as well with an (-8, 8) window. These modifications yield roughly 2x and 4x speedups respectively. Intuitively, though, shrinking the window limits the "maximum correction" the algorithm is capable of making. A (-15, 15) window retains quite a bit of correction range (since corrections compound in magnitude at each scaling step of the pyramid algorithm). A smaller window may work on these images, but it limits the algorithm's ability to process unseen images with a more significant disparity between constituent frames. (-10, 10) is great for raw speed; (-15, 15) seems better for handling severely misaligned frames.

As a compromise, I use a dynamic window. For the lowest-resolution image, I use the full (-15, 15) inclusive window. If the algorithm pushes up against a "boundary" of the window, meaning it determines that the optimal shift is 15 or -15 in either the x or y direction, the respective (positive or negative) part of the search window remains large for the next search on the next, higher-resolution image. However, if at any step the algorithm does not push up against a boundary, it has found the right neighborhood and only needs fine-tuning as the resolution increases. In that case, I reduce the window to (-5, 4) inclusive for vastly decreased runtime, noting that in almost all of these cases the algorithm only regularly uses the middle (-1, 1) portion of the search window.
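The coarse-to-fine loop with a dynamic window might look like the sketch below. It is a simplified assumption of the scheme above: the pyramid is built by plain 2x subsampling, and the window shrinks symmetrically rather than per-direction as the text describes, so the bookkeeping here is illustrative rather than the author's exact logic.

```python
import numpy as np

def search(img, ref, window):
    """Brute-force SSD search over shifts in [-window, window] on both axes."""
    best = (np.inf, (0, 0))
    for sy in range(-window, window + 1):
        for sx in range(-window, window + 1):
            d = (np.roll(img, (sy, sx), axis=(0, 1))[window:-window, window:-window]
                 - ref[window:-window, window:-window])
            best = min(best, (float(np.sum(d ** 2)), (sy, sx)))
    return best[1]

def pyramid_align(img, ref, levels=3, coarse_window=15, fine_window=5):
    # Build pyramids by repeated 2x subsampling, coarsest level last.
    imgs, refs = [img], [ref]
    for _ in range(levels - 1):
        imgs.append(imgs[-1][::2, ::2])
        refs.append(refs[-1][::2, ::2])

    dy, dx = 0, 0
    window = coarse_window
    for im, rf in zip(reversed(imgs), reversed(refs)):
        # A shift found at the coarser level doubles at this resolution.
        dy, dx = 2 * dy, 2 * dx
        sy, sx = search(np.roll(im, (dy, dx), axis=(0, 1)), rf, window)
        dy, dx = dy + sy, dx + sx
        # Keep the window wide only while the search hits its boundary;
        # otherwise drop to the small fine-tuning window.
        hit_edge = max(abs(sy), abs(sx)) >= window
        window = coarse_window if hit_edge else fine_window
    return dy, dx
```

Because each level only refines the doubled estimate from the level below, the small fine-tuning window is usually enough, which is where the speedup comes from.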

The base algorithm worked well for all images except the red frame of emir.tif, which is a special case that I explain further below.

Small Images

cathedral:

g: (5, 2)

r: (12, 3)

cathedral

tobolsk:

g: (3, 3)

r: (3, 6)

monastery:

g: (-3, 2)

r: (3, 2)

monastery

Large Images

workshop:

g: (53, 0)

r: (105, -12)

workshop

train:

g: (42, 6)

r: (87, 32)

train

three_generations:

g: (52, 14)

r: (112, 11)

three-generations

self_portrait:

g: (78, 29)

r: (176, 37)

self-portrait

onion_church:

g: (51, 27)

r: (108, 36)

onion-church

melons:

g: (81, 10)

r: (178, 14)

melons

lady:

g: (52, 9)

r: (112, 12)

lady

icon:

g: (41, 17)

r: (89, 23)

icon

harvesters:

g: (60, 17)

r: (124, 14)

harvesters

emir:

g: (49, 24)

r: (8, -168)

emir

g: (49, 24)

r: (106, 41)

emir-1

This image was a special case where the alignment did not work well with the standard algorithm, which aligns the red and green frames to the blue (first) frame. Clearly the SSD metric did not capture the relationship between the red and blue frames well; the difference is simply too large. Instead, I first aligned the green frame to the blue frame, then aligned the red frame to the aligned green frame. The green and red frames appear to be more similar in this image. This should hold in many cases, given that the wavelength of green light lies between that of blue and red, but, nonetheless, there will always be edge cases.
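The green-anchored fallback can be sketched as below. The helper name `align_channels` and the idea of passing the alignment routine in as a parameter are illustrative assumptions; `align(img, ref)` is assumed to return a `(dy, dx)` shift as described earlier.

```python
import numpy as np

def align_channels(b, g, r, align):
    """Align g to b, then align r to the *shifted* g rather than to b
    directly (the fallback used for emir). Returns an RGB stack."""
    g_shift = align(g, b)
    g_aligned = np.roll(g, g_shift, axis=(0, 1))
    # Anchor red to the aligned green frame instead of blue.
    r_shift = align(r, g_aligned)
    r_aligned = np.roll(r, r_shift, axis=(0, 1))
    return np.dstack([r_aligned, g_aligned, b])
```

Since the green frame has already been brought into the blue frame's coordinates, chaining the red alignment through it keeps all three channels consistent.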

castle:

g: (34, 3)

r: (98, 4)

castle

Test Images

g: (6, 0)

r: (13, 0)

test3-Color

g: (5, 2)

r: (10, 2)

test2-Color

g: (3, -3)

r: (6, -5)

test1-Color