Project 1: Aligning Images

Overview

This project focuses on image alignment techniques. Given three grayscale images, each recording the intensity of one color in the scene and each taken from about the same position and perspective, the goal is to find a good alignment of the three. An alignment is good if displaying the three images as the red, green, and blue channels of a single picture yields a color image that faithfully represents the colors of the original scene.
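As a small illustration of the final step, the sketch below stacks three already-aligned channels into one color image. The function name `compose_rgb` and the toy constant-valued channels are assumptions made for this example only, not part of the project code.

```python
import numpy as np

def compose_rgb(red, green, blue):
    """Stack three aligned single-channel images into one H x W x 3 color image."""
    if not (red.shape == green.shape == blue.shape):
        raise ValueError("all three channels must have the same shape")
    return np.dstack([red, green, blue])

# Toy channels (hypothetical data): a uniformly orange-ish 2 x 3 image.
r = np.full((2, 3), 0.9)
g = np.full((2, 3), 0.5)
b = np.full((2, 3), 0.1)
rgb = compose_rgb(r, g, b)
print(rgb.shape)  # (2, 3, 3)
```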

Approach

The basic approach was to align pairs of channels. One channel served as the reference, and the other channels were shifted relative to it. To align a target channel with the reference channel, every possible shift within a specified vertical/horizontal range was searched exhaustively. For each candidate shift, the shifted target image was compared against the reference image. The preferred metric for evaluating the quality of an alignment was normalized cross-correlation: both the reference and target images were flattened into vectors and normalized, and their inner product was taken as the correlation score.

Details

The shifting was performed with numpy's roll function, since the images were represented as numpy arrays. As a result, pixels that "fell off" one side of the image during a shift wrapped around and filled the side left empty by the shift. In other words, each row or column of pixels pushed past the edge in the direction of the shift reappeared at the opposite edge of the array.
The function calculating normalized cross-correlation used a caching mechanism to avoid redundant calculation.
To make different alignment quality metrics compatible with the same optimization algorithm, the negative of the normalized cross-correlation value was used. All metrics are treated as loss functions that the algorithm minimizes.
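The single-scale search described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `ncc_loss` and `align_exhaustive`, the `max_shift` parameter, and the synthetic test image are all assumptions, "normalized" is taken to mean scaling each flattened vector to unit length, and the caching mechanism is omitted.

```python
import numpy as np

def ncc_loss(reference, target):
    """Negative normalized cross-correlation, so that lower is better."""
    ref = reference.ravel().astype(np.float64)
    tgt = target.ravel().astype(np.float64)
    ref_norm = np.linalg.norm(ref)
    tgt_norm = np.linalg.norm(tgt)
    if ref_norm == 0 or tgt_norm == 0:
        return 0.0  # an all-zero image carries no correlation information
    return -float(np.dot(ref / ref_norm, tgt / tgt_norm))

def align_exhaustive(reference, target, max_shift=15, loss=ncc_loss):
    """Try every (dy, dx) with |dy|, |dx| <= max_shift; return the best shift."""
    best_shift, best_loss = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # np.roll wraps pixels pushed past one edge around to the other.
            shifted = np.roll(target, (dy, dx), axis=(0, 1))
            current = loss(reference, shifted)
            if current < best_loss:
                best_shift, best_loss = (dy, dx), current
    return best_shift

# Synthetic check (hypothetical data): misalign a random image by a known
# amount, then recover the shift that undoes it.
rng = np.random.default_rng(0)
reference = rng.random((64, 64))
target = np.roll(reference, (-3, 5), axis=(0, 1))
shift = align_exhaustive(reference, target, max_shift=8)
print(shift)  # (3, -5)
```

Because the search only ever minimizes `loss`, any other metric can be passed in as long as smaller values mean better alignment.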

Challenges

Image Size: higher-resolution images could exceed 3000x3000 pixels. At such sizes, the misalignment in the original images is large in terms of absolute pixel displacement, so an exhaustive search must cover many more candidate shifts. In addition, each candidate is more costly to evaluate, since operations on such a high-resolution image (e.g. shifting/rolling and scoring the alignment) are more computationally expensive.
Solution: Align lower-resolution versions of the image first to get a rough estimate of the shift, then search around that estimate at higher resolutions. This was done pyramid style. Suppose that at the lowest resolution r0, shift (x0, y0) yields the best alignment. At resolution r1, which is k times the resolution of r0, the search is performed around (x0 * k, y0 * k). If the search covers displacements of up to 5 pixels in each direction, then at r1 we search over ([x0 * k - 5, x0 * k + 5], [y0 * k - 5, y0 * k + 5]).
The maximum displacement at the lowest resolution was 15 pixels, and at higher resolutions, 5 pixels. At lower resolutions it is cheap to search a large area and find the coarse shift; at higher resolutions, only small adjustments are needed to fine-tune it. This helps performance, since searching at higher resolutions is more computationally expensive.
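A coarse-to-fine search along these lines might look like the sketch below. It is only an illustration under stated assumptions: the function names, the block-averaging `downsample`, the scale factor `k=2`, and the `min_size` stopping rule are hypothetical choices, while the 15-pixel and 5-pixel search radii follow the text.

```python
import numpy as np

def ncc_loss(reference, target):
    """Negative normalized cross-correlation, so that lower is better."""
    ref = reference.ravel().astype(np.float64)
    tgt = target.ravel().astype(np.float64)
    denom = np.linalg.norm(ref) * np.linalg.norm(tgt)
    return 0.0 if denom == 0 else -float(np.dot(ref, tgt) / denom)

def search(reference, target, center, radius):
    """Exhaustively search shifts within `radius` of `center` = (dy, dx)."""
    best_shift, best_loss = center, float("inf")
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            shifted = np.roll(target, (dy, dx), axis=(0, 1))
            current = ncc_loss(reference, shifted)
            if current < best_loss:
                best_shift, best_loss = (dy, dx), current
    return best_shift

def downsample(image, k=2):
    """Shrink by a factor of k by averaging k x k blocks (an assumed method)."""
    h, w = (image.shape[0] // k) * k, (image.shape[1] // k) * k
    return image[:h, :w].reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def align_pyramid(reference, target, k=2, min_size=64,
                  coarse_radius=15, fine_radius=5):
    """Align at the coarsest level first, then refine at each finer level."""
    if min(reference.shape) <= min_size:
        # Lowest resolution r0: wide search around zero shift.
        return search(reference, target, (0, 0), coarse_radius)
    coarse = align_pyramid(downsample(reference, k), downsample(target, k),
                           k, min_size, coarse_radius, fine_radius)
    # Scale the coarse estimate by k and search a small window around it.
    center = (coarse[0] * k, coarse[1] * k)
    return search(reference, target, center, fine_radius)

# Synthetic check (hypothetical data): a 256 x 256 image misaligned by (-12, 20).
rng = np.random.default_rng(1)
reference = rng.random((256, 256))
target = np.roll(reference, (-12, 20), axis=(0, 1))
shift = align_pyramid(reference, target)
print(shift)  # (12, -20)
```

In this example a shift of 20 pixels lies outside a 15-pixel window at full resolution, but it becomes a 5-pixel shift after two rounds of downsampling, which is why the wide search is only needed at the coarsest level.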