My Approach:
First, I took the image containing the three panels and split it vertically into thirds to get the individual color channel panels: blue on top, then green, then red.
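As a minimal sketch of this splitting step, assuming the plate has been loaded as a 2D numpy array (the function and variable names here are my own illustration, not necessarily the names in my code):

    def split_panels(plate):
        # plate: 2D array holding the stacked blue/green/red exposures, top to bottom
        height = plate.shape[0] // 3
        blue = plate[0:height]
        green = plate[height:2 * height]
        red = plate[2 * height:3 * height]
        return blue, green, red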
For alignment, I first align the green channel to the blue channel, then align the red channel to the newly aligned green channel. I align in this order because the filter order from top to bottom is BGR.
To do these alignments, I wrote an algorithm that finds the (x,y) translation yielding the highest normalized cross-correlation (NCC) between two channels.
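A minimal sketch of the metric itself, assuming both channels are numpy arrays of the same shape (the helper name is illustrative):

    import numpy as np

    def ncc(a, b):
        # normalized cross-correlation: dot product of the zero-mean, unit-length vectors
        a = a.ravel() - a.mean()
        b = b.ravel() - b.mean()
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))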
Here is my algorithm to find the best (x,y) translation for alignment:
First, I crop the images being aligned: 100 pixels from the left, right, top, and bottom for smaller images, and 200 pixels from each side for larger images.
This is done to get rid of noise near the borders of the images.
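A sketch of this cropping step, again assuming 2D numpy arrays (the helper name is illustrative, and how the margin is chosen between the two cases is not spelled out here):

    def crop_border(channel, margin):
        # drop `margin` pixels from every side to avoid the noisy plate borders
        # margin is 100 for the smaller images and 200 for the larger ones
        return channel[margin:-margin, margin:-margin]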
For smaller images, I use exhaustive search: I try every (x,y) translation of the first image with x and y each in [-30, 30].
For each translation, I store a key-value pair in a dictionary, where the key is the normalized cross-correlation between the translated first image
and the second image, and the value is the (x,y) tuple itself. After all of the translations are tried, I look up the maximum key in the dictionary
and return its corresponding value, which is the (x,y) tuple for the best alignment translation.
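Here is a minimal sketch of that exhaustive search, assuming numpy arrays and using np.roll to apply the translation (whether x shifts columns or rows, and the exact dictionary layout, are my own assumptions; ncc is repeated from the earlier sketch so this block stands alone):

    import numpy as np

    def ncc(a, b):
        a = a.ravel() - a.mean()
        b = b.ravel() - b.mean()
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    def exhaustive_search(im1, im2, x_range=range(-30, 31), y_range=range(-30, 31)):
        # key: NCC of the translated im1 against im2, value: the (x, y) translation
        scores = {}
        for x in x_range:
            for y in y_range:
                shifted = np.roll(np.roll(im1, y, axis=0), x, axis=1)
                scores[ncc(shifted, im2)] = (x, y)
        # the translation paired with the maximum NCC key is the best alignment
        return scores[max(scores)]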
For larger images, I use an image pyramid, since exhaustive search over a large range of x and y values on a large image would have a long runtime.
Each layer of the pyramid has a scale and a range of x and y values to search exhaustively, and the result at each layer determines the scale and ranges for the next layer.
I start with the coarsest layer of the image pyramid, which has a scale of 0.027 (from 100/3700, since the large images tend to be around 3700 pixels wide),
and use the ranges x, y in [-30, 30]. With these, I find the best (x,y) translation by running exhaustive search on the two cropped images downscaled to 0.027
(see the description above for smaller images).
Given that the best translation at this layer is (x',y'), the next layer has double the scale (0.027*2) and the ranges x:[2x' - 3, 2x' + 3] and y:[2y' - 3, 2y' + 3].
I keep doing this as I go down the layers of the pyramid until I reach the second-to-last layer (the last layer has a scale of 1).
On the second-to-last layer, given that the best translation is (x',y') and the scale is scale_two, the ranges for the last exhaustive search are
x:[(1/scale_two)x' - 3, (1/scale_two)x' + 3] and y:[(1/scale_two)y' - 3, (1/scale_two)y' + 3].
Thus, the last exhaustive search is run on the original cropped images using those ranges.
After this last exhaustive search is done at the bottom of the image pyramid, the overall best (x,y) translation is the result.
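Putting the pyramid together, here is a minimal sketch under my assumptions: the images are numpy arrays, downscaling is done with skimage.transform.rescale, each layer doubles the previous scale, and exhaustive_search is the helper sketched above (the function name and exact loop structure are illustrative):

    from skimage.transform import rescale

    def pyramid_align(im1, im2, base_scale=0.027, window=30):
        # coarsest layer: search the full +/-30 window on heavily downscaled images
        scale = base_scale
        x, y = exhaustive_search(rescale(im1, scale), rescale(im2, scale),
                                 range(-window, window + 1), range(-window, window + 1))
        # intermediate layers: double the scale and search +/-3 around the scaled-up estimate
        while scale * 2 < 1:
            scale *= 2
            x, y = 2 * x, 2 * y
            x, y = exhaustive_search(rescale(im1, scale), rescale(im2, scale),
                                     range(x - 3, x + 4), range(y - 3, y + 4))
        # last layer: scale the estimate by 1/scale_two and search the original images
        x, y = int(round(x / scale)), int(round(y / scale))
        return exhaustive_search(im1, im2, range(x - 3, x + 4), range(y - 3, y + 4))

With helpers like these, the overall flow described above is: align the green channel to the blue channel first, then align the red channel to the newly aligned green channel.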