First, I implemented a single-scale version. After comparing the NCC and SSD metrics, I settled on SSD to score candidate displacements between pairs of color channels (Green against Blue, Red against Blue), holding Blue fixed: the Red and Green channels are both aligned to the Blue channel. To speed up this exhaustive search, I then implemented an image pyramid: the image is repeatedly halved, up to four times, and a recursive call finds the displacement on the smaller image; that estimate, scaled back up, seeds a small local search at the next larger scale.
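The single-scale SSD search and the pyramid recursion described above can be sketched as follows. This is a minimal illustration, not the exact implementation: the function names, the search radius, and the use of `np.roll` (circular shift) for displacing a channel are my own assumptions.

```python
import numpy as np

def ssd(a, b):
    # Sum of squared differences between two equally sized channels.
    return np.sum((a - b) ** 2)

def align_single_scale(channel, base, radius=15):
    # Exhaustively search displacements in [-radius, radius]^2 and
    # return the (x, y) shift minimizing SSD against the base channel.
    best_score, best_shift = np.inf, (0, 0)
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ssd(shifted, base)
            if score < best_score:
                best_score, best_shift = score, (dx, dy)
    return best_shift

def align_pyramid(channel, base, depth=4, radius=15):
    # Recursively align on a half-resolution copy, then refine:
    # the coarse shift, doubled, seeds a small local search here.
    if depth == 0 or min(channel.shape) < 2 * radius:
        return align_single_scale(channel, base, radius)
    coarse = align_pyramid(channel[::2, ::2], base[::2, ::2], depth - 1, radius)
    dx, dy = 2 * coarse[0], 2 * coarse[1]
    shifted = np.roll(channel, (dy, dx), axis=(0, 1))
    fine = align_single_scale(shifted, base, radius=2)
    return (dx + fine[0], dy + fine[1])
```

Because the coarse shift is only accurate to within a pixel at each scale, the refinement step at the finer scale only needs a very small radius, which is what makes the pyramid fast compared to a full-resolution exhaustive search.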
Doing only this does not always yield the best alignment, because the noisy plate borders can dominate the SSD score. So I crop a fixed-size border both before aligning and after stacking. The fixed size I settled on after experimentation is 10 percent of the image width/height.
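The fixed-size cropping might look like the sketch below; the function name and the handling of the crop fraction are my own assumptions.

```python
import numpy as np

def crop_borders(img, frac=0.10):
    # Remove a fixed fraction of each border so the noisy plate edges
    # do not dominate the SSD score (10% chosen experimentally).
    h, w = img.shape[:2]
    dh, dw = int(h * frac), int(w * frac)
    return img[dh:h - dh, dw:w - dw]
```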
Another problem I encountered was misalignment when applying the algorithm to some images (e.g. melons.tif); this was fixed by changing the base channel to which the other two channels are aligned.
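Making the base channel a parameter keeps this fix cheap to try per image. A sketch of what that might look like, assuming an `align_fn(channel, base)` that returns the best [x, y] displacement (such as the pyramid search); the function name and structure are hypothetical:

```python
import numpy as np

def stack_aligned(r, g, b, align_fn, base="blue"):
    # Align the two non-base channels to the chosen base channel,
    # then stack into an RGB image. Swapping `base` (e.g. to "green")
    # is the one-line change that fixed images like melons.tif.
    channels = {"red": r, "green": g, "blue": b}
    ref = channels[base]
    out = {}
    for name, ch in channels.items():
        if name == base:
            out[name] = ch
            continue
        dx, dy = align_fn(ch, ref)
        out[name] = np.roll(ch, (dy, dx), axis=(0, 1))
    return np.dstack([out["red"], out["green"], out["blue"]])
```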
All displacements are written in the format [x, y].