My project has three stages: cropping, aligning, and trimming. I will use workshop.tif as a running example.
To crop the images before alignment, I first normalized the whole scan and cross-correlated it with a horizontal vector of -1s, producing an output like this:
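This step can be sketched as follows. Correlating with a full-width horizontal vector of -1s reduces to negating each row sum of the normalized scan, so dark boundary rows score highest; the function name and the full-width kernel choice are my assumptions, not the exact project code.

```python
import numpy as np

def row_boundary_scores(img):
    """Score each row of a grayscale scan; dark horizontal bands
    (the frame boundaries) score high. Hypothetical helper sketching
    the normalization + correlation step."""
    # Normalize to zero mean, unit variance so scores are comparable
    # across different scans.
    norm = (img - img.mean()) / img.std()
    # Cross-correlating with a full-width horizontal vector of -1s is
    # equivalent to the negated sum of each row.
    return -norm.sum(axis=1)
```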
I then filter for indices above a given threshold and step through them, looking for a jump in index larger than the image height I expect. On images such as lady.tif this tends to break, because the boundaries between the color frames are quite light; as a backup, the code detects the failure and simply splits the scan at the maximum correlation value within each of four lengthwise sections (we cut at the top and bottom, as well as between the three frames). The cropped images are then center-padded to a common size and end up looking something like this:
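The filter-and-step pass might look like the sketch below: rows above a threshold are boundary candidates, and a gap between consecutive candidates larger than some minimum starts a new boundary group. The `threshold` and `min_gap` parameters are hypothetical names for the values described above.

```python
import numpy as np

def candidate_boundaries(scores, threshold, min_gap):
    """Group rows scoring above `threshold` into boundary bands: a jump
    of more than `min_gap` between consecutive candidate rows starts a
    new band. Returns one representative row index per band.
    A sketch of the stepping logic, not the exact project code."""
    idx = np.flatnonzero(scores > threshold)
    if idx.size == 0:
        return []  # detection failed; caller falls back to coarse splitting
    groups = [[idx[0]]]
    for i in idx[1:]:
        if i - groups[-1][-1] > min_gap:
            groups.append([])  # large step: start a new boundary band
        groups[-1].append(i)
    # Represent each band by (roughly) its center row.
    return [int(np.mean(g)) for g in groups]
```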
A basic pyramid alignment technique is used: the images are first aligned with normalized cross-correlation at low resolution, then progressively refined at larger sizes by searching around the alignment found at the coarser level. In my algorithm, the images are halved in size at each depth. The cross-correlations at each depth end up looking like this:
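The recursive structure can be sketched as below. The exhaustive coarse search, the local refinement window of ±2, and the use of `np.roll` for shifting are my assumptions about reasonable defaults, not the exact project code.

```python
import numpy as np

def ncc(a, b):
    # Normalized cross-correlation score of two equally sized images.
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return (a * b).mean()

def pyramid_align(ref, img, depth=5, radius=15):
    """Return the (dy, dx) shift to apply to `img` (via np.roll) so it
    best matches `ref`. Recurses on half-resolution copies, then
    refines the doubled coarse estimate at each finer level.
    A sketch of the approach, not the exact project code."""
    if depth > 1:
        dy, dx = pyramid_align(ref[::2, ::2], img[::2, ::2], depth - 1, radius)
        base, search = (2 * dy, 2 * dx), 2  # refine locally around the coarse shift
    else:
        base, search = (0, 0), radius      # exhaustive search at the coarsest level
    best_score, best_shift = -np.inf, base
    for dy in range(base[0] - search, base[0] + search + 1):
        for dx in range(base[1] - search, base[1] + search + 1):
            score = ncc(ref, np.roll(img, (dy, dx), axis=(0, 1)))
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```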
We can see the size of the proper alignment zone grow as we go up in depth and image size, all the way to the final image. At the largest image sizes I found directly computing the cross-correlation quite slow, so I used an FFT to compute the correlation there instead. By aligning both the green and the red image to the blue image, I obtained pixel offsets for each channel relative to blue. The channels were then superimposed to create the following picture:
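The FFT shortcut rests on the correlation theorem: a pointwise product in the Fourier domain gives the full circular cross-correlation surface in O(N log N) instead of O(N²) per shift. A minimal sketch, assuming globally normalized channels (the wrap-around handling is my addition):

```python
import numpy as np

def fft_align(ref, img):
    """Return the (dy, dx) shift to apply to `img` (via np.roll) so it
    best matches `ref`, using the FFT correlation theorem.
    A sketch, not the exact project code."""
    a = (ref - ref.mean()) / ref.std()
    b = (img - img.mean()) / img.std()
    # Multiplying FFT(a) by the conjugate of FFT(b) and inverting
    # yields the circular cross-correlation of a with b.
    corr = np.fft.ifft2(np.fft.fft2(a) * np.conj(np.fft.fft2(b))).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Map wrapped peak indices back to signed shifts.
    if dy > ref.shape[0] // 2:
        dy -= ref.shape[0]
    if dx > ref.shape[1] // 2:
        dx -= ref.shape[1]
    return int(dy), int(dx)
```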
Note that these pixel offsets are not offsets from the coarse thirds crop, so they have little meaning outside the context of this method; hence I did not include them.
The above image looks great, but it has some issues: most notably the big green bar along the bottom. That bar is present because the green frame captures more of the scene lower down than either the blue or the red. Thankfully, because we computed the proper size of each frame and we know the offsets introduced by center padding, we can compute the indices where each blue/green/red frame actually begins; since we also know their heights and widths, we can compute where each ends. Taking the innermost indices across the three frames and cropping along them yields an image with minimal color artifacts around the edge of the frame, and this is what we end up with:
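The innermost-indices computation is just a rectangle intersection, which might be sketched as below; the `origins`/`sizes` representation is a hypothetical interface for the per-channel corner positions and true frame sizes described above.

```python
def innermost_crop(origins, sizes):
    """Given each channel's top-left corner `origins[c] = (y, x)` in the
    common padded/aligned frame and its true size `sizes[c] = (h, w)`,
    return the (top, left, bottom, right) rectangle covered by all
    channels. Hypothetical helper illustrating the trimming step."""
    top    = max(y for y, x in origins)               # latest start row
    left   = max(x for y, x in origins)               # latest start column
    bottom = min(y + h for (y, _), (h, _) in zip(origins, sizes))
    right  = min(x + w for (_, x), (_, w) in zip(origins, sizes))
    return top, left, bottom, right
```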
This is not perfect: some images have parts of their boundary missing, or had to be coarsely cropped, so there is still some artifacting around the edges; but for most images this works quite well. For the images with missing boundary, most simple crops would also remove parts of the image that do look natural, so I decided it was worth keeping all the natural parts even if some edges look strange (for example, a lot would have to be cropped out of self_portrait.tif to remove the green artifact in the bottom right).
Using the FFT to calculate the correlations yields a large speedup at higher resolutions. Given that, I investigated whether the resizing and correlating done in the pyramid method actually sped up the algorithm. For the workshop image, my algorithm took around 11 s to compute the alignment with a radius of 15 and a depth of 5. This changed very little when I reduced the depth, and increased only slightly when I doubled the radius to 30 (since the larger search is relatively little extra work on top of the FFT). As an experiment, I ran the algorithm with a depth of 1 (a single correlation) and a radius of 300. This took 8 s and produced the same alignment as depth = 5. So when using an FFT to compute these correlations, it may make more sense to simply correlate over a larger radius, and reserve the pyramid for sizes small enough not to warrant the FFT.
Arguments used with photos (in order):
default (radius=15, depth=5): castle.tif, cathedral.jpg, lady.tif, icon.tif, onion_church.tif, self_portrait.tif, tobolsk.jpg, train.tif, workshop.tif
radius=10: emir.tif: Its radius was made smaller because the algorithm tended to misalign the red channel if allowed to veer too far from center at lower resolutions.
hyper_coarse: harvesters.tif, melons.tif, monastery.jpg, three_generations.tif: In all of these images, the boundaries between the blue/green/red frames are too light, and I could not crop them effectively in an automated way. Instead, I simply cropped the scan into thirds as was done in the provided code sample. An easy fix would be to photoshop a black bar over the light bars that separate the frames; these images could then work with the algorithm with very little manual effort.
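For reference, the hyper_coarse fallback amounts to nothing more than an equal thirds split of the stacked scan (the function name is my own; the blue/green/red top-to-bottom order matches the plates described above):

```python
import numpy as np

def hyper_coarse_split(scan):
    """Fallback when boundary detection fails: split the stacked scan
    into equal thirds (blue, green, red from top to bottom).
    Hypothetical helper mirroring the coarse thirds crop."""
    h = scan.shape[0] // 3
    return scan[:h], scan[h:2 * h], scan[2 * h:3 * h]
```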
Church of St. John
Passageway and three minarets topped with birds' nests
Chasovnia i krest vremeni Petra I v derevnie Sumskoe (Chapel and cross from the time of Peter I in the village of Sumskoe)