Part A: Image warping and Mosaicing

We're going to be doing some mosaicing! This is when multiple images from the same point are combined into one large one using homographies. To capture images I used a digital camera with a 90mm lense. All images for a specific mosaic were taken at the same exposure to keep the color consistent. Here's a sample of some of the pictures I took of doe library

jpg

1. Computing homographies

In order to compute homographies, we need to determine points in the intersection of two images where they correspond. This transformation should warp a given part of an image on top of the corresponding part of the other. We first start by choosing correspondence points.

Little Bell/Whistle 1: We also fine-tune our point selections by taking a small 160x160 pixel section around each corresponding pair of points and correlate the two patches after normalization. This lets us determine the error in our alignment and we can select the point of maximal correlation to set the two points to.

jpg jpg

After finetuning, here are the points correspondence we get on each of the images.

jpg

Now we can make our homography!

If we want to map the point a = (x, y, 1) to another point b = (u, v, 1) We can write out Ha = b as a system of 9 equations, we get, for each point given, the 2x9 matrix

[[-x, -y, -1,  0,  0,  0,  u*x,    u*y,  u],
 [0,   0,  0, -x, -y, -1,   v*x,    v*y , v]]

If we write the 3x3 matrix H as a vector h=[h11, h12, h13, h21, h22, h23, h31, h32, h33], we can solve the system Ah=0 Where A is what we get if we stack each of the above matrices for each point. We can then find the best non-trivial solution by computing the SVD of A=USV' and taking h as the last vector of V, then normalizing it such that h33=1.

Little Bell/Whistle 2: It turns out solving the above system can be somewhat numerically unstable, and this can lead to slight errors in the alignment of pixels. I explored a variety of other potential solutions.

One method of doing so is to "condition" the input points such that they converge to a better solution. As explained here, we can compute a new transformation, T which transforms the points such that their mean is at the origin and their average distance from the origin is sqrt(2) (this is done by scaling by the sample standard deviation of the points). We then output T^-1 HT so that our transformation "conditions" the points, applies the homography, then reverses the conditioning.

Let's look at the error we encounter when solving the normal homogenous system. Below is the difference between the transformed points and the desired output:

    [[  0   0   0   0  -6 -14 -20  -1 -20  -5   5   0 -11 -11 -12  -2]
     [  0   0   0   0   5   2 -13   9  15  18  16  -4  -2  -6 -22   6]]
    MSE 97.90625

This difference is as much as 20 pixels! When we apply our conditioning method, we get a much closer result:

   [[ 14   3  -2   3   3   2  -7   3  -1  -2  -2   1  -4  -5  -6   2]
    [  7   1   1  -4   4   5  -5   3  -1   1   3  -3   2  -2 -18   4]
    [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]]
    MSE 27.8125

Wow! Our Mean Squared Error (MSE) dropped from ~98 to ~28! Sweet!

Another potential way of solving this system which doesn't rely on computing the svd is to set h33=1 to begin with and then solve for Ha=b with 8 unknowns. This yields a very similar system as listed above and instead we can solve Ah=B where A is our system and B is all the b points flattened.

This also seems to converge to a similar solution as our conditioned SVD method did! Fascinating!

    [[ 14   2  -2   2   3   2  -6   2  -1  -2  -1   1  -4  -6  -6   2]
     [  7   1   1  -4   4   5  -4   4  -1   2   4  -4   2  -2 -18   4]
     [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]]
    MSE 27.65625

I then became super curious as to how good different methods were at computing these homographies. I tested all of my code against implementations in OpenCV (RANSAC, RHO, LMEDS, and linear) to see how they performed on our system. Unsurprisingly they all converged to a very good solution somewhere around the conditioned homogenous system. These other algorithms like RANSAC are also able to detect outliers however, so for less-cleanly defined points, they may converge to higher MSE-yielding solutions which are in fact better than our simple linear least squares approximation.

jpg

We can now apply the computed homographies to our images!

jpg

By detecting where the images' bounding boxes intersect we can create a nice smooth gradient between them and use a combination of masks to blend them together.

jpg jpg

Doing this for the other pair of images we get:

jpg

With some tricky index math we can then combine these two blended images together into one sweet mosaic:

jpg

Here's how it performed on a couple other images I took:

jpg

jpg jpg jpg jpg

Rectification

Another thing we can do is transform our images such that surfaces which were some non-rectangular box in our image before now become rectangular in the eyes of our camera:

jpg

The coolest thing I learned in this first part was definitely the different methods of finding homographies. It's funny to think that certain systems can be changed to make them more easily solvable, even when there is a unique global minimum we want to reach.

Part B: Automated Homography Computation

We first use the Harris corner detector to find potential points of interest

jpg

Bell/Whistle: Here we use a gaussian derivative to find a smooth local gradient at each point

jpg jpg

jpg

Each point then has a gradient attached to it. Later we will transform our patches such that they are aligned on their gradient.

jpg

Adaptive Non-maximal Suppression

Next we perform adaptive non-maximal suppression: this thins out the points we select by finding local maxima, giving us a better, more even distribution of points.

jpg

Orientation-Invariant Patch Extraction

Now we can construct our gradient-aligned feature patches to match between images. The patch is constructed by first building a transformation corresponding to a translation of the origin to the coordinate, a rotation to align the gradient, and then a scaling to sample from a wider area.

Bell/Whistle continued: For the rotation, we have that the normalized gradient is u/|u| = [cos(t), sin(t)] where t is measured relative to the y-axis (since it is the first coordinate). To make the inverse transformation to align by the gradient, we want a rotation by -t. Given cos(t), sin(t) we know that cos(-t) = cos(t) and sin(-t) = -sin(t) so we only need to negate the second term of the normalized gradient everywhere in our formula.

We take an 8x8 grid of points, centered at the origin, and apply this transformation to get their corresponding point in the whole image. We then interpolate these point values from a blurred version of the image to make it resistant to noise. This yields the following patches.

jpg jpg

We do this for each image to get two sets of aligned patches jpg jpg

Feature Matching

We then compute the normalized cross correlation of each of the patches, and select patches who have a large ratio between their best match and their second best match. This indicates that the patch is a good feature, as it looks like no other feature in the other image.

jpg jpg

RANSAC

From these matches we can then use the RANSAC algorithm to robustly compute a good homography between our two images.

This is the number of maximal inliers v.s. the number of iterations we run for. It converges relatively quickly.

jpg

Part A/B Combined: Autostitching

We can now combine these two algorithms to automate the process of aligning images!

jpg jpg jpg jpg jpg jpg jpg jpg jpg jpg jpg

Comparing our outputs, we see our autostitching method performs very well:

Manual	Automatic