Ajay Ramesh, November 21, 2018

Auto-Stitching Photo Mosaics

Part A

Overview

In Part A, we focus on warping at least two images into the same projective basis by applying a perspective transform to all but one image. We define feature correspondences by hand.

1. Recover Homographies

A homography describes a perspective transform, or perspective projection. A homography matrix has 9 entries, but since a homography is only defined up to scale, we can set H(3,3) = 1 and solve for just 8 degrees of freedom. Each point correspondence provides two linearly independent equations, so we need at least four point correspondences to recover the 8 degrees of freedom. Since we will be using linear least squares to solve for the vector representing the elements of H, we want more than four correspondences to robustly estimate the homography. I'm using the "Direct Linear Transform" technique described in this paper.
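To make the least-squares setup concrete, here is a minimal sketch of that estimator in Python (the function name and point format are my own; it fixes H(3,3) = 1 and solves the resulting linear system with least squares, as described above):

```python
import numpy as np

def compute_homography(src, dst):
    """Estimate the 3x3 homography mapping src points to dst points.

    src, dst: (N, 2) arrays of corresponding (x, y) points, N >= 4.
    Each correspondence contributes two equations in the 8 unknowns
    h11..h32, with h33 fixed to 1.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    # Least-squares solution for the 8 free entries of H.
    h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return np.append(h, 1).reshape(3, 3)
```

With more than four correspondences the system is overdetermined, which is exactly where least squares buys us robustness.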

2. Rectification

I'm using an inverse warp technique that does the following:
  1. Estimate H between some image and a user-defined "rectified" set of points
  2. Apply H to the four corners of the image to recover the warped corners
  3. Find the points inside the polygon enclosed by the four warped corners
  4. For each point inside the warped polygon, apply H^-1 to find the coordinate in the original image to sample a color from
  5. Set the value at that point to the interpolated color value at the coordinate found in (4)

For these image pairs, the image points are the four corners of the paper (or iPad), and the rectified points are the four corners of the image itself. I tried mimicking how popular apps like CamScanner and Scanner Pro work.
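The inverse-warp loop can be sketched as follows (assuming NumPy/SciPy, and that H maps source (x, y) coordinates to output coordinates; for simplicity this sketch samples every output pixel rather than only those inside the warped polygon, filling out-of-bounds samples with zeros):

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def inverse_warp(img, H, out_shape):
    """Warp img by homography H using inverse sampling: for every
    output pixel, apply H^-1 to find the source coordinate, then
    bilinearly interpolate a color there. out_shape = (rows, cols)."""
    Hinv = np.linalg.inv(H)
    rows, cols = out_shape
    ys, xs = np.mgrid[0:rows, 0:cols]
    # Homogeneous output coordinates, mapped back into the source image.
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = Hinv @ pts
    src = src[:2] / src[2]                      # de-homogenize
    interp = RegularGridInterpolator(
        (np.arange(img.shape[0]), np.arange(img.shape[1])),
        img, bounds_error=False, fill_value=0)
    # The interpolator expects (row, col) order, i.e. (y, x).
    out = interp(np.stack([src[1], src[0]], axis=-1))
    return out.reshape(rows, cols, *img.shape[2:])
```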

3. Mosaic

In order to construct the image mosaics I used a technique similar to the one described in Part 2, but instead of finding correspondences between an image and a rectified set of points, I find correspondences between two images. For these examples, I keep the right image fixed and warp the left image toward it by applying an inverse perspective warp parameterized by the estimated homography matrix. After warping, I used a nonlinear blend (the element-wise max function) to "stitch" the images together, and the resulting mosaics are satisfactory. Since I took these pictures with a shaky hand, I was unable to ensure that there was no translation between captures. Therefore, you may notice some ghosting artifacts in some images if you look really closely.
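Once both images have been placed on a shared canvas (zeros wherever an image has no data, which this sketch assumes has already been done), the element-wise max "stitch" is essentially a one-liner:

```python
import numpy as np

def max_blend(warped_left, right_canvas):
    """Element-wise max 'stitch' of two same-shaped canvases.
    Wherever only one image has data, its value wins over the
    zero background; in the overlap, the brighter pixel wins."""
    return np.maximum(warped_left, right_canvas)
```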

4. Summary

In Part A I learned how a panorama can be mathematically modeled by warping several images into the same projective basis (usually that of one of the images) via estimated homographies.

Part B

Overview

In Part B we try to achieve similar effects to Part A, but with automatic feature detection. In Part A, we had to painstakingly define feature correspondences. But now we are using Harris Corner Detection, Adaptive Non-Maximal Suppression, Feature-Space Outlier Rejection, and finally RANSAC, to define point correspondences and estimate the homography. We borrowed some key ideas from the paper Multi-Image Matching using Multi-Scale Oriented Patches.

1. Harris Corner Detection & ANMS

Fortunately we were able to use an existing Harris Corner Detector as a starting point for the rest of the project. Here's what an example image pair looks like after running the detector. The detector has its min_distance parameter set to 15px, meaning that detections will be at least 15px apart. Depending on how it's parameterized, the Harris Corner Detector will return hundreds if not thousands of detections. It's impractical to deal with so many points because we are limited by real-world computation speed. Ideally, we would like to keep a subset of the detections that is evenly distributed across the image. Even distribution helps make feature matching robust: we want to avoid situations where a feature in one image has more than one plausible match in the other image, which you can imagine happening when all the features are bunched together.
This could potentially be achieved by randomly sampling the Harris corners - but that is often unreliable. The MOPS paper suggests a technique called "Adaptive Non-Maximal Suppression" (ANMS). The name is a mouthful, but the proposed algorithm is very simple: assign each point a radius equal to the distance to the closest point with a higher Harris corner strength, then keep the points with the top 500 or so largest radii.
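A sketch of that selection rule in NumPy (the MOPS paper additionally scales strengths by a robustness factor before comparing, which is omitted here for simplicity):

```python
import numpy as np

def anms(coords, strengths, n_keep=500):
    """Adaptive Non-Maximal Suppression.

    coords: (N, 2) corner positions; strengths: (N,) Harris responses.
    Each point's suppression radius is the distance to the nearest
    point with strictly greater strength (infinite for the global
    maximum); keep the n_keep points with the largest radii.
    """
    dists = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    stronger = strengths[None, :] > strengths[:, None]   # (i, j): j beats i
    radii = np.where(stronger, dists, np.inf).min(axis=1)
    keep = np.argsort(radii)[::-1][:n_keep]
    return coords[keep]
```

Because the strongest corner gets an infinite radius, it always survives, and the rest of the kept points are the ones that locally dominate a large neighborhood - which is what spreads them across the image.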
The blue points represent the original Harris corners while red points represent the ANMS-selected points.

2, 3. Feature Extraction & Matching

Now that we have some nicely distributed points in each image, we need to match them. The MOPS paper suggests taking a 40x40 window around each point and downsampling it to 8x8 while applying a low-pass filter (say, a Gaussian) to drive down noise and interest point location error. Normally, we would estimate the gradient at each of these patches to orient all the patches in the same direction. However, for this project, since we are dealing exclusively with panoramas, we can assume that corresponding features are roughly oriented the same way. Thus, our feature descriptors are neither scale invariant nor rotation invariant.
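A rough sketch of this axis-aligned descriptor extraction (assuming NumPy/SciPy; the blur sigma and border handling are my own choices, and the point is assumed to be far enough from the image border):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def extract_descriptor(img, x, y, window=40, size=8):
    """Take a window x window patch around (x, y), low-pass filter,
    and subsample down to size x size, then bias/gain normalize."""
    # Blur before subsampling to suppress noise and aliasing.
    blurred = gaussian_filter(img, sigma=window / (2.0 * size))
    half = window // 2
    patch = blurred[y - half:y + half, x - half:x + half]
    step = window // size
    desc = patch[::step, ::step].astype(float)
    # Bias/gain normalization: zero mean, unit variance.
    return (desc - desc.mean()) / (desc.std() + 1e-8)
```

The bias/gain normalization at the end is what makes the descriptors comparable across the exposure differences between captures.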

In order to match the 8x8 image patches, we compute the pairwise SSD error between every pair of bias/gain-normalized patches. With some clever vectorization, this is actually pretty fast. Each image patch has a nearest neighbor, 1-NN, corresponding to the lowest SSD, and a second nearest neighbor, 2-NN, corresponding to the second lowest SSD. Feature-Space Outlier Rejection assumes that the best feature matches have a low 1-NN SSD and a high 2-NN SSD. This means that features that exhibit a very high affinity toward their best match, and a very low affinity toward their second best match, are the ones we want to keep. Conversely, features that share equal affinity with more than one feature may not be good for matching, because there may not actually be a 1-1 correspondence. Therefore, we threshold the ratio 1-NN/2-NN at an empirically chosen value. The smaller the ratio, the better, since the 2-NN error is much larger than the 1-NN error for a good match. I chose a threshold of 0.01 for my Feature-Space Outlier Rejection.
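The vectorized SSD matching plus ratio test can be sketched like this (the 0.5 default threshold below is purely illustrative - as noted above, the value actually used here was 0.01):

```python
import numpy as np

def match_features(desc1, desc2, ratio_thresh=0.5):
    """Match flattened descriptors by pairwise SSD with a
    1-NN/2-NN ratio test. desc1: (N, D), desc2: (M, D), M >= 2.
    Returns (i, j) index pairs that pass the ratio test."""
    # ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2, fully vectorized.
    ssd = (np.sum(desc1**2, axis=1)[:, None]
           - 2 * desc1 @ desc2.T
           + np.sum(desc2**2, axis=1)[None, :])
    order = np.argsort(ssd, axis=1)
    matches = []
    for i in range(len(desc1)):
        nn1, nn2 = order[i, 0], order[i, 1]
        # Keep only matches where the best SSD is much smaller
        # than the second best.
        if ssd[i, nn1] / max(ssd[i, nn2], 1e-12) < ratio_thresh:
            matches.append((i, int(nn1)))
    return matches
```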

Here are some examples of what matching patches look like. They are not exactly the same, but they are pretty close!

Here is what the point matches look like after piping the Harris corners through ANMS, and the ANMS points through Feature-Space Outlier Rejection. Notice that some matches are exactly correct, while others are plain wrong.

4. RANSAC

We want to get rid of those dirty outliers from the previous step before estimating the homography. So, we turn to RANSAC (RANdom SAmple Consensus). Intuitively, RANSAC keeps fitting our model to a random subset of the data and picks the largest set of points that fall within some error tolerance to be the "inliers". Points that do not consistently fall in the set of inliers are naturally "rejected" because their errors are too high.

For n iterations
  1. Choose 4 feature correspondences at random
  2. Use them to estimate a coarse homography H
  3. For each feature correspondence pair (pi, pi') compute the L2-norm error ||pi' - Hpi||
  4. If ||pi' - Hpi|| < e add (pi, pi') to a set of inliers local to the current iteration
  5. If there are more inliers in the current iteration than the maximum number of inliers ever seen, update max_inliers
  6. Go to (1)
Finally, use max_inliers to compute a refined homography H.
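The loop above can be sketched as follows (assuming NumPy; fit_homography is a minimal DLT least-squares estimator like the one from Part A, repeated here so the sketch is self-contained):

```python
import numpy as np

def fit_homography(src, dst):
    """Minimal DLT least-squares homography estimator (h33 = 1)."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return np.append(h, 1).reshape(3, 3)

def ransac_homography(src, dst, n_iters=100, tol=0.5, seed=0):
    """Sample 4 correspondences, fit a coarse H, count inliers with
    ||p' - Hp|| < tol, track the largest inlier set ever seen, and
    finally refit H on that set. Returns (H, inlier_mask)."""
    rng = np.random.default_rng(seed)
    src_h = np.hstack([src, np.ones((len(src), 1))])  # homogeneous
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        sample = rng.choice(len(src), 4, replace=False)
        H = fit_homography(src[sample], dst[sample])
        proj = (H @ src_h.T).T
        err = np.linalg.norm(proj[:, :2] / proj[:, 2:3] - dst, axis=1)
        inliers = err < tol
        if inliers.sum() > best.sum():
            best = inliers
    return fit_homography(src[best], dst[best]), best
```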

I chose an error tolerance of 0.5 pixels, over 100 iterations. This is what my matching features look like after running RANSAC. Compare this with the feature matches after Feature-Space Outlier Rejection in the previous step.

Results

Since I had more time to do Part B, I implemented linear blending / feathering for constructing my mosaics.
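A distance-transform-weighted average is one common way to implement this kind of feathering - a sketch under that assumption (using NumPy/SciPy; both images are assumed to already sit on a common canvas, zero where empty):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def feather_blend(img1, img2):
    """Linear blend: weight each image by the distance from each
    pixel to the boundary of that image's support, so weights ramp
    smoothly to zero at the seams, then take the weighted average."""
    m1 = (img1 > 0).any(axis=-1) if img1.ndim == 3 else img1 > 0
    m2 = (img2 > 0).any(axis=-1) if img2.ndim == 3 else img2 > 0
    w1 = distance_transform_edt(m1)
    w2 = distance_transform_edt(m2)
    total = w1 + w2
    total[total == 0] = 1                    # avoid 0/0 off-canvas
    if img1.ndim == 3:
        w1, w2, total = w1[..., None], w2[..., None], total[..., None]
    return (w1 * img1 + w2 * img2) / total
```

Unlike the element-wise max from Part A, this ramps each image's contribution down near its border, which softens visible seams in the overlap region.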

Street

Memorial Stadium

Without automatic feature selection and feathering.

With automatic feature selection and feathering.

Fireplace

Without automatic feature selection and feathering.

With automatic feature selection and feathering.

Pathway

Without automatic feature selection and feathering.

With automatic feature selection and feathering.

5. Summary

I learned about feature descriptors, feature matching, and most importantly, how to implement RANSAC. I've always used various libraries and packages to accomplish these things, but I'm really glad that I have a rough sense of how they work under the hood.