Ajay Ramesh, November 21, 2018
Auto-Stitching Photo Mosaics
Part A
Overview
In Part A, we focus on warping at least two images into the same projective basis by applying a perspective transform to all but
one image. We define feature correspondences by hand.
1. Recover Homographies
A homography describes a perspective transform, or perspective projection. A homography matrix has 9 entries, but since a homography is only defined up to scale, we can set H(3,3) = 1, leaving only 8 degrees of freedom to solve for. Each point correspondence provides two linearly independent equations, so we need at least four point correspondences to recover the 8 degrees of freedom. Since we will be using linear least squares to solve for the vector representing the elements of H, we want more than four correspondences to robustly estimate the homography. I'm using the "Direct Linear Transform" technique described in this paper.
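As a rough illustration (not my exact code), the least-squares setup with H(3,3) = 1 can be sketched in NumPy like this. Each correspondence (x, y) -> (x', y') contributes two rows of the linear system:

```python
import numpy as np

def compute_homography(src, dst):
    """Estimate the homography H mapping src points to dst points.

    src, dst: (N, 2) arrays of corresponding points, N >= 4.
    Fixes H(3,3) = 1 and solves the remaining 8 unknowns by least squares.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        # x' = (h1*x + h2*y + h3) / (h7*x + h8*y + 1), cross-multiplied:
        A.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y])
        b.append(xp)
        # y' = (h4*x + h5*y + h6) / (h7*x + h8*y + 1), cross-multiplied:
        A.append([0, 0, 0, x, y, 1, -yp * x, -yp * y])
        b.append(yp)
    h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return np.append(h, 1).reshape(3, 3)
```

With more than four correspondences the system is overdetermined, and `lstsq` finds the H that minimizes the total squared algebraic error.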
2. Rectification
I'm using an inverse warp technique that does the following.
1. Estimate H between some image and a user-defined "rectified" set of points
2. Apply H to the four corners of the image, and recover the warped corners
3. Find the points inside the polygon enclosed by the four warped corners
4. For each of the points inside the warped polygon, apply H^-1 to find the coordinate of the original image which you want to sample a color from
5. Set the value of the coordinate inside the polygon to the interpolated color value corresponding to the coordinate found in (4)
For these image pairs, the image points are the four corners of the paper (or iPad), and the rectified points
are the four corners of the image itself. I tried mimicking how popular apps like CamScanner and Scanner Pro work.
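The inverse warp can be sketched as below, assuming NumPy plus SciPy's `map_coordinates` for bilinear interpolation. For simplicity this sketch (grayscale only, with an illustrative function name) samples every pixel of the output canvas rather than only the warped polygon's interior; pixels that map outside the source image just come back as zeros:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def inverse_warp(img, H, out_shape):
    """Warp a grayscale image by homography H (source -> destination coords).

    For every destination pixel, apply H^-1 to find the source coordinate,
    then sample the source image there with bilinear interpolation.
    """
    Hi = np.linalg.inv(H)
    ys, xs = np.indices(out_shape)                      # destination pixel grid
    dst = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = Hi @ dst                                      # homogeneous source coords
    sx, sy = src[0] / src[2], src[1] / src[2]           # divide out the scale
    out = map_coordinates(img.astype(float), [sy, sx], order=1, cval=0.0)
    return out.reshape(out_shape)
```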
3. Mosaic
In order to construct the image mosaics I used a technique similar to the one described in Section 2. But instead of finding correspondences between an image and a rectified set of points, I find correspondences between one image and another. For these examples, I keep the right image fixed, and warp the left image towards the right image by applying an inverse perspective warp parameterized by the estimated homography matrix. After warping, I used a nonlinear blend (the element-wise max function) to "stitch" the images together, and the resulting mosaics are satisfactory. Since I took these pictures with a shaky hand, I was unable to ensure that there was no translation between captures. Therefore, you may notice some ghosting artifacts in some images if you look really closely.
4. Summary
In Part A I learned how a panorama can be mathematically modeled by warping several images into the same projective basis (usually that of one of the images) by estimating homographies.
Part B
Overview
In Part B we try to achieve similar effects to Part A, but with automatic feature detection. In Part A,
we had to painstakingly define feature correspondences. But now we are using Harris Corner Detection, Adaptive
Non-Maximal Suppression, Feature-Space Outlier Rejection, and finally RANSAC, to define point correspondences
and estimate the homography. We borrowed some key ideas from the paper
Multi-Image Matching using Multi-Scale Oriented Patches.
1. Harris Corner Detection & ANMS
Fortunately we were able to use an existing Harris Corner Detector as a starting point for the rest of the project.
Here's what an example image pair looks like after running the detector. The detector has its min_distance
parameter set to 15px, meaning that detections will be at least 15px apart.
Depending on how it's parameterized, the Harris Corner Detector will return hundreds if not thousands of detections.
It's impractical to deal with so many points because we are limited by real world computation speed. Ideally, we
would like to keep a subset of the detections that are evenly distributed across the image. We want features to be
evenly distributed to ensure that feature matching is robust. We want to avoid situations where a feature on one
image has more than one match in another image. You can imagine that this could happen if all the features are
bunched together.
This could potentially be achieved by randomly sampling the Harris corners, but that is often unreliable. The MOPS paper suggests a technique called "Adaptive Non-Maximal Suppression" (ANMS). The name is a mouthful, but the proposed algorithm is very simple: assign a radius to each point equal to the distance to the closest point with a higher Harris corner strength, then keep the 500 or so points with the largest radii.
The blue points represent the original Harris corners, while red points represent the ANMS-selected points.
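A minimal sketch of the radius assignment, assuming NumPy and a hypothetical `anms` helper (a strictly-stronger comparison here, though the MOPS paper actually scales the strength by a robustness constant):

```python
import numpy as np

def anms(coords, strengths, n_keep=500):
    """Adaptive Non-Maximal Suppression.

    coords: (N, 2) corner locations; strengths: (N,) Harris responses.
    Each corner's suppression radius is its distance to the nearest
    stronger corner; keep the n_keep corners with the largest radii.
    """
    d2 = np.sum((coords[:, None, :] - coords[None, :, :]) ** 2, axis=-1)
    stronger = strengths[None, :] > strengths[:, None]
    d2 = np.where(stronger, d2, np.inf)   # only stronger corners can suppress
    radii = np.sqrt(d2.min(axis=1))       # global maximum gets radius = inf
    keep = np.argsort(radii)[::-1][:n_keep]
    return coords[keep]
```

The O(N^2) distance matrix is fine for a few thousand Harris corners, which is exactly the regime this step operates in.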
2, 3. Feature Extraction & Matching
Now that we have some nicely distributed points in each image, we need to match them. The MOPS paper suggests taking a 40x40 window around each point and resizing the window to a size of 8x8 while applying a low pass filter (say, a Gaussian) to drive down noise and interest point location error. Normally, we would estimate the gradient at each of these patches to orient all the patches in the same direction. However, for this project, since we are dealing exclusively with panoramas, we can assume that corresponding features are roughly oriented in the same way. Thus, our feature descriptors are neither scale invariant nor rotation invariant.
In order to match the 8x8 image patches, we compute the pairwise SSD error between every bias-gain normalized image patch. With some
clever vectorization, this is actually pretty fast. Each image patch will have a nearest neighbor, say 1-NN, corresponding
to the lowest SSD, and a second nearest neighbor, say 2-NN, corresponding to the second lowest SSD. Feature-Space Outlier
Rejection makes the assumption that the best feature matches will have a low 1-NN SSD and a high 2-NN SSD. This means that
features that exhibit a very high affinity towards their best match, and a very low affinity towards their second best match,
are the ones we want to keep. On the contrary, features that share equal affinity towards more than one feature may not
be good for matching because there may not actually be a 1-1 correspondence. Therefore, we threshold the ratio 1-NN/2-NN
by an empirically chosen value. The smaller the ratio, the better (since the error of 2-NN >> 1-NN for a good match). I
chose a threshold of 0.01 for my Feature-Space Outlier Rejection.
Here are some examples of what matching patches look like. They are not exactly the same, but they are pretty close!
Here are what the point matches look like after piping Harris Corners through ANMS, and the ANMS points through
Feature-Space Outlier Rejection. Notice that there are some matches that are exactly correct, while others are plain
wrong.
4. RANSAC
We want to get rid of those dirty outliers from the previous step before estimating the homography. So, we turn
to RANdom Sample And Consensus. Intuitively, RANSAC keeps throwing a random subset of data against our model
and picks the largest set of points that are within some error tolerance to be the "inliers". Points that
do not consistently fall in the set of inliers will be naturally "rejected" because their errors are too high.
For n iterations:
1. Choose 4 feature correspondences at random
2. Use them to estimate a coarse homography H
3. For each feature correspondence pair (pi, pi'), compute the L2-norm error ||pi' - Hpi||
4. If ||pi' - Hpi|| < e, add (pi, pi') to a set of inliers local to the current iteration
5. If there are more inliers in the current iteration than the maximum number of inliers ever seen, update max_inliers
6. Go to (1)
Finally, use max_inliers to compute a refined homography H.
I chose an error tolerance of 0.5 pixels, over 100 iterations. This is what my matching features look like
after running RANSAC. Compare this with the feature matches after Feature-Space Outlier Rejection in the previous
step.
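The loop can be sketched as follows, reusing the same least-squares DLT described in Part A as the estimator (function names and the fixed random seed are illustrative):

```python
import numpy as np

def fit_homography(src, dst):
    """Least-squares DLT with H(3,3) fixed to 1, as in Part A."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y]); b.append(xp)
        A.append([0, 0, 0, x, y, 1, -yp * x, -yp * y]); b.append(yp)
    h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return np.append(h, 1).reshape(3, 3)

def ransac(pts1, pts2, n_iters=100, tol=0.5, seed=0):
    """Keep the largest consensus set over random 4-point homographies,
    then refit the homography on all of its inliers."""
    rng = np.random.default_rng(seed)
    ones = np.ones((len(pts1), 1))
    best_inliers = np.zeros(len(pts1), dtype=bool)
    for _ in range(n_iters):
        idx = rng.choice(len(pts1), 4, replace=False)
        H = fit_homography(pts1[idx], pts2[idx])          # coarse homography
        proj = np.hstack([pts1, ones]) @ H.T
        proj = proj[:, :2] / proj[:, 2:]                  # Hpi, inhomogeneous
        err = np.linalg.norm(proj - pts2, axis=1)         # ||pi' - Hpi||
        inliers = err < tol
        if inliers.sum() > best_inliers.sum():            # new consensus record
            best_inliers = inliers
    return fit_homography(pts1[best_inliers], pts2[best_inliers]), best_inliers
```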
Results
Since I had more time to do Part B, I implemented linear blending / feathering for constructing my mosaics.
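One common way to implement feathering, roughly what I'm doing here (sketch only; assumes grayscale canvases and SciPy's distance transform): weight each image by every pixel's distance to that image's own boundary, so weights fall off linearly toward the seam, and take the weighted average in the overlap.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def feather_blend(img1, img2):
    """Linear (feathered) blend of two aligned canvases.

    Each canvas is zero where its image has no coverage. A pixel's weight
    is its distance to the nearest uncovered pixel, so contributions taper
    off smoothly near each image's edge.
    """
    w1 = distance_transform_edt(img1 > 0)
    w2 = distance_transform_edt(img2 > 0)
    total = w1 + w2
    total[total == 0] = 1            # avoid division by zero outside coverage
    return (img1 * w1 + img2 * w2) / total
```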
Street
Memorial Stadium
Without automatic feature selection and feathering.
With automatic feature selection and feathering.
Fireplace
Without automatic feature selection and feathering.
With automatic feature selection and feathering.
Pathway
Without automatic feature selection and feathering.
With automatic feature selection and feathering.
5. Summary
I learned about feature descriptors, feature matching, and most importantly, how to implement RANSAC. I've always
used various libraries and packages to accomplish these things, but I'm really glad that I have a rough sense
of how they work under the hood.