[Auto]Stitching Photo Mosaics

Francis Pan

In the first part of this project, I will be using point correspondences (at least 4 pairs of points) to recover homographies and perform perspective warping on images. This can then be used to "rectify" images as well as to stitch images into photo mosaics or panoramas.

Part 1: Image Warping and Mosaicing

Recovering Homographies

In order to begin warping, we need to be able to recover homographies, using at least 4 corresponding points to do so (more is better). A homography is defined as follows:

p is our original point and p' is the desired point
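Writing the entries of H as h_1 through h_8 (the bottom-right entry is fixed to 1, which is why there are only 8 unknowns), the mapping p' ~ Hp in homogeneous coordinates is

$$
\begin{bmatrix} wx' \\ wy' \\ w \end{bmatrix}
=
\begin{bmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix},
$$

where $p = (x, y)$, $p' = (x', y')$, and $w$ is the scale factor we divide out after the multiplication.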

As you can see, there are 8 unknowns, meaning we need at least 8 equations to solve for our homography matrix H. This is why we need at least 4 pairs of corresponding points (4 points in each image): each correspondence contributes 2 equations. We then use the n >= 4 correspondences between the "source" and "destination" images to set up the matrices below, and use least squares to solve for the h vector, which can then be reshaped to form our H matrix.

Matrix setup to solve for H using least squares
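Written out, each correspondence $(x, y) \to (x', y')$ contributes two rows of the system (this is the standard construction; eliminating $w$ from the equations above gives exactly these rows):

$$
\begin{bmatrix}
x & y & 1 & 0 & 0 & 0 & -x x' & -y x' \\
0 & 0 & 0 & x & y & 1 & -x y' & -y y'
\end{bmatrix}
\begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_8 \end{bmatrix}
=
\begin{bmatrix} x' \\ y' \end{bmatrix}.
$$

Stacking these rows for all $n \ge 4$ correspondences gives an overdetermined system $A\mathbf{h} = \mathbf{b}$ that least squares can solve.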

After we have the homography matrix, we are now ready to warp images.
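For concreteness, the solve described above can be written in a few lines of NumPy. This is a minimal sketch, not the exact code used here; the function name and argument layout are illustrative.

```python
import numpy as np

def compute_homography(src, dst):
    """Fit the 3x3 homography H mapping src points to dst points.

    src, dst: (n, 2) arrays of corresponding (x, y) points, n >= 4.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        # each correspondence contributes two rows of the linear system
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    h, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)  # fix the last entry to 1 and reshape
```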

Image Rectification

One use of homographies is to warp images so that they are "rectified". We can achieve this by solving for the homography from points in the image that are meant to be square/rectangular, paired with a hard-coded square/rectangle (such as [[0, 0], [0, 1], [1, 1], [1, 0]]). Below are some examples of images that have been "rectified".

Original image of kitchen floor
"Rectified" image
With points for reference. Blue: orig, Red: new
Original image of laptop screen
"Rectified" image
With points for reference
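For reference, a rectification like the ones above can be produced with a few lines of scikit-image. This is a sketch under assumptions: the file name and the hand-picked tile corners below are placeholders, not the actual points I used.

```python
import numpy as np
import skimage.io as skio
from skimage.transform import ProjectiveTransform, warp

im = skio.imread("kitchen.jpg")  # placeholder filename

# Four hand-picked corners of something that should be square, in (x, y) order (placeholders).
tile = np.array([[421, 305], [612, 318], [598, 512], [403, 497]], dtype=float)

# Hard-coded square, scaled up so the output has a usable resolution.
side = 300
square = np.array([[0, 0], [side, 0], [side, side], [0, side]], dtype=float)

# warp() expects the output -> input mapping, so estimate square -> tile.
# (ProjectiveTransform.estimate does the same least-squares fit as compute_homography above.)
tf = ProjectiveTransform()
tf.estimate(square, tile)
rectified = warp(im, tf, output_shape=(side, side))
skio.imsave("kitchen_rectified.jpg", (rectified * 255).astype(np.uint8))
```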

Blending in Mosaics

The next step is to warp and blend two or more images together to produce mosaics, or panoramas. The process is very similar to rectification, except that we now warp one or more images into a common perspective (for example, the perspective of a base image of our choosing). We then average the images where they overlap, and blend the edges of the overlap by taking a weighted average (alpha feathering). Because we now need to fit both images into one, we also have to compute the size of a bounding box that accommodates both the warped image and the base image, and place them both inside it prior to blending. A sketch of the feathering step is shown below, followed by some examples.
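One way to get the feathered weights is a distance transform: each pixel's weight falls off toward the edge of its own image, so the seam fades out smoothly. A minimal sketch, assuming both images have already been warped and placed on a shared canvas of the same size:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def feather_blend(im1, im2):
    """Alpha-feathered blend of two images on a shared canvas.

    im1, im2: float (H, W, 3) arrays, zero wherever an image has no pixels.
    """
    def alpha(im):
        mask = np.any(im > 0, axis=-1)
        # distance to the nearest empty pixel: largest in the middle, 0 at the edge
        d = distance_transform_edt(mask)
        return d / (d.max() + 1e-8)

    a1 = alpha(im1)[..., None]
    a2 = alpha(im2)[..., None]
    total = a1 + a2
    total[total == 0] = 1.0  # avoid division by zero outside both images
    return (im1 * a1 + im2 * a2) / total
```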

David's Desk

Original Left
Original Right
Left image fit into bounding box
Warped Right image
Blended images in bounding box

Bedroom

Original Left
Original Right
Left image fit into bounding box
Warped Right image
Blended images in bounding box

Living Room TV (Ignore the mess :'))

Original Left
Original Right
Left image fit into bounding box
Warped Right image
Blended images in bounding box

Part 2: Feature Matching for Autostitching

Now that we know how to make panoramas/mosaics using manually selected correspondences, we want to be able to do the point selection automatically.

Detecting corner features in an image

To begin our feature matching process, we need to first detect corners within our images. We can do this by using a Harris corner detector. Below are the Harris corners of the two room images.

Corners for room image 1
Corners for room image 2
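A Harris detector along these lines is available in scikit-image; here is a sketch (the parameter values are illustrative, not the exact ones I used):

```python
from skimage.color import rgb2gray
from skimage.feature import corner_harris, peak_local_max

def get_harris_corners(im, min_distance=3):
    """Return the Harris response map and (row, col) corner coordinates of an RGB image."""
    gray = rgb2gray(im)
    h = corner_harris(gray, sigma=1)
    # local maxima of the Harris response, with a weak relative threshold
    coords = peak_local_max(h, min_distance=min_distance, threshold_rel=0.01)
    return h, coords
```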

Adaptive Non-Maximal Suppression (ANMS)

As you can see, the Harris corner detector gives us way too many points. We can trim these points down to a desired quantity by using ANMS. Below are the same two images, now with the top 500 well-distributed points after running ANMS.

Top 500 for room image 1
Top 500 for room image 2
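In code form, ANMS keeps the corners with the largest "suppression radius", i.e. the distance to the nearest corner that is significantly stronger. This is a sketch following the standard MOPS formulation (the robustness constant and the O(n²) loop are illustrative, not my exact implementation):

```python
import numpy as np

def anms(coords, h, n_keep=500, c_robust=0.9):
    """Adaptive non-maximal suppression.

    coords: (n, 2) (row, col) corner locations; h: Harris response map.
    """
    strength = h[coords[:, 0], coords[:, 1]]
    radii = np.full(len(coords), np.inf)
    for i in range(len(coords)):
        stronger = strength[i] < c_robust * strength  # corners that clearly beat corner i
        stronger[i] = False
        if stronger.any():
            d2 = np.sum((coords[stronger] - coords[i]) ** 2, axis=1)
            radii[i] = d2.min()  # squared distance is fine for ranking
    keep = np.argsort(-radii)[:n_keep]  # largest suppression radii first
    return coords[keep]
```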

Extracting a Feature Descriptor for each feature point

After we have a reasonable number of feature/corner points to work with, we can extract a feature descriptor for each point to perform feature matching. We do this by taking a 40x40 sample centered at the point, blurring it, and then resizing it to an 8x8 feature descriptor. Below are a few feature descriptors from the two images.

Patch 1 for room image 1
Patch 2 for room image 1
Patch 3 for room image 1
Patch 1 for room image 2
Patch 2 for room image 2
Patch 3 for room image 2
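A sketch of the descriptor extraction. The blur sigma and the bias/gain normalization on the last line are my own additions for illustration (the write-up above only describes blurring and resizing); points whose window falls off the image are simply skipped.

```python
import numpy as np
from skimage.filters import gaussian

def extract_descriptors(gray, coords, patch=40, out=8):
    """40x40 window around each corner -> blur -> subsample to 8x8 -> flatten."""
    half = patch // 2
    step = patch // out  # 5-pixel spacing when going 40 -> 8
    descs, kept = [], []
    for r, c in coords:
        if r < half or c < half or r + half > gray.shape[0] or c + half > gray.shape[1]:
            continue  # window would fall outside the image
        window = gray[r - half:r + half, c - half:c + half]
        blurred = gaussian(window, sigma=2)                # anti-alias before subsampling
        desc = blurred[step // 2::step, step // 2::step]   # 8x8 sample of the window
        desc = (desc - desc.mean()) / (desc.std() + 1e-8)  # bias/gain normalization (my addition)
        descs.append(desc.ravel())
        kept.append((r, c))
    return np.array(descs), np.array(kept)
```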

Matching these feature descriptors between two images

Now we use these feature descriptors to narrow down our valid points. We do this by using the "Russian Grandma" method: we compute the SSD between each pair of feature descriptors, and only keep a match when its best SSD is significantly smaller than the second-best one. Hence "Russian Grandma": if the best and second-best matches aren't that different, they're probably both bad. As usual, shown below are the points selected by feature matching.

Feature matched points for room image 1
Feature matched points for room image 2
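A sketch of this ratio test (the 0.5 threshold below is a placeholder; in practice it needs tuning):

```python
import numpy as np

def match_features(d1, d2, ratio=0.5):
    """Match descriptors with the ratio test ("Russian Grandma").

    d1, d2: (n1, k) and (n2, k) descriptor arrays. Returns (i, j) index pairs
    where descriptor i in image 1 matches descriptor j in image 2.
    """
    # all pairwise SSDs between descriptors
    dists = np.sum((d1[:, None, :] - d2[None, :, :]) ** 2, axis=-1)
    matches = []
    for i, row in enumerate(dists):
        nn1, nn2 = np.argsort(row)[:2]
        if row[nn1] < ratio * row[nn2]:  # best match must clearly beat the runner-up
            matches.append((i, nn1))
    return matches
```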

RANSAC

Even with feature matching, we still have some "incorrect" points. To eliminate these, we can use RANSAC. In each RANSAC iteration, we randomly select 4 correspondences (the minimum needed for a homography), compute a homography from them, and then check how many of the other correspondences "agree" with that homography. If the number of "agreeing" correspondences is more than our current best, we keep the new set. After just a couple thousand iterations of RANSAC, we almost always end up with a set of correspondences that are essentially all correct. Once again, the RANSAC-selected points are shown below.

Final points for room image 1
Final points for room image 2
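A sketch of the RANSAC loop, reusing compute_homography from Part 1 (the pixel threshold eps and the iteration count are illustrative):

```python
import numpy as np

def ransac_homography(src, dst, n_iters=2000, eps=2.0):
    """Keep the homography whose inlier set is largest, then refit on those inliers.

    src, dst: (n, 2) arrays of matched (x, y) points from the two images.
    """
    n = len(src)
    best_inliers = np.array([], dtype=int)
    src_h = np.hstack([src, np.ones((n, 1))])        # homogeneous coordinates
    for _ in range(n_iters):
        idx = np.random.choice(n, 4, replace=False)  # minimal sample: 4 correspondences
        H = compute_homography(src[idx], dst[idx])
        proj = (H @ src_h.T).T
        proj = proj[:, :2] / proj[:, 2:3]            # divide out the scale w
        err = np.linalg.norm(proj - dst, axis=1)
        inliers = np.where(err < eps)[0]             # points that "agree" with H
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # final least-squares fit on all inliers of the best model
    return compute_homography(src[best_inliers], dst[best_inliers]), best_inliers
```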

Autostitching!

Putting this all together with Part 1, we can now auto-stitch our photo mosaics! Below are side-by-side comparisons of manual and auto stitching for the 3 sets of images from Part 1.

Manual Room Pano
Auto Room Pano
Manual TV Pano
Auto TV Pano
Manual Desk Pano
Auto Desk Pano

As you can see, auto is better for the first two, but worse for the desk. I think this is probably due to the thresholding in both the Harris corner and feature matching steps. (Perhaps more RANSAC iterations were also needed.)

Final Thoughts

It was very cool to see the points gradually get narrowed down at each step (with the selection improving as well). It was also interesting to see how dramatically the threshold values and the number of RANSAC iterations change the results. At one point I used a smaller number of RANSAC iterations and the results were of poor quality, but simply increasing the number of runs improved them significantly. The coolest thing I learned was either the "Russian Grandma" method of matching features or RANSAC. Both algorithms are very simple yet very effective.