Part 1

In this part, we will compute and use homographies between images to project them into a single panorama.

Recovering Homography

To compute the homography, we first select corresponding points between the two images. Then we use these coordinates to find a projective transformation between the two by solving a least-squares problem. The method used is described in this paper: Homography Estimation. I passed 7-8 correspondences into the function for every pair of images.

Selecting correspondences:

When computing the homography, each pair of corresponding points i is represented by the following rows:
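A minimal sketch of this least-squares setup (the function name is mine; following the standard parameterization, h33 is fixed to 1 and the remaining eight entries are solved for):

```python
import numpy as np

def compute_homography(pts1, pts2):
    """Estimate H such that [x', y', w]^T ~ H [x, y, 1]^T, with h33 = 1.
    pts1, pts2: (n, 2) arrays of corresponding (x, y) points, n >= 4."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(pts1, pts2):
        # Each correspondence contributes two rows to the system A h = b.
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    h, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)
```

With more than 4 correspondences the system is overdetermined, which is why least squares (rather than an exact solve) is used.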

Warp the Images

Using the homography from the previous part, we first map the corner points of the image we want to warp, which tells us the bounds of the result. Then we can use inverse warping to interpolate the pixels to fill into the resulting image. Positions that receive no pixels from the original image are left white.
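The inverse-warping step might be sketched like this (a simplified version of the idea, assuming H maps source coordinates to output coordinates; nearest-neighbor sampling is used here instead of interpolation for brevity):

```python
import numpy as np

def warp_image(img, H, out_shape):
    """Inverse-warp img into an (h_out, w_out) canvas; uncovered pixels stay white.
    img: (h, w) or (h, w, c) array."""
    h_out, w_out = out_shape
    # Homogeneous grid of output pixel coordinates (x, y, 1).
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    # Map each output pixel back into the source image, then normalize by w.
    src = np.linalg.inv(H) @ coords
    src /= src[2]
    sx = np.rint(src[0]).astype(int)
    sy = np.rint(src[1]).astype(int)
    inside = (sx >= 0) & (sx < img.shape[1]) & (sy >= 0) & (sy < img.shape[0])
    out = np.full((h_out, w_out) + img.shape[2:], 255, dtype=img.dtype)
    out_flat = out.reshape(h_out * w_out, *img.shape[2:])
    out_flat[inside] = img[sy[inside], sx[inside]]
    return out
```

Note the division by w before rounding: skipping it gives a visibly wrong warp (see "What I learned" below).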

Left and right images after warping:

Image Rectification

To check that our homography and warping work as intended, we can take an image of something that is square in the real world, then warp it so that the selected points map to a set of square coordinates we define. For example, in the image of the door below I chose the 4 correspondences that form a square in the warped image.
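Rectification is the same machinery with hand-picked targets; here is a self-contained sketch (the source corner coordinates are made-up stand-ins for points clicked on the door image, and the 300x300 target square is likewise arbitrary):

```python
import numpy as np

# Four clicked corners of the (projectively distorted) square, and the
# axis-aligned square we want them to land on.
src = np.array([[120.0, 80.0], [400.0, 60.0], [430.0, 390.0], [100.0, 360.0]])
dst = np.array([[0.0, 0.0], [300.0, 0.0], [300.0, 300.0], [0.0, 300.0]])

A, b = [], []
for (x, y), (xp, yp) in zip(src, dst):
    A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
    A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
    b.extend([xp, yp])
h = np.linalg.solve(np.array(A), np.array(b))  # 4 points -> exact 8x8 system
H = np.append(h, 1.0).reshape(3, 3)

# Sanity check: H should map each clicked corner onto the target square.
mapped = (H @ np.column_stack([src, np.ones(4)]).T).T
mapped = mapped[:, :2] / mapped[:, 2:]  # divide by w
```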

Blend the images into a mosaic

Now we can stitch the images together by warping them into a common coordinate frame and aligning them into one final mosaic.
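A minimal compositing sketch, assuming both warped images have already been placed on canvases of the same size in a shared mosaic frame, and that white marks empty space (the function name is mine):

```python
import numpy as np

def composite(canvas, warped, white=255):
    """Copy warped onto canvas wherever warped actually has content
    (i.e. is not the white fill value left by the warping step)."""
    if warped.ndim == 3:
        mask = np.any(warped != white, axis=-1)
    else:
        mask = warped != white
    out = canvas.copy()
    out[mask] = warped[mask]
    return out
```

In the overlap region this simply lets one image win; a weighted average or feathering there is a common refinement.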

Part 2

In this part, we will automatically choose accurate matching feature points, so we can avoid selecting them manually.

Harris Corner Detection

Using the Harris corner detector, we can find points that are corner-like. Then, to select a fixed number of evenly distributed points, we implement Adaptive Non-Maximal Suppression (ANMS). ANMS keeps the desired number of points with the largest suppression radius, where the radius of a point is defined as follows (with c_robust = 0.9):
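The radius definition above might be implemented like this (a sketch; function and variable names are my own):

```python
import numpy as np

def anms(coords, strengths, n_keep, c_robust=0.9):
    """For each corner i, the suppression radius is the distance to the
    nearest corner j that is sufficiently stronger:
        r_i = min_j ||x_i - x_j||  s.t.  f(x_i) < c_robust * f(x_j)
    Keep the n_keep corners with the largest radii.
    coords: (n, 2) corner positions, strengths: (n,) Harris responses."""
    d2 = np.sum((coords[:, None, :] - coords[None, :, :]) ** 2, axis=-1)
    stronger = strengths[:, None] < c_robust * strengths[None, :]
    d2 = np.where(stronger, d2, np.inf)
    radii = np.sqrt(d2.min(axis=1))  # inf for the globally strongest corner
    keep = np.argsort(-radii)[:n_keep]
    return coords[keep]
```

Because weak corners sitting next to strong ones get tiny radii, the surviving points spread out over the whole image rather than clustering on the strongest responses.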

Harris corners with and without ANMS:

Feature Descriptor

To determine which points correspond with each other, we need some way to compare them. We do this by creating a descriptor for each point: we subsample the 40 by 40 window around the point into a 5 by 5 patch, which is then normalized.
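A sketch of the descriptor (the sampling step of 8, which reduces the 40x40 window to a 5x5 patch, and the zero-mean/unit-std normalization are my reading of the description above):

```python
import numpy as np

def describe(img, y, x):
    """Descriptor for the point (y, x): subsample the 40x40 window around it
    to a 5x5 patch, then bias/gain normalize (zero mean, unit std).
    Assumes the point is at least 20 pixels away from every border."""
    patch = img[y - 20:y + 20, x - 20:x + 20][::8, ::8].astype(float)  # 5x5
    return (patch - patch.mean()) / patch.std()
```

The normalization makes the descriptor insensitive to brightness and contrast differences between the two images.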

Examples of feature descriptors:

Feature Matching

Now that we have a descriptor for every point, we want to match each point to its closest counterpart. We do this by computing the pairwise distances between the descriptors of all the points, then accepting the closest match only if the ratio between the closest and second-closest distances is below a threshold. I set the threshold to 0.4, and it seems to do a good job at finding mostly correct matches.
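The ratio test can be sketched as follows (names are mine; comparing descriptors by Euclidean distance is an assumption):

```python
import numpy as np

def match_features(desc1, desc2, ratio=0.4):
    """Lowe-style ratio test. desc1: (n, d), desc2: (m, d) with m >= 2.
    Accept the nearest neighbor in desc2 only when
    dist(1-NN) / dist(2-NN) < ratio. Returns a list of (i, j) index pairs."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        j1, j2 = np.argsort(dists)[:2]
        if dists[j1] < ratio * dists[j2]:
            matches.append((i, int(j1)))
    return matches
```

An ambiguous point whose two best candidates are about equally close has a ratio near 1 and is discarded, which is what keeps the surviving matches mostly correct.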

Images with matched sets of features:

RANSAC

To recover a correct homography, we need to make sure that we remove any outliers, and RANSAC allows us to do so by repeatedly selecting 4 points at random and computing an exact homography from them. The exact homography is then used to see how many of the remaining points it maps to within some epsilon of their matches (I used 10). The homography with the largest number of inliers after some number of iterations (5000 worked well for me) is the one that's used to warp the image.

Some final results:

What I learned

I thought it was cool how, even just using the ratio threshold, the matches are already mostly correct. I was impressed by how accurate the pipeline is and how quickly it runs while still being pretty consistent in its results. I also discovered the importance of normalizing the coordinates by w to end up with the correct warping, after spending a while trying to figure out what was wrong. I was also surprised by how well the images blended on their own without really needing explicit blending.