CS194-26 Proj 5: Autostitching Photo Mosaics

By Mingjun Lim


In this project, we use homographies and image stitching techniques to stitch images together and create panorama pictures.

Part 1: Image warping and Mosaicing

Shooting pictures

For this project, we start with pictures of my living room, taken at different angles. Standing still on one spot, I used the manual mode of my phone camera, with AE/AF centered on the middle of the picture, to shoot 3 photos of my living room with approximately 50% overlap between photos.

Recover Homographies

To stitch these images together, we need to warp them so that their keypoints align. To do so, we recover a homography that lets us warp one image onto the other. We used the left and middle photos for this. We want a matrix H such that Hp = p' (up to scale, in homogeneous coordinates), where p is a point in the original image and p' is the matching point in the other image. We defined 15 pairs of matching points across the two images, where points P lie on the original (middle) image and points P' are the corresponding points on the left image. Since the system is overconstrained (we have more than the 4 pairs needed), we use least squares estimation to obtain H, a 3 by 3 matrix that performs a forward transform on points in the image.
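As a minimal sketch of the least squares step described above (not the exact project code), the function below builds two linear equations per point pair and solves for the 8 unknown entries of H, assuming the bottom-right entry is fixed to 1. The names compute_homography and apply_homography are hypothetical and chosen for illustration.

```python
import numpy as np

def compute_homography(src, dst):
    """Least-squares 3x3 homography H with H @ [x, y, 1] ~ [x', y', 1].

    src, dst: (N, 2) arrays of matching (x, y) points, N >= 4.
    Assumption: H[2, 2] is fixed to 1, leaving 8 unknowns.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]
    A = np.zeros((2 * n, 8))
    b = np.zeros(2 * n)
    for i, ((x, y), (xp, yp)) in enumerate(zip(src, dst)):
        # x' * (h7*x + h8*y + 1) = h1*x + h2*y + h3, rearranged to be linear in h
        A[2 * i] = [x, y, 1, 0, 0, 0, -x * xp, -y * xp]
        # y' * (h7*x + h8*y + 1) = h4*x + h5*y + h6
        A[2 * i + 1] = [0, 0, 0, x, y, 1, -x * yp, -y * yp]
        b[2 * i] = xp
        b[2 * i + 1] = yp
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, pts):
    """Map (N, 2) points through H and divide out the homogeneous coordinate."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]
```

With exactly 4 pairs the system has a unique solution; with more pairs (15 in our case), least squares averages out small errors in the hand-picked points.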

Warp the Images

Using the forward homography H, we then use inverse warping to warp the middle image so that its points line up with those in the left image. This is our result so far.
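To illustrate what inverse warping means here, the sketch below loops (in vectorized form) over every pixel of the output canvas, maps it back through the inverse of H, and copies the nearest source pixel. It assumes a grayscale image and nearest-neighbor sampling; the function name inverse_warp and the out_shape parameter are hypothetical, and the project code may have used interpolation instead.

```python
import numpy as np

def inverse_warp(img, H, out_shape):
    """Warp a 2-D image by forward homography H using inverse mapping.

    Assumptions: grayscale input, nearest-neighbor lookup, and an H whose
    inverse never maps an output pixel to homogeneous coordinate 0.
    Output pixels with no source pixel are left at 0.
    """
    h_out, w_out = out_shape
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    # Homogeneous (x, y, 1) coordinates of every output pixel, shape (3, N).
    dest = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)]).astype(float)
    src = np.linalg.inv(H) @ dest
    sx = np.rint(src[0] / src[2]).astype(int)
    sy = np.rint(src[1] / src[2]).astype(int)
    # Only copy pixels whose source location falls inside the input image.
    valid = (sx >= 0) & (sx < img.shape[1]) & (sy >= 0) & (sy < img.shape[0])
    out = np.zeros(h_out * w_out, dtype=img.dtype)
    out[valid] = img[sy[valid], sx[valid]]
    return out.reshape(h_out, w_out)
```

Working backwards from the output avoids the holes that appear when source pixels are pushed forward and land between output pixels.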

Image Rectification

Using our homography technique, we were also able to perform image rectification. From the following image:

We use an image of a square as the target to show the top of the box:

Blending the images

Using our homography technique, we were finally able to blend both images into a mosaic. Instead of applying feathering or an alpha channel, we simply took the maximum pixel value across the two images at each coordinate.
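The maximum-based blend can be sketched in one line, assuming both images have already been warped onto canvases of the same shape and that empty regions are filled with zeros. The name blend_max is hypothetical.

```python
import numpy as np

def blend_max(warped_a, warped_b):
    """Blend two same-shape canvases by taking the per-pixel maximum.

    Assumption: regions an image does not cover are 0, so the other
    image always wins there.
    """
    return np.maximum(warped_a, warped_b)
```

This keeps the code simple, though it can leave visible seams where the two exposures differ, which feathering would smooth out.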

The coolest thing I've learnt so far has been image rectification! It was amazing to me that there is enough data in our images that perspective warps still yield great results!

Part 2: Autostitching

In this part, we used the algorithm discussed in the paper to automatically select keypoints in 2 images, compute a robust homography, and finally stitch them together.

Detecting corner points

To compute corners, we used the provided function to find Harris corners in the picture. For our pictures, we modified the function to use corner_peaks instead of peak_local_max to reduce the number of points returned. This was just to allow us to assess the effectiveness of the method.

This yields about 2000 points, as seen below.

Adaptive Non-Maximal Suppression

To shortlist a better set of points, we use ANMS to select points that are both spread out and have a high corner strength. For each point A, we assign a score equal to its distance from the nearest point B for which H(A) < 0.9H(B), where H here denotes the Harris corner strength (not the homography).
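A minimal sketch of this scoring, assuming the corner coordinates and their Harris strengths are available as arrays, is shown below. The function name anms and its parameters are hypothetical. It builds a full pairwise distance matrix, which is fine for a few thousand points but would need a smarter approach for many more.

```python
import numpy as np

def anms(coords, strengths, num_keep, c_robust=0.9):
    """Adaptive non-maximal suppression; returns indices of kept points.

    coords: (N, 2) corner coordinates; strengths: (N,) Harris responses.
    A point's score is the distance to the nearest point B that satisfies
    strengths[A] < c_robust * strengths[B]; the highest scores are kept.
    """
    coords = np.asarray(coords, dtype=float)
    strengths = np.asarray(strengths, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # stronger[i, j] is True when point j is sufficiently stronger than point i.
    stronger = strengths[:, None] < c_robust * strengths[None, :]
    # Points that nothing dominates get an infinite radius and are kept first.
    radii = np.where(stronger, dist, np.inf).min(axis=1)
    order = np.argsort(-radii, kind="stable")
    return order[:num_keep]
```

A strong corner surrounded by weaker ones gets a large score, while a corner sitting right next to a stronger one gets a small score and is dropped, which is what spreads the kept points across the image.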

In the sample below, we extract the top 120 points (though in practice, when creating mosaics, we extract 500 or more for each picture).

Feature Descriptors

Once we've shortlisted a set of points, we want to extract feature descriptors from them so they can be matched later. To extract features, we first perform a Gaussian blur, then extract a 40x40 patch around each point. From this patch, we then sample every 5th pixel to create an 8x8 patch that represents each feature.
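The patch extraction and subsampling can be sketched as follows, assuming the image passed in has already been Gaussian-blurred and is grayscale. The name extract_descriptor and the choice to skip points too close to the border are assumptions made for illustration.

```python
import numpy as np

def extract_descriptor(blurred, row, col, patch=40, step=5):
    """Return a flattened 8x8 descriptor sampled from a 40x40 patch.

    blurred: 2-D grayscale image, assumed to be Gaussian-blurred already.
    Returns None when the patch would run off the image (an assumption;
    padding the image would be another option).
    """
    half = patch // 2
    r0, c0 = row - half, col - half
    if (r0 < 0 or c0 < 0
            or r0 + patch > blurred.shape[0]
            or c0 + patch > blurred.shape[1]):
        return None
    window = blurred[r0:r0 + patch, c0:c0 + patch]
    # Taking every 5th pixel of a 40x40 window gives 8x8 = 64 values.
    return window[::step, ::step].astype(float).ravel()
```

Blurring before subsampling matters because taking every 5th pixel of a sharp image would alias fine detail into the descriptor.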

In the samples below, we see each 40x40 feature followed by its subsampled feature.

Air Vent

Air Vent Side

Object on table

Light Switch

Feature Matching

Now that we're able to extract features from each image, we can use the algorithm to perform feature matching on 2 images. First, we get a list of features from both images. We then use dist2 to calculate the difference between each feature in image 1 and each feature in image 2.

For each feature in image 1, we shortlist its 2 best-matching features in image 2, NN1 and NN2. We accept a match only if the ratio diff(feature, NN1)/diff(feature, NN2) is smaller than some threshold E (0.25 in our case), and diff(feature, NN1) is smaller than the mean of diff(feature, NN2) over all features.

These 2 criteria work because correct matches tend to have much lower diffs than incorrect matches. If NN1 happens to be much better than NN2 but is still incorrect, its diff is likely still close to the mean NN2 diff (which is typical of incorrect matches), so the second criterion rejects it.
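The two criteria can be sketched as below, assuming descriptors are stored as rows of two arrays and that the difference is the squared Euclidean distance. The name match_features is hypothetical, and the ratio test is written as a multiplication to avoid dividing by zero.

```python
import numpy as np

def match_features(desc1, desc2, ratio=0.25):
    """Match descriptors using the NN1/NN2 ratio test plus a mean-NN2 cutoff.

    desc1: (N1, D) and desc2: (N2, D) arrays with N2 >= 2.
    Returns a list of (index_in_image_1, index_in_image_2) pairs.
    """
    desc1 = np.asarray(desc1, dtype=float)
    desc2 = np.asarray(desc2, dtype=float)
    # Squared Euclidean distance between every pair of descriptors.
    d = ((desc1[:, None, :] - desc2[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :2]
    rows = np.arange(len(desc1))
    nn1 = d[rows, nearest[:, 0]]
    nn2 = d[rows, nearest[:, 1]]
    # Criterion 1: NN1 must be much closer than NN2.
    # Criterion 2: NN1 must be closer than the average second-best match.
    keep = (nn1 < ratio * nn2) & (nn1 < nn2.mean())
    return [(int(i), int(nearest[i, 0])) for i in np.flatnonzero(keep)]
```

For example, a feature that is equally close to several candidates has a ratio near 1 and is rejected, no matter how small its nearest distance is.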

This yielded the following matching points.

Our matches look pretty good here, except for those on the posters on the wall. This makes sense since the posters are a repeated feature, which might fool our feature matching. To fix this, we apply RANSAC in the next section.


RANSAC

With a list of potential matching points in both images, we can then eliminate outliers by performing RANSAC. In each iteration, we choose 4 random pairs of matching points, compute a homography from them, then measure the accuracy of that homography by applying it to all our other shortlisted points.

We iterate an arbitrary number of times (10000 in our case, since this calculation is cheap), and in each iteration we collect a list of 'inliers': pairs (p, p') where H(p) ~= p'. We keep the largest list of inliers found.

Finally, we use this largest list of inliers as our matching points to compute a homography the way we did in Part 1.
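The RANSAC loop described above can be sketched as follows. This is an illustration under stated assumptions rather than the project code: the pixel tolerance tol used to decide what counts as H(p) ~= p' is not given in the writeup and is a guess, the helper names are hypothetical, and the homography fit fixes the bottom-right entry of H to 1.

```python
import numpy as np

def fit_homography(src, dst):
    """Least-squares homography with H[2, 2] fixed to 1 (needs >= 4 pairs)."""
    rows, rhs = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        rows.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        rhs.extend([xp, yp])
    h, *_ = np.linalg.lstsq(np.array(rows, float), np.array(rhs, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def project(H, pts):
    """Apply H to (N, 2) points and divide out the homogeneous coordinate."""
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]

def ransac_homography(src, dst, iters=10000, tol=2.0, seed=0):
    """Return (H, inlier_mask) using 4-point RANSAC.

    tol is the maximum reprojection error in pixels for an inlier
    (an assumed value). Assumes at least 4 inliers are found.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        sample = rng.choice(len(src), size=4, replace=False)
        H = fit_homography(src[sample], dst[sample])
        # Degenerate samples can produce inf/nan; those never count as inliers.
        with np.errstate(all="ignore"):
            err = np.linalg.norm(project(H, src) - dst, axis=1)
        inliers = err < tol
        if inliers.sum() > best.sum():
            best = inliers
    # Refit on the largest inlier set, as in Part 1.
    return fit_homography(src[best], dst[best]), best
```

Because each trial only needs 4 pairs, a single outlier-free sample is enough to recover a homography that the remaining correct matches agree with, which is why mismatches such as the repeated posters get voted out.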


Finally, we can put it all together to automatically produce mosaics!

Here is a comparison of our manually stitched mosaic versus our automatically stitched mosaic.

We also produced a mosaic that shows the right side of the same room.

Lastly, we produced a mosaic of the outside of the house.

One shortfall of my algorithm is that it tended to work better on photos with high exposure or more light. More light increased the contrast, which allowed for better corner detection and feature matching.

The coolest thing I've learned from this project is the concept of feature matching! It seemed like a simple thing to match points in images by brute force, but I learned why simply using the nearest distance wouldn't work. I think the NN1/NN2 ratio criterion is a great way to think about feature matching (that matches are relative) rather than judging each match by its distance alone.