CS 194-26: Fall 2020
Project 5: [Auto]Stitching Photo Mosaics
Megan Lee

Part A

Shoot the Photos

In this project, we want to stitch together photos to create a panoramic-style photo. I shot two photos of my living room from the same point of view but in different viewing directions, with overlapping fields of view, so that the transform between them is projective.

Recover Homographies

We can relate two images taken from the same camera center by casting a ray through each pixel in the first image and drawing that pixel where the ray intersects the second image. This mapping between the two image planes is a homography, and we warp the pixels by applying it. Formulas are shown below. I used 8 point correspondences (a minimum of 4 is required, but additional points make the estimate less sensitive to an error in any single click) and solved the resulting system with SVD. I initially used least squares, but found SVD to be more effective.
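As a rough illustration of the SVD approach, here is a minimal sketch. The function name `compute_homography` and the exact row layout of the system are my own choices for illustration, not necessarily what the original code used. Each correspondence (x, y) -> (u, v) contributes two rows to a matrix A, and the homography is the right singular vector of A with the smallest singular value.

```python
import numpy as np

def compute_homography(src, dst):
    """Estimate a 3x3 H such that dst ~ H @ src in homogeneous coordinates.

    src, dst: (N, 2) arrays of matching (x, y) points, with N >= 4.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # From u = (h1 x + h2 y + h3) / (h7 x + h8 y + h9), and likewise for v.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows)
    # The last row of vt minimizes ||A h|| subject to ||h|| = 1.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    # Fix the scale so the bottom-right entry is 1 (assumes it is nonzero).
    return H / H[2, 2]
```

With exactly 4 points the system has an exact solution; with more points (such as the 8 used here) the SVD gives the best fit in the algebraic least-squares sense.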

Image Rectification

To verify that my warp function is working, let's try rectifying a few images. I rectified a few flat objects that I had photographed from an angle into a fronto-parallel view.


My warp function works pretty well!
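For readers who want to see what such a warp function might look like, here is a minimal sketch using inverse mapping with nearest-neighbor sampling. Both of those choices, as well as the name `warp_image` and its `out_shape` parameter, are assumptions for illustration; the write-up does not say how the original warp was implemented. For rectification, H would come from mapping the four clicked corners of the object to the corners of an axis-aligned rectangle.

```python
import numpy as np

def warp_image(img, H, out_shape):
    """Warp img by H, where H maps source (x, y) to output (x, y).

    Every output pixel is mapped back through inv(H) and filled with the
    nearest source pixel; pixels that land outside the source stay 0.
    Assumes no output pixel maps to the plane at infinity (w == 0).
    """
    out_h, out_w = out_shape
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    # Homogeneous coordinates of every output pixel, shape (3, out_h * out_w).
    out_pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = np.linalg.inv(H) @ out_pts
    sx = np.rint(src[0] / src[2]).astype(int)
    sy = np.rint(src[1] / src[2]).astype(int)
    valid = (sx >= 0) & (sx < img.shape[1]) & (sy >= 0) & (sy < img.shape[0])
    flat = np.zeros((out_h * out_w,) + img.shape[2:], dtype=img.dtype)
    flat[valid] = img[sy[valid], sx[valid]]
    return flat.reshape((out_h, out_w) + img.shape[2:])
```

Inverse mapping avoids the holes that appear when source pixels are pushed forward into the output; bilinear interpolation would give smoother results than nearest-neighbor sampling.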

Blend the images into a mosaic

Finally, we get to stitch together the mosaic. After warping my images, I created a final image canvas with lots of padding so that the mosaic doesn't get cropped. Then, I added both the first image and the warped image to the blank canvas. To combine the two images, I first tried directly adding them, which did not work well because it distorted the colors in the overlapping region. Then, I tried a mask-and-splice method, which worked alright but produced some strange edge artifacts. Finally, I tried taking the pixel-wise maximum of both images, which produced the best results.
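The padded-canvas and maximum steps can be sketched roughly as follows. The helper names `place_on_canvas` and `blend_max`, and the offsets passed to them, are hypothetical and chosen only for illustration.

```python
import numpy as np

def place_on_canvas(img, canvas_shape, top, left):
    """Copy img onto a zero-filled canvas with its top-left corner at (top, left)."""
    canvas = np.zeros(tuple(canvas_shape) + img.shape[2:], dtype=img.dtype)
    canvas[top:top + img.shape[0], left:left + img.shape[1]] = img
    return canvas

def blend_max(canvas_a, canvas_b):
    """Combine two same-sized canvases by taking the pixel-wise maximum."""
    return np.maximum(canvas_a, canvas_b)
```

Because empty canvas pixels are 0, the maximum simply picks whichever image covers a pixel, and in the overlap it keeps the brighter value instead of summing the two, which is what distorted the colors with direct addition.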

The entire process is visualized below.

Original Images


Selected Points


Warped Images (second image warped)


Stitched Mosaic

What I learned: Part A

This project was really cool. I found it really interesting that we can warp one image into another image's plane by recovering a homography, without any information beyond the two images at hand. I was also surprised by how accurate this is. I get weird warping effects when I try to rectify (and projectively warp) some images I've taken on my iPhone, but in this project I was able to almost perfectly rectify flat objects. Since I manually selected the corner points of the object, the warp was near perfect. There's so much we can recover from historical images within seconds that took some researchers years in the past! Every project in this class makes me appreciate what technology can do for us.

Part B

Harris Corner Detection

In the first part of this project, I manually labeled the interest points for warping. In this part of the project, we want to automate this process. I detected the corner points using the Harris corner detection algorithm. Below are the images visualized with their Harris corner points.



Adaptive Non-Maximal Suppression

As you can see, we ended up with a ton of points, many of which are clustered together. The cost of matching grows superlinearly with the number of points, so we want to restrict the total number of interest points. In addition, it's important that interest points are spatially well distributed over the image. To meet these requirements, we implemented an adaptive non-maximal suppression (ANMS) strategy that is described in this paper to select a fixed number of 500 interest points. Below are the images visualized with the selected ANMS points.
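A minimal sketch of the ANMS idea follows. Each point gets a suppression radius, the distance to the nearest point that is sufficiently stronger, and the points with the largest radii are kept. The function name `anms` and the robustness constant `c_robust=0.9` are assumptions on my part; the write-up only states that 500 points were selected.

```python
import numpy as np

def anms(coords, strengths, n_keep=500, c_robust=0.9):
    """Return indices of the n_keep points with the largest suppression radius.

    coords: (N, 2) corner locations; strengths: (N,) positive corner responses.
    Builds an N x N distance matrix, so memory grows quadratically with N.
    """
    coords = np.asarray(coords, dtype=float)
    strengths = np.asarray(strengths, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    # stronger[i, j] is True when point j is sufficiently stronger than point i.
    stronger = strengths[:, None] < c_robust * strengths[None, :]
    np.fill_diagonal(stronger, False)
    dist2[~stronger] = np.inf
    # Squared radius per point; inf for points that nothing dominates.
    radii2 = dist2.min(axis=1)
    return np.argsort(-radii2)[:n_keep]
```

The effect is that a weak but isolated corner can be kept ahead of a stronger corner that sits right next to an even stronger one, which spreads the points across the image.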



Feature Descriptor Extraction & Matching

Now, we need to match corresponding points across the images. To do this, we took patches around the corner points of the left image and patches around the corner points of the right image, and compared them. We sampled each patch from a larger 40x40 window, applied a Gaussian blur, and then down-sampled it to an 8x8 patch. These patches were normalized to a mean of 0 and a standard deviation of 1. A few patches are visualized below.
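The descriptor steps above can be sketched as follows. The blur width `sigma=2.0`, the sampling offsets, and the function names are assumptions for illustration; only the 40x40 window, the 8x8 output, and the normalization come from the description.

```python
import numpy as np

def gaussian_kernel_1d(sigma, radius):
    """Normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def extract_descriptor(gray, row, col, window=40, out_size=8, sigma=2.0):
    """Return a flattened, normalized out_size x out_size descriptor, or None."""
    half = window // 2
    if (row - half < 0 or col - half < 0
            or row + half > gray.shape[0] or col + half > gray.shape[1]):
        return None  # window would fall outside the image
    patch = gray[row - half:row + half, col - half:col + half].astype(float)
    k = gaussian_kernel_1d(sigma, radius=int(3 * sigma))
    # Separable blur: filter along each row, then along each column.
    blurred = np.apply_along_axis(np.convolve, 1, patch, k, mode="same")
    blurred = np.apply_along_axis(np.convolve, 0, blurred, k, mode="same")
    # Keep every step-th pixel to go from window x window to out_size x out_size.
    step = window // out_size
    small = blurred[step // 2::step, step // 2::step]
    small = small - small.mean()
    std = small.std()
    return (small / std).ravel() if std > 0 else None
```

Blurring before down-sampling reduces aliasing, and the mean and standard deviation normalization makes the descriptor insensitive to brightness and contrast changes between the two photos.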

To compare them, I computed the SSD between pairs of patches and took the patch with the lowest distance as the corresponding point. To reject outliers, I compared the distance to the closest feature (d1) with the distance to the second-closest feature (d2). If the ratio d1/d2 is below a certain threshold (I chose 0.475 in this instance), we keep the correspondence. Otherwise, we discard it. Finally, we have our matched points, visualized below.
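Here is a minimal sketch of the SSD matching with the d1/d2 ratio test, using the 0.475 threshold mentioned above. The function name `match_descriptors` and the array layout are assumptions for illustration.

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio_thresh=0.475):
    """Return an (M, 2) array of index pairs (i in A, j in B) that pass the ratio test.

    desc_a: (Na, D) descriptors; desc_b: (Nb, D) descriptors with Nb >= 2.
    """
    desc_a = np.asarray(desc_a, dtype=float)
    desc_b = np.asarray(desc_b, dtype=float)
    # ssd[i, j] = sum of squared differences between descriptor i of A and j of B.
    ssd = np.sum((desc_a[:, None, :] - desc_b[None, :, :]) ** 2, axis=-1)
    order = np.argsort(ssd, axis=1)
    rows = np.arange(len(desc_a))
    best, second = order[:, 0], order[:, 1]
    d1, d2 = ssd[rows, best], ssd[rows, second]
    # Equivalent to d1 / d2 < ratio_thresh, but safe when d2 is 0.
    keep = d1 < ratio_thresh * d2
    return np.stack([rows[keep], best[keep]], axis=1)
```

A feature whose best and second-best matches are almost equally good is ambiguous, so the ratio test drops it rather than risk a wrong correspondence.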



RANSAC + Mosaic-ing

Now that we have our points ready, we need to find the homography to warp the images. The RANSAC algorithm attempts to find the best homography between the two images. More specifically, it repeatedly samples a random subset of 4 feature pairs (the minimum number of points needed to compute a homography), computes a homography from them, and counts the inliers for that homography. We keep the largest set of inliers and recompute the least-squares homography estimate on all of the inliers.

The parameters I chose are as follows:

  • Number of iterations (times to run RANSAC): 3000
  • ε (distance error that defines what an inlier is): 2 pixels
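The RANSAC loop with the parameters above can be sketched as follows. This is a minimal illustration, not the original code: the helper names, the fixed random `seed`, and the use of an SVD fit for the final refit on the inliers are all assumptions.

```python
import numpy as np

def fit_homography(src, dst):
    """Fit H (up to scale) from N >= 4 point pairs via SVD."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.array(rows, dtype=float))
    return vt[-1].reshape(3, 3)

def project(H, pts):
    """Apply H to (N, 2) points and divide out the homogeneous coordinate."""
    p = np.c_[pts, np.ones(len(pts))] @ H.T
    return p[:, :2] / p[:, 2:]

def ransac_homography(src, dst, n_iters=3000, eps=2.0, seed=0):
    """Return (H, inlier_mask) for matched points src -> dst."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        # Sample the minimal set of 4 pairs and fit an exact homography.
        sample = rng.choice(len(src), size=4, replace=False)
        H = fit_homography(src[sample], dst[sample])
        # Degenerate samples may divide by zero; their errors become nan/inf.
        with np.errstate(divide="ignore", invalid="ignore"):
            err = np.linalg.norm(project(H, src) - dst, axis=1)
        inliers = err < eps
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 4:
        raise ValueError("RANSAC found fewer than 4 inliers")
    # Refit on every inlier of the best model.
    H = fit_homography(src[best_inliers], dst[best_inliers])
    return H / H[2, 2], best_inliers
```

A smaller ε makes the inlier test stricter, and more iterations raise the chance that at least one sample contains only correct matches.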

Now that we have our homography and interest points, we can finally warp the images together. I followed the same method as I did in Part A. Results are shown below (left: automatic keypoints, right: manual keypoints).



Honestly, I'm not too satisfied with my results. I played around with the error threshold and outliers, but ultimately I think I chose very confusing images for my algorithm. I get clean lines for the objects in the overlap, but the abundance of bushy trees, with tons of corner points in the leaves, seems to be throwing my algorithm off. In addition, since I was outdoors, the lighting was slightly different between the photos, which caused a sharper seam in my image blend.

What I learned: Part B

I really enjoyed this part of the project. I'm consistently impressed by what we can do with just linear algebra. After training a model for hours in the last project, I'm pretty amazed by the results of this project and how it did not require any training, just math!