CS 194 Fall 2020

Project 5: [Auto]Stitching Photo Mosaics


Leon Ming

Overview

In this project, we use projective transformations to piece together separate pictures into cohesive mosaics.

  1. Part A1: Visualize Pictures
  2. Part A2: Homographies, Warps and Rectification
  3. Part A3: Mosaics
  4. Part A Learnings
  5. Part B1: Harris Corner Detection & Adaptive Non-Maximal Suppression
  6. Part B2: Feature Extraction
  7. Part B3: Feature Matching
  8. Part B4: RANSAC
  9. Part B5: Auto-Mosaics
  10. Part B Learnings

Part A1: Visualize Pictures

I took some pictures. Here they are, with keypoints plotted.


Part A2: Homographies, Warps and Rectification

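We can formulate the problem of finding a projective transformation between two images as a least-squares problem with 8 unknowns: the 3x3 homography matrix is defined only up to scale, so its last entry can be fixed to 1. Each point correspondence contributes two linear equations, so at least four correspondences are needed, and additional ones make the system overdetermined. I used a least-squares solver (np.linalg.lstsq) to solve the resulting system. To test this, I rectified the same image from 2 different angles, shown below.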
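As a rough illustration of the least-squares setup described above, here is a minimal sketch. The function names `compute_homography` and `apply_homography` are hypothetical (they are not taken from the project code), and points are assumed to be given as (x, y) pairs.

```python
import numpy as np

def compute_homography(src, dst):
    """Estimate H (3x3, with H[2, 2] fixed to 1) such that dst ~ H @ src.

    src, dst: (N, 2) arrays of (x, y) points, N >= 4.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]
    x, y = src[:, 0], src[:, 1]
    xp, yp = dst[:, 0], dst[:, 1]

    # Each correspondence gives two rows of the linear system A h = b,
    # where h holds the 8 unknown entries of H in row-major order.
    A = np.zeros((2 * n, 8))
    b = np.zeros(2 * n)
    A[0::2, 0], A[0::2, 1], A[0::2, 2] = x, y, 1.0
    A[0::2, 6], A[0::2, 7] = -x * xp, -y * xp
    A[1::2, 3], A[1::2, 4], A[1::2, 5] = x, y, 1.0
    A[1::2, 6], A[1::2, 7] = -x * yp, -y * yp
    b[0::2], b[1::2] = xp, yp

    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, pts):
    """Map (N, 2) points through H and divide out the homogeneous coordinate."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]
```

With exactly four correspondences the system is square; with more, lstsq returns the solution that minimizes the squared residual of the linearized equations.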


Part A3: Mosaics

We can create mosaics using the warp method we created above. First, we determine the final size of the mosaic we want. To do this, we forward warp the corners of each input image into the output space (the space of the middle image, in our application), then determine how big we need our output mosaic to be (using the min and max of the Xs and Ys). Then, we can perform inverse warping to fill in our mosaic based on bilinear interpolation of our input images. To properly patch together our images, I used Laplacian pyramids for blending.
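The sizing and inverse-warping steps above might look roughly like the sketch below. It is illustrative only: `warp_corners`, `mosaic_bounds`, and `inverse_warp` are hypothetical names, images are assumed to be 2-D grayscale float arrays, each homography is assumed to map its image into the reference (middle) image frame, and blending is left out.

```python
import numpy as np

def warp_corners(H, height, width):
    """Forward-warp the four corners of a (height, width) image through H."""
    corners = np.array([[0, 0], [width - 1, 0],
                        [width - 1, height - 1], [0, height - 1]], dtype=float)
    homog = np.hstack([corners, np.ones((4, 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]

def mosaic_bounds(homographies, shapes):
    """Return the canvas size and the translation T that shifts the
    warped corners so that the top-left of the canvas is (0, 0)."""
    pts = np.vstack([warp_corners(H, h, w)
                     for H, (h, w) in zip(homographies, shapes)])
    x_min, y_min = np.floor(pts.min(axis=0)).astype(int)
    x_max, y_max = np.ceil(pts.max(axis=0)).astype(int)
    out_shape = (int(y_max - y_min + 1), int(x_max - x_min + 1))
    T = np.array([[1, 0, -x_min], [0, 1, -y_min], [0, 0, 1]], dtype=float)
    return out_shape, T

def inverse_warp(img, H, out_shape, T):
    """Fill a canvas by mapping every canvas pixel back into img and
    sampling with bilinear interpolation. Pixels that fall outside img are 0."""
    out_h, out_w = out_shape
    h, w = img.shape
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    canvas_pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = np.linalg.inv(T @ H) @ canvas_pts
    with np.errstate(divide="ignore", invalid="ignore"):
        sx, sy = src[0] / src[2], src[1] / src[2]
    valid = (sx >= 0) & (sx <= w - 1) & (sy >= 0) & (sy <= h - 1)
    sx, sy = np.where(valid, sx, 0.0), np.where(valid, sy, 0.0)
    # Top-left neighbor of each sample, clipped so x0 + 1 and y0 + 1 stay in range.
    x0 = np.clip(np.floor(sx).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(sy).astype(int), 0, h - 2)
    fx, fy = sx - x0, sy - y0
    out = ((1 - fx) * (1 - fy) * img[y0, x0] + fx * (1 - fy) * img[y0, x0 + 1]
           + (1 - fx) * fy * img[y0 + 1, x0] + fx * fy * img[y0 + 1, x0 + 1])
    return np.where(valid, out, 0.0).reshape(out_h, out_w)
```

Each input image would be warped onto the same canvas this way, with the reference image using the identity as its homography, before the warped layers are blended.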


Part A Learnings

The first mosaic turned out better than I expected. The exposure lock on my iPhone camera was suboptimal, so there is still a slight difference in exposure between the left and right sides of the mosaic. I'm pretty happy with the second mosaic for the most part, despite the slight distortion along the bottom edge of the image. The third mosaic turned out slightly worse. I attribute the error to the fact that the keypoints I chose cover a relatively small proportion of the image, so small errors in their positions are amplified in the rest of the image. Takeaway: even small errors in keypoint placement can lead to noticeable misalignment. However, with an overdetermined system (more correspondences than the minimum), we can reduce the amount of error and hopefully minimize visual artifacts and inconsistencies.


Part B1: Harris Corner Detection & Adaptive Non-Maximal Suppression

In Part A, we created mosaics by manually labelling keypoints on each of our images. In Part B, we attempt to use an automatic method of identifying those keypoints. The first step is corner detection: below are Harris corners detected on each of the 3 sets of images I captured.


To filter out the weaker corners and keep only the relatively meaningful corners while also ensuring that our remaining corners are well distributed, we use Adaptive Non-Maximal Suppression to keep only the top 500 corners for each image. Below are the results.
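One way to sketch Adaptive Non-Maximal Suppression is shown below. The function name `anms` and the robustness constant `c_robust=0.9` are assumptions for illustration (the write-up does not state the constant used). Each corner gets a suppression radius equal to its distance to the nearest sufficiently stronger corner, and the corners with the largest radii are kept.

```python
import numpy as np

def anms(coords, strengths, n_keep=500, c_robust=0.9):
    """Return indices of the n_keep corners with the largest suppression radius.

    coords:    (N, 2) corner positions
    strengths: (N,) corner response values (e.g. Harris response)
    Note: this builds N x N arrays, so it is only practical for a few
    thousand corners at a time.
    """
    coords = np.asarray(coords, dtype=float)
    strengths = np.asarray(strengths, dtype=float)

    # Pairwise squared distances between all corners.
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)

    # stronger[i, j] is True when corner j is sufficiently stronger than corner i.
    stronger = strengths[:, None] < c_robust * strengths[None, :]

    # Squared suppression radius: distance to the nearest stronger corner,
    # or infinity for a corner that nothing dominates.
    radii2 = np.where(stronger, d2, np.inf).min(axis=1)

    # Largest radii first.
    return np.argsort(-radii2)[:n_keep]
```

Compared with simply taking the 500 strongest responses, ranking by radius favors corners that are locally dominant, which spreads the kept corners across the image.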


Part B2: Feature Extraction

In order to create correspondences between corners on pairs of images, we extract an 8x8 feature for each corner on each image. These 8x8 features are sampled from a larger 40x40 detection patch centered at each corner. They are used in the next step.
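A minimal sketch of this extraction step follows. Several details are assumptions not stated above: the name `extract_descriptors`, corners given as (row, column) pairs on a 2-D grayscale image, 5x5 block averaging as the way to sample 8x8 values from the 40x40 patch, bias/gain normalization of each descriptor, and skipping corners whose patch would leave the image.

```python
import numpy as np

def extract_descriptors(img, corners, patch=40, out=8):
    """Return (descriptors, kept) where descriptors is (M, out*out) and
    kept holds the indices into corners that produced a descriptor."""
    half = patch // 2
    step = patch // out  # 5 pixels per descriptor cell for 40 -> 8
    h, w = img.shape
    descs, kept = [], []
    for idx, (r, c) in enumerate(corners):
        r, c = int(r), int(c)
        # Assumption: drop corners too close to the border for a full patch.
        if r - half < 0 or c - half < 0 or r + half > h or c + half > w:
            continue
        window = img[r - half:r + half, c - half:c + half]
        # Average each step x step block: a simple low-pass filter plus subsample.
        small = window.reshape(out, step, out, step).mean(axis=(1, 3))
        # Assumption: normalize to zero mean and unit variance so the
        # descriptor is insensitive to brightness and contrast changes.
        std = small.std()
        small = (small - small.mean()) / (std if std > 0 else 1.0)
        descs.append(small.ravel())
        kept.append(idx)
    return np.array(descs).reshape(-1, out * out), np.array(kept, dtype=int)
```

Sampling from a patch larger than the descriptor means each descriptor value summarizes a neighborhood, which makes the descriptor less sensitive to small errors in corner position.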


Part B3: Feature Matching

Using the feature descriptors extracted in the previous step, we perform the following matching process, which also includes outlier rejection. For each 8x8 feature in image A, we find the two closest 8x8 features in image B by squared distance. Letting the distance to the closest feature be d1 and the distance to the second closest be d2, we keep the correspondence to the closest feature if the ratio d1/d2 is below a threshold (0.675, as inspired by Brown et al.); otherwise, we discard it. The idea is that a correct match should be much closer than the next-best candidate. Below are the resulting corners after rejecting the outliers.
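The ratio test described above can be sketched as follows. The function name `match_features` is hypothetical, descriptors are assumed to be flattened into rows, and image B is assumed to have at least two descriptors.

```python
import numpy as np

def match_features(desc_a, desc_b, ratio=0.675):
    """Return a (K, 2) array of (index_in_a, index_in_b) matches that pass
    the nearest / second-nearest ratio test."""
    desc_a = np.asarray(desc_a, dtype=float)
    desc_b = np.asarray(desc_b, dtype=float)

    # Squared distance between every descriptor in A and every descriptor in B.
    d = ((desc_a[:, None, :] - desc_b[None, :, :]) ** 2).sum(axis=2)

    # Indices of the two nearest neighbors in B for each descriptor in A.
    nearest = np.argsort(d, axis=1)[:, :2]
    rows = np.arange(len(desc_a))
    d1 = d[rows, nearest[:, 0]]
    d2 = d[rows, nearest[:, 1]]

    # d1 / d2 < ratio, written without a division so d2 == 0 is handled.
    keep = d1 < ratio * d2
    return np.column_stack([rows[keep], nearest[keep, 0]])
```

A feature that is equally close to two candidates in B has a ratio near 1 and is rejected, since there is no way to tell which candidate is the true match.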


Part B4: RANSAC

Using RANSAC, we attempt to find the best homography for mapping each image in a set to the base image in that set. There are several hyperparameters for this procedure:

  1. Number of iterations: how many RANSAC iterations to run. I set this to 4000, which gave decent results.
  2. Point threshold: how close a point transformed from image A must land to its match in image B for the correspondence to be considered an inlier. I set this to 1 pixel.
  3. Model threshold: how many inliers a computed homography needs to be considered a good model. I set this to 16, after some experimentation.
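A minimal RANSAC sketch using these three hyperparameters is given below. The function names, the compact least-squares fitter, and the policy of keeping the largest inlier set and refitting on it are assumptions for illustration; the write-up does not spell out those details.

```python
import numpy as np

def fit_homography(src, dst):
    """Least-squares homography with the last entry fixed to 1."""
    x, y = src[:, 0], src[:, 1]
    xp, yp = dst[:, 0], dst[:, 1]
    A = np.zeros((2 * len(src), 8))
    A[0::2, 0], A[0::2, 1], A[0::2, 2] = x, y, 1.0
    A[0::2, 6], A[0::2, 7] = -x * xp, -y * xp
    A[1::2, 3], A[1::2, 4], A[1::2, 5] = x, y, 1.0
    A[1::2, 6], A[1::2, 7] = -x * yp, -y * yp
    b = np.empty(2 * len(src))
    b[0::2], b[1::2] = xp, yp
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def project(H, pts):
    """Map (N, 2) points through H."""
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]

def ransac_homography(src, dst, n_iters=4000, point_thresh=1.0,
                      min_inliers=16, seed=0):
    """Return (H, inlier_mask). H is None if no model reaches min_inliers."""
    rng = np.random.default_rng(seed)
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        # Fit an exact homography to 4 randomly chosen correspondences.
        sample = rng.choice(len(src), size=4, replace=False)
        H = fit_homography(src[sample], dst[sample])
        # Count correspondences whose reprojection error is under the threshold.
        with np.errstate(divide="ignore", invalid="ignore"):
            err = np.linalg.norm(project(H, src) - dst, axis=1)
        inliers = err < point_thresh
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < min_inliers:
        return None, best
    # Assumption: refit on all inliers of the best model for the final estimate.
    return fit_homography(src[best], dst[best]), best
```

A tighter point threshold makes the inlier set cleaner but can reject correct matches whose corner positions are slightly off, so the threshold and iteration count are usually tuned together.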

Part B5: Auto-Mosaics

Now that we have automatically generated our keypoints for each image, we can use the same method as in Part A to create our mosaics.


Part B Learnings

I'm mostly surprised by how effective simple corner detection can be when you apply the right metrics for choosing the keypoints. Without any sort of outlier rejection, I'm sure I would not have been able to generate satisfactory results. With a combination of adaptive non-maximal suppression and RANSAC, though, the resulting mosaics are better than the ones created from manually chosen keypoints. For example, for the Evans mosaic, the angles look more realistic in the automatic mosaic than in the manual mosaic. For the Hearst Mining building, the alignment is noticeably better.

Mosaic Name | Manual Keypoints | Automatic Keypoints
Doe         | [image]          | [image]
Evans       | [image]          | [image]
Mining      | [image]          | [image]