CS194-26: Project 6

Rachael Wu (cs194-26-acr)

Overview

The goal of this project was to create panoramas from multiple images. In the first part, we manually selected correspondences, recovered homographies, and warped images onto a common projection plane; we also used homographies to rectify individual images. We then blended the warped images into mosaics using Laplacian blending. In the second part, we implemented automatic alignment to remove the need for manually selecting correspondences, and again warped and blended the images into panoramas using the same techniques from the first part.

Part A: Image Warping and Mosaicing

Shooting Images

The first step of this part was to shoot images to stitch together. When shooting the pictures, it is important to rotate the camera about its center rather than translating it to the left or right: a homography can only relate views taken from a common center of projection, which is what lets us warp all the images onto a common projection plane.

Below are pairs of images we will use in the first part of the project. All images were shot with an iPhone 6S.






Recovering Homographies

The second step was to recover a homography in order to warp both images to the same projection plane. This required us to manually select correspondences in each pair of images. We define p as the set containing points (x1, y1), (x2, y2), ... (xn, yn) for the first image and p' as the set containing points (x1', y1'), (x2', y2'), ... (xn', yn') for the second image, where point (xi, yi) corresponds to the point (xi', yi'). Below is an example of our selected points for a pair of images:



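To collect the correspondences, we clicked points on each image by hand. A minimal sketch of how this can be done with matplotlib's ginput (the exact helper we used may have differed) is:

    import matplotlib.pyplot as plt

    def pick_points(im, n=8):
        """Hypothetical helper: click n points on an image, returned as (x, y) pairs."""
        plt.imshow(im)
        pts = plt.ginput(n, timeout=0)   # blocks until n clicks have been collected
        plt.close()
        return pts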
We want to recover a 3 x 3 transformation matrix H, where p' = Hp. More precisely, for every pair of corresponding points (x, y) and (x', y'), we have the following projective equation, up to a scale factor w:

    [w*x']   [h1 h2 h3] [x]
    [w*y'] = [h4 h5 h6] [y]
    [ w  ]   [h7 h8 h9] [1]

Since H is only defined up to scale, we fix its ninth entry h9 = 1, leaving eight unknowns. Expanding and dividing out w gives the following pair of linear equations for each correspondence:

    h1*x + h2*y + h3 - h7*x*x' - h8*y*x' = x'
    h4*x + h5*y + h6 - h7*x*y' - h8*y*y' = y'

Since we have more than 4 pairs of points, our system of equations is overdetermined. Thus, instead of finding an exact solution, we use least squares to find the approximate solution that best fits all of the correspondences, which gives us the eight unknown entries of H.
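As a rough sketch of this step (the function name and conventions are my own; the points are assumed to be N x 2 numpy arrays in (x, y) order), the least-squares setup looks like:

    import numpy as np

    def compute_homography(pts1, pts2):
        """Estimate H such that pts2 ~ H * pts1, given N >= 4 correspondences."""
        A, b = [], []
        for (x, y), (xp, yp) in zip(pts1, pts2):
            # Two rows per correspondence, with h9 fixed to 1 (see the equations above).
            A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
            A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
            b.extend([xp, yp])
        # Overdetermined system: least squares recovers the remaining eight entries of H.
        h, *_ = np.linalg.lstsq(np.array(A, dtype=float), np.array(b, dtype=float), rcond=None)
        return np.append(h, 1).reshape(3, 3)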

Warping the Image

The next step is to use our recovered homography to warp both images to the same plane. As in project 4, we achieved this with an inverse warp: for each pixel of the warped image we find its source location in the original image and interpolate the color there. In this project, we chose to warp the left image onto the projection plane of the right image (a sketch of the warp appears below). Below are our warped images for each pair of images:
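A minimal sketch of the inverse warp, assuming H maps coordinates of the image being warped into the output plane and using scipy's map_coordinates for the bilinear interpolation (the canvas size handling is simplified here):

    import numpy as np
    from scipy.ndimage import map_coordinates

    def warp_image(im, H, out_shape):
        """Inverse-warp im into a canvas of shape out_shape (rows, cols)."""
        h_out, w_out = out_shape
        ys, xs = np.mgrid[0:h_out, 0:w_out]
        # Homogeneous (x, y, 1) coordinates of every output pixel.
        coords = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
        src = np.linalg.inv(H) @ coords        # map each output pixel back to the source image
        src /= src[2]                          # divide by w to undo the projective scale
        warped = np.zeros((h_out, w_out, im.shape[2]))
        for c in range(im.shape[2]):
            # map_coordinates takes (row, col) coordinates; order=1 is bilinear interpolation.
            warped[..., c] = map_coordinates(im[..., c], [src[1], src[0]],
                                             order=1, cval=0).reshape(h_out, w_out)
        return warped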

Image Rectification

We can also use homographies to rectify a single image. For example, for images containing a planar surface, we can compute a homography that makes it appear as though the camera is looking straight at the plane, simply by defining the second set of correspondences to be a square. In this case, we warped the charger and the top-left frame in the following images:


to become a 50 x 50 square to get:


as our results, in which we can see that we are now viewing each plane head-on, as if from directly above.
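As a usage example (the file name and the clicked corner coordinates below are made up for illustration), rectification only requires choosing the target square ourselves and reusing the two sketches above:

    import numpy as np
    import matplotlib.pyplot as plt

    im = plt.imread('charger.jpg') / 255.0                 # hypothetical input image
    corners = np.array([[120.0, 340.0], [410.0, 360.0],    # hand-picked (x, y) corners
                        [430.0, 620.0], [100.0, 600.0]])   # of the planar object
    square = np.array([[0.0, 0.0], [50.0, 0.0], [50.0, 50.0], [0.0, 50.0]])

    H = compute_homography(corners, square)                # from the sketch in the previous section
    rectified = warp_image(im, H, out_shape=(300, 300))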

Blending Images into Mosaics

The final step in this part of the project was to combine both images into a single mosaic. To do so, we first found the overlapping region between the warped left image and the right image. Within that region, we used Gaussian and Laplacian stacks to blend the colors, as we did in project 3 (a sketch of the blend is shown below). Pixels that did not overlap between the two images were copied directly into the new mosaic. Below are our final results for the three pairs of images:
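A condensed sketch of the stack-based blending (applied per color channel; the number of levels and the blur size here are illustrative rather than the exact values we used):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def blend_channel(a, b, mask, levels=4, sigma=2.0):
        """Blend two single-channel images; mask is 1.0 where image a should dominate."""
        out = np.zeros_like(a, dtype=float)
        ga, gb, gm = a.astype(float), b.astype(float), mask.astype(float)
        for lvl in range(levels):
            ba, bb = gaussian_filter(ga, sigma), gaussian_filter(gb, sigma)
            # Laplacian band = detail removed by the blur; the last level keeps the low-pass residual.
            la = ga - ba if lvl < levels - 1 else ga
            lb = gb - bb if lvl < levels - 1 else gb
            out += gm * la + (1 - gm) * lb      # combine each band under a progressively smoother mask
            ga, gb, gm = ba, bb, gaussian_filter(gm, sigma)
        return out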






Part B: Feature Matching and Autostitching

Detecting Corner Features

The first step of the second part was to detect corners in the image, i.e. points where the pixel values change sharply in every direction. In this project, we used the starter code for the Harris Interest Point Detector described in “Multi-Image Matching using Multi-Scale Oriented Patches” by Brown et al. Below are the detected corners for the images of Sproul Hall with a min_distance of 10:


There are 671 and 693 points for the left and right images, respectively.
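We relied on the provided starter code for the detector itself; a rough stand-in using scikit-image (not the exact code we ran) would be:

    import numpy as np
    from skimage.feature import corner_harris, peak_local_max

    def harris_corners(gray, min_distance=10):
        """Approximate stand-in for the starter detector: Harris response plus
        local maxima that are at least min_distance pixels apart."""
        response = corner_harris(gray)
        coords = peak_local_max(response, min_distance=min_distance)   # (row, col) pairs
        strengths = response[coords[:, 0], coords[:, 1]]
        return coords, strengths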

Adaptive Non-Maximal Suppression

Next, we reduce the number of corner points in each image. Since we want the points to be spread evenly across the image, we cannot simply pick the points with the largest corner strengths. Instead, we use a method known as adaptive non-maximal suppression (ANMS), sketched in code after this list, in which we:
  1. For every point i, find all points j whose corner strength is sufficiently larger than that of point i. In this case, we used 0.9 * f(j) > f(i) as the robustness condition, where f denotes corner strength.
  2. Calculate the distance from point i to each of those sufficiently stronger points and keep the minimum distance (the suppression radius of point i).
  3. Sort the points by suppression radius and return the 500 points with the largest radii.
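A sketch of ANMS under these definitions (coords are the (row, col) corner locations, strengths their Harris responses):

    import numpy as np
    from scipy.spatial.distance import cdist

    def anms(coords, strengths, n_keep=500, c_robust=0.9):
        """Keep the n_keep corners with the largest suppression radii."""
        dists = cdist(coords, coords)                   # pairwise distances between corners
        # Entry (i, j) is True when point j is sufficiently stronger than point i.
        suppresses = c_robust * strengths[np.newaxis, :] > strengths[:, np.newaxis]
        dists[~suppresses] = np.inf                     # ignore points that cannot suppress i
        radii = dists.min(axis=1)                       # distance to the nearest suppressor
        keep = np.argsort(-radii)[:n_keep]              # largest radii come first
        return coords[keep]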
Below are the 500 points we selected for both images:


We can see that there are fewer points in these images than in the previous pair, and that they are relatively evenly spaced.

Feature Extraction

Next, we extracted a feature descriptor for every corner point. For every corner point, we copied the 41 x 41 patch surrounding the point and used scipy's builtin function to resize it to be an 8 x 8 square. This square was then flattened to a vector with 64 entries and normalized to be zero-mean and unit variance, forming our feature descriptor.
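A sketch of the descriptor extraction, assuming scipy.ndimage.zoom for the resize and skipping corners that lie within 20 pixels of the image border:

    import numpy as np
    from scipy.ndimage import zoom

    def extract_descriptor(gray, r, c):
        """8 x 8 descriptor from the 41 x 41 patch centered at row r, column c."""
        patch = gray[r - 20:r + 21, c - 20:c + 21]   # 41 x 41 window around the corner
        small = zoom(patch, 8 / 41)                  # resize down to 8 x 8
        feat = small.ravel().astype(float)
        return (feat - feat.mean()) / feat.std()     # normalize to zero mean, unit variance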

Below are some examples of our normalized 8 x 8 features:


Feature Matching

After extracting feature descriptors, the next step is to match features between the two images. First, we computed the SSD between each pair of features in the first and second image. Next, for every feature, we found the ratio between the SSD of the closest neighbor and the SSD of the second closest neighbor. If the ratio for that feature was lower than a certain threshold t, we were confident enough in the closest neighbor match, and we would include it as a feature match.
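A sketch of this ratio test, assuming the descriptors of each image are stacked as the rows of two numpy arrays:

    import numpy as np
    from scipy.spatial.distance import cdist

    def match_features(feats1, feats2, threshold=0.4):
        """Return (i, j) index pairs that pass the nearest/second-nearest SSD ratio test."""
        ssd = cdist(feats1, feats2, 'sqeuclidean')      # pairwise SSD between descriptors
        matches = []
        for i, row in enumerate(ssd):
            best, second = np.argsort(row)[:2]          # indices of the two closest neighbors
            if row[best] / row[second] < threshold:     # confident only if the best is much closer
                matches.append((i, best))
        return matches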

For this pair of images, we used t = 0.4 and obtained the following feature matches:


We can see that the number of points has decreased drastically and that each remaining point in the first image matches a point in the second image.

RANSAC

After obtaining feature matches, we used those points to compute a homography. However, since not all of the feature matches were guaranteed to be correct, we needed a way to filter out the bad ones. In this project, we used RANSAC as a robust method to compute the homography (a code sketch follows below):
  1. Randomly pick 4 pairs of points from the list of feature matches.
  2. Use the same technique from part one of the project to compute a homography H.
  3. Compute the set of warped points p' = Hp from this homography.
  4. For each point in p', calculate the SSD error against its corresponding point from the feature matches.
  5. If the error is less than a certain threshold e, add that pair to a list of inliers.
  6. Repeat steps 1 through 5 a large number of times, keeping track of the largest set of inliers.
  7. Recompute and return a final homography using least squares over the largest set of inliers.
For the image of Sproul Hall, we used a threshold of e = 1.0 and went through 1000 iterations.
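A sketch of this procedure, reusing compute_homography from Part A (the parameter names are my own, and the matched points are assumed to be N x 2 numpy arrays):

    import numpy as np

    def ransac_homography(pts1, pts2, n_iters=1000, eps=1.0):
        """Fit a homography to the largest inlier set found over n_iters random samples."""
        best_inliers = np.array([], dtype=int)
        ones = np.ones((len(pts1), 1))
        for _ in range(n_iters):
            sample = np.random.choice(len(pts1), 4, replace=False)
            H = compute_homography(pts1[sample], pts2[sample])
            # Warp all of pts1 with the candidate H and compare against pts2.
            proj = H @ np.hstack([pts1, ones]).T
            proj = (proj[:2] / proj[2]).T
            errors = np.sum((proj - pts2) ** 2, axis=1)        # SSD per correspondence
            inliers = np.where(errors < eps)[0]
            if len(inliers) > len(best_inliers):
                best_inliers = inliers
        # Refit the final homography with least squares over all of the inliers.
        return compute_homography(pts1[best_inliers], pts2[best_inliers])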

Final Results

Finally, we used the homography computed in the previous step and the same blending process as the previous part to create mosaics:

Images of a street

Resulting Mosaic for t = 0.4 and e = 2.0

Images of the inside of a library

Resulting Mosaic for t = 0.5 and e = 2.0

We can see that both mosaics are well-aligned, indicating that our automatically selected features matched up. We also used the same images from Part 1 to compare the mosaics generated by manually selected points and automatically generated alignments:

Left: Mosaic of Sproul from Part 1; Right: Mosaic from Part 2

Left: Mosaic of Campanile from Part 1; Right: Mosaic from Part 2

Left: Mosaic of Walgreens from Part 1; Right: Mosaic from Part 2

Overall, there is little difference between the mosaics in each pair.

Conclusion

The coolest thing I learned from this project was how we could estimate the parameters of the homography through random sampling, a non-deterministic method. It was also useful to learn how to vectorize my code to make it more efficient and how to extract blurred features from images.