Stitching Photo Mosaics
By Jay Shenoy
Part A: Image Warping and Mosaicing
Step 1: Shooting the Photographs
I shot three photos of the Berkeley skyline:
Step 2: Recover Homographies
I selected 12 corresponding keypoints for each pair of contiguous photographs. Here are the keypoints I selected for the images:
Image 1 Keypoints

Image 2 Keypoints

Image 2 Keypoints

Image 3 Keypoints

To recover the homography matrix between these two sets of keypoints, I solved the following system of linear equations using least squares:
\(
\begin{bmatrix}
x_1 & y_1 & 1 & 0 & 0 & 0 & -x_1 x_2 & -y_1 x_2 \\
0 & 0 & 0 & x_1 & y_1 & 1 & -x_1 y_2 & -y_1 y_2 \\
& & & & \vdots & & & \\
\end{bmatrix}
h =
\begin{bmatrix}
x_2 \\
y_2 \\
\vdots
\end{bmatrix}
\)
Here, \( (x_1, y_1) \) and \( (x_2, y_2) \) are corresponding keypoints from images 1 and 2, respectively, and there are 6 such correspondences. Each correspondence contributes two rows, so the system is overdetermined, and least squares yields the vector \( h \) that minimizes the squared error. \( h \) contains 8 entries that correspond to the first 8 entries of the homography matrix (the last entry is set to one).
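As a rough illustration, the least-squares solve could be sketched in NumPy as follows. The function name `compute_homography` and the `(N, 2)` point layout are assumptions made for this example, not the project's actual code. The two rows per correspondence come from clearing the denominator in \( x_2 = (h_1 x_1 + h_2 y_1 + h_3) / (h_7 x_1 + h_8 y_1 + 1) \) and the analogous expression for \( y_2 \).

```python
import numpy as np

def compute_homography(pts1, pts2):
    """Least-squares homography mapping pts1 -> pts2 (each an (N, 2) array, N >= 4).

    Hypothetical helper for illustration only.
    """
    pts1 = np.asarray(pts1, dtype=float)
    pts2 = np.asarray(pts2, dtype=float)
    n = len(pts1)
    x1, y1 = pts1[:, 0], pts1[:, 1]
    x2, y2 = pts2[:, 0], pts2[:, 1]
    zeros, ones = np.zeros(n), np.ones(n)

    # Interleave one x-row and one y-row per correspondence.
    A = np.empty((2 * n, 8))
    A[0::2] = np.stack([x1, y1, ones, zeros, zeros, zeros, -x1 * x2, -y1 * x2], axis=1)
    A[1::2] = np.stack([zeros, zeros, zeros, x1, y1, ones, -x1 * y2, -y1 * y2], axis=1)
    b = np.empty(2 * n)
    b[0::2] = x2
    b[1::2] = y2

    # Solve for the 8 unknowns, then append the fixed last entry (1).
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.append(h, 1.0).reshape(3, 3)
```

With exactly four correspondences the system is square and the fit is exact; with more, the residual is spread across all points.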
Step 3: Warp the Images
With the homography matrix in hand, I warped image 1 to align with image 2. I did this using inverse warping: for each pixel in the warped image, I determined the source pixel in the original image and used interpolation to retrieve the proper pixel intensity. Image 2 was only translated so that it fit within a larger canvas on which both images could be overlaid.
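The sketch below shows one way inverse warping can be written in NumPy. It assumes a single-channel image, bilinear interpolation, and a homography `H` that maps source (x, y) coordinates to output canvas coordinates. The write-up does not say which interpolation was actually used, and `inverse_warp` is a hypothetical name.

```python
import numpy as np

def inverse_warp(src, H, out_shape):
    """Warp grayscale `src` onto a canvas of `out_shape` by inverse mapping.

    Returns the warped image and a boolean mask of canvas pixels that
    received data. Assumes src is at least 2x2.
    """
    out_h, out_w = out_shape
    h, w = src.shape

    # Homogeneous coordinates of every canvas pixel, mapped back into the source.
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    dest = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    back = np.linalg.inv(H) @ dest
    with np.errstate(divide="ignore", invalid="ignore"):
        sx = back[0] / back[2]
        sy = back[1] / back[2]

    # Keep only canvas pixels whose source location lies inside the image.
    eps = 1e-9
    valid = (sx >= -eps) & (sx <= w - 1 + eps) & (sy >= -eps) & (sy <= h - 1 + eps)
    sx = np.clip(sx[valid], 0, w - 1)
    sy = np.clip(sy[valid], 0, h - 1)

    # Bilinear interpolation between the four surrounding source pixels.
    x0 = np.clip(np.floor(sx).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(sy).astype(int), 0, h - 2)
    fx, fy = sx - x0, sy - y0
    top = (1 - fx) * src[y0, x0] + fx * src[y0, x0 + 1]
    bottom = (1 - fx) * src[y0 + 1, x0] + fx * src[y0 + 1, x0 + 1]

    out = np.zeros(out_h * out_w)
    out[valid] = (1 - fy) * top + fy * bottom
    return out.reshape(out_h, out_w), valid.reshape(out_h, out_w)
```

Iterating over destination pixels rather than source pixels is what avoids holes in the output: every canvas pixel is either filled by interpolation or explicitly marked as having no data.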
To verify that warping was working correctly, I performed image rectification. This process takes a portion of the image that is known to be rectangular and "rectifies" it by computing the homography that makes the sides of the rectangle axis-aligned, so that it looks like we're observing that portion top-down. Three examples of image rectification are shown below:
MacBook Pro Original

MacBook Pro Rectified

Surface Pro Original

Surface Pro Rectified

Coaster Original

Coaster Rectified

Step 4: Blending
Blending the images into a mosaic is a multi-stage process. First, I warped the images into a larger canvas using the estimated homographies as described in Step 3. The warped images are shown below:
Image 1 Warped

Image 2 Warped

Image 3 Warped

To properly mosaic the warped images, I used a one-shot procedure that computed alpha weights for each pixel in each warped image. These alpha weights range from zero to one, indicating how much to feather each pixel value. The weights were computed linearly: the alphas were set to one at the center of each unwarped image and declined to zero at its left and right edges. Below, you can see the alpha-feathered images.
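A linear horizontal ramp of this kind could be generated as in the sketch below. `feather_alpha` is a hypothetical name. The code also assumes the mask is built for the unwarped image and then warped with the same homography as the image itself, which the description above suggests but does not state outright.

```python
import numpy as np

def feather_alpha(height, width):
    """Alpha mask equal to 1 at the center column, falling linearly to 0
    at the left and right edges. Assumes width >= 2."""
    center = (width - 1) / 2.0
    ramp = 1.0 - np.abs(np.arange(width) - center) / center
    # Every row gets the same horizontal ramp.
    return np.tile(ramp, (height, 1))
```

For a 5-pixel-wide image, each row of the mask is `[0, 0.5, 1, 0.5, 0]`.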
Image 1 Warped + Feathered

Image 2 Warped + Feathered

Image 3 Warped + Feathered

To combine these feathered images into a mosaic, I normalized the alpha weights at each pixel so that they summed to one (to get rid of any vignetting), except, of course, at pixels with no image data. The source images and final stitch are shown below for three examples.
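The normalization step can be sketched as a weighted average over the stack of warped images. The function name `blend`, the array shapes, and the `eps` cutoff for "no image data" are assumptions for this example.

```python
import numpy as np

def blend(images, alphas, eps=1e-8):
    """Alpha-weighted average of warped images that share one canvas.

    images: list of (H, W, 3) float arrays.
    alphas: list of (H, W) float arrays, 0 where an image has no data.
    """
    images = np.stack(images)          # (N, H, W, 3)
    alphas = np.stack(alphas)          # (N, H, W)
    total = alphas.sum(axis=0)         # summed weight at each pixel

    # Normalize weights to sum to one wherever at least one image contributes.
    covered = total > eps
    norm = np.zeros_like(alphas)
    norm[:, covered] = alphas[:, covered] / total[covered]

    # Pixels with no coverage keep weight 0 and stay black.
    return (norm[..., None] * images).sum(axis=0)
```

Where only one image covers a pixel, its normalized weight becomes one, so the feathering does not darken non-overlapping regions.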
Balcony Image 1

Balcony Image 2

Balcony Image 3

Balcony Mosaic

Living Room Image 1

Living Room Image 2

Living Room Image 3

Living Room Mosaic

Desk Image 1

Desk Image 2

Desk Image 3

Desk Mosaic

What I've Learned
The coolest concept from this part of the project was that of image rectification. It's amazing that using just a homography matrix, one can visualize what a rectangular surface would look like from a bird's-eye view. In lecture, we learned about how rectification was applied to artwork to extract tile patterns and such, so it was awesome to be able to apply it to my own photographs.
Part B: Feature Matching for Autostitching
Choosing corresponding keypoints by hand is a laborious process. To get around this, I implemented automatic keypoint matching using a technique involving "Multi-Scale Oriented Patches" (MOPS).
Step 1: Detecting Harris Corners
Using the provided starter code, I extracted Harris corners from the following image of my desk.
Original Desk Image

Desk with Corners

Next, I reduced the set of Harris corners via a technique called adaptive non-maximal suppression (ANMS). ANMS tries to pick out the corners with the highest strength (as computed by the starter code) while keeping the keypoints relatively spaced out. The ANMS procedure can be visualized as follows: create a circle at every keypoint, initially of radius zero. Now expand all the radii simultaneously: if two circles intersect, remove the keypoint with the lower corner strength. The radius at which a keypoint is removed is called its "suppression radius." Growing the radii in this manner culls the keypoints down to a smaller set that is both nicely spaced out and made up of high-strength corners. Computing the suppression radius of the i-th keypoint is formulated as the following optimization problem:
\(
r_i = \min_j \lVert x_i - x_j \rVert
\quad \text{s.t.} \quad
f(x_i) < 0.9 \, f(x_j)
\)
Here, \( x_i \) represents the position of the i-th keypoint, and \( f(x_i) \) represents its corner strength. I chose to keep the 500 keypoints with the largest suppression radii. The ANMS corners are visualized below.
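A direct, brute-force reading of this optimization problem is sketched below. The function name `anms` and its parameters are hypothetical. The full pairwise distance matrix costs memory quadratic in the number of corners, so this form is only reasonable for a few thousand points.

```python
import numpy as np

def anms(coords, strengths, num_keep=500, c_robust=0.9):
    """Adaptive non-maximal suppression.

    coords: (N, 2) keypoint positions; strengths: (N,) corner strengths.
    Returns the indices of the num_keep keypoints with the largest
    suppression radii, plus all radii.
    """
    coords = np.asarray(coords, dtype=float)
    strengths = np.asarray(strengths, dtype=float)

    # dists[i, j] = Euclidean distance between keypoints i and j.
    diff = coords[:, None, :] - coords[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=2))

    # stronger[i, j] is True when f(x_i) < c_robust * f(x_j).
    stronger = strengths[:, None] < c_robust * strengths[None, :]

    # Radius = distance to the nearest sufficiently stronger keypoint;
    # a keypoint that nothing dominates gets an infinite radius.
    radii = np.where(stronger, dists, np.inf).min(axis=1)

    order = np.argsort(-radii)
    return order[:num_keep], radii
```

The strongest corner in the image always survives, since no other corner can suppress it.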
Desk with Corners

Desk with ANMS Corners

Steps 2-4: Extracting Features + RANSAC
After retrieving the 500 ANMS keypoints, I extracted a 40x40 image patch around each keypoint. I then downsampled this patch to 8x8 using a box blur. These 8x8 feature descriptors were then bias/gain-normalized so that each had zero mean and unit standard deviation. Next, I matched the descriptors using Lowe thresholding: for each descriptor in image 1, I computed the ratio between the distance to its first nearest neighbor and the distance to its second nearest neighbor in image 2. I used a threshold of 0.25, so only first-nearest-neighbor matches whose ratio fell below this threshold were kept.
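The descriptor extraction and ratio test might be sketched as follows. Several details are assumptions: the box blur and downsampling are merged into one 5x5 block average, keypoints too close to the border are skipped, and the ratio is taken over Euclidean rather than squared distances. The function names are hypothetical.

```python
import numpy as np

def extract_descriptor(img, row, col, patch=40, out=8):
    """Normalized 8x8 descriptor from a 40x40 window of a grayscale image.
    Returns None when the window does not fit inside the image."""
    half = patch // 2
    if row - half < 0 or col - half < 0:
        return None
    window = img[row - half:row + half, col - half:col + half].astype(float)
    if window.shape != (patch, patch):
        return None

    # Average each 5x5 block: a box blur and downsample in one step.
    step = patch // out
    small = window.reshape(out, step, out, step).mean(axis=(1, 3))

    # Bias/gain normalization: zero mean, unit standard deviation.
    return ((small - small.mean()) / (small.std() + 1e-8)).ravel()

def match_descriptors(desc1, desc2, ratio=0.25):
    """Ratio-test matching. desc1: (N, D), desc2: (M, D) with M >= 2.
    Returns (K, 2) index pairs (i in desc1, j in desc2)."""
    d = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :2]
    rows = np.arange(len(desc1))
    best = d[rows, nearest[:, 0]]
    second = d[rows, nearest[:, 1]]

    # Keep a match only when the best neighbor is much closer than the runner-up.
    keep = best < ratio * second
    return np.stack([rows[keep], nearest[keep, 0]], axis=1)
```

A low ratio means the best match is unambiguous. A ratio near one means two candidates are about equally good, so the match is probably unreliable.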
After the feature descriptors between the two images were matched up, I used 4-point RANSAC to calculate a robust homography estimate representing the transform between the images. Each RANSAC iteration goes as follows: randomly select 4 matched points, compute the exact homography between them, and count how many of the remaining matched points are "inliers" under this homography (points whose transform error is at most 2 pixels in each dimension). I ran this 10,000 times and kept the homography with the largest number of inliers. At the very end, I used least squares to compute a new homography from all of its inlier points.
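The loop described above could look roughly like this sketch. All function names are hypothetical, and the sampling strategy and handling of degenerate samples are assumptions. Inliers are counted over all matches, including the four sampled points, which does not change which homography wins.

```python
import numpy as np

def fit_homography(p1, p2):
    """Least-squares homography mapping p1 -> p2 (exact for 4 points)."""
    x1, y1, x2, y2 = p1[:, 0], p1[:, 1], p2[:, 0], p2[:, 1]
    z, o = np.zeros(len(p1)), np.ones(len(p1))
    A = np.concatenate([
        np.stack([x1, y1, o, z, z, z, -x1 * x2, -y1 * x2], axis=1),
        np.stack([z, z, z, x1, y1, o, -x1 * y2, -y1 * y2], axis=1),
    ])
    h, *_ = np.linalg.lstsq(A, np.concatenate([x2, y2]), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, pts):
    """Apply H to (N, 2) points and divide out the homogeneous coordinate."""
    q = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    with np.errstate(divide="ignore", invalid="ignore"):
        return q[:, :2] / q[:, 2:3]

def ransac_homography(p1, p2, iters=10_000, tol=2.0, seed=0):
    """4-point RANSAC. Returns the refit homography and the inlier mask."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(p1), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(p1), size=4, replace=False)
        H = fit_homography(p1[idx], p2[idx])
        # Inlier: error of at most `tol` pixels in both x and y.
        err = np.abs(apply_homography(H, p1) - p2)
        inliers = (err <= tol).all(axis=1)
        if inliers.sum() > best.sum():
            best = inliers
    # Final least-squares refit on every inlier of the best model.
    return fit_homography(p1[best], p2[best]), best
```

Refitting on all inliers at the end averages out localization noise that a homography built from only four points would pass straight through.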
Step 5: Mosaicing
The matched keypoints from the previous step were fed into the stitching algorithm from Part A to create large mosaics. The manually stitched and automatically stitched mosaics are compared below.
Balcony Image 1

Balcony Image 2

Balcony Image 3

Balcony Manual Mosaic

Balcony Automatic Mosaic

Living Room Image 1

Living Room Image 2

Living Room Image 3

Living Room Manual Mosaic

Living Room Automatic Mosaic

Desk Image 1

Desk Image 2

Desk Image 3

Desk Manual Mosaic

Desk Automatic Mosaic

What Have I Learned?
The coolest part about this project was implementing adaptive non-maximal suppression. I liked how the problem of finding a set of nicely spaced keypoints of high strength was first phrased as a geometric problem: circles centered at each keypoint grow in size, eliminating more and more keypoints as they expand. This offered visual intuition that helped me understand what the solution was going for. Moreover, I appreciated how this intuition was translated into an optimization problem that was relatively easy to solve and produced great results. As shown in the images above, the algorithm was able to cull a dense set of thousands of Harris corners down to a set of 500 well-spaced keypoints.