[Auto]Stitching Photo Mosaics

CS 194-26 Project 4    Shivam Singhal    October 2021


Part 1: Image Warping and Mosaicing


For the first part of the project, we were tasked with creating image mosaics by registering, projective warping, resampling, and compositing multiple pictures.


Part 1.1: Shooting Pictures


Modern technology has surely made capturing pictures as easy as clicking a button. For this part of the project, I used my iPhone 7 to shoot pairs of images of my bedroom, living room, and piano. Each pair has a perspective transform between its two images because I shot them from the same point of view but with different view directions and overlapping fields of view. Below are the pairs of images:

My Bedroom, View 1
My Bedroom, View 2
My Living Room, View 1
My Living Room, View 2
My Piano, View 1
My Piano, View 2

Part 1.2: Choosing Correspondences and Recovering Homographies


Before warping the pairs of images together, I needed to select correspondences between the pictures. I chose the points by manually clicking on 15 matching locations in each image. Below are the points overlaid on the pictures:

My Bedroom Points, View 1
My Bedroom Points, View 2
My Living Room Points, View 1
My Living Room Points, View 2
My Piano Points, View 1
My Piano Points, View 2

Using these points, we can set up a system of linear equations to solve for the homographies between the images. These H matrices will allow us to best align the matching points between the pictures. I used this Towards Data Science article to better understand how to solve for these H matrices.

To get the final points of our warped image, we need to multiply the H matrix by the original points as follows:

$$
\begin{bmatrix} \hat{x} \\ \hat{y} \\ \hat{w} \end{bmatrix}
=
\begin{bmatrix}
h_{11} & h_{12} & h_{13} \\
h_{21} & h_{22} & h_{23} \\
h_{31} & h_{32} & 1
\end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
$$

where the hat variables are the translated points in homogeneous coordinates (dividing $\hat{x}$ and $\hat{y}$ by $\hat{w}$ gives the final pixel coordinates), and the 3×3 matrix is the homography matrix, H.

To recover H, we will rely on the correspondences that we manually defined previously. The matrix has 8 unknowns because we assume that the corner value (h_33) is equal to 1. Each pair of corresponding points contributes two equations, so 4 pairs are enough to solve for H exactly; with 15 pairs, we have more than enough equations. This overdetermined system can be solved using least squares:

Each correspondence $(x, y) \leftrightarrow (\hat{x}, \hat{y})$ contributes two rows to a linear system $Ah = b$:

$$
\begin{bmatrix}
x & y & 1 & 0 & 0 & 0 & -x\hat{x} & -y\hat{x} \\
0 & 0 & 0 & x & y & 1 & -x\hat{y} & -y\hat{y}
\end{bmatrix}
\begin{bmatrix} h_{11} \\ h_{12} \\ h_{13} \\ h_{21} \\ h_{22} \\ h_{23} \\ h_{31} \\ h_{32} \end{bmatrix}
=
\begin{bmatrix} \hat{x} \\ \hat{y} \end{bmatrix}
$$

With exactly 4 pairs of points, stacking these rows gives a determined 8×8 system; with more pairs, least squares finds the best-fit h.
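In code, this least-squares setup might look like the following minimal NumPy sketch (compute_homography is just an illustrative helper name):

```python
import numpy as np

def compute_homography(src_pts, dst_pts):
    """Estimate the 3x3 homography mapping src_pts onto dst_pts.

    src_pts, dst_pts: (N, 2) arrays of corresponding (x, y) points, N >= 4.
    """
    A, b = [], []
    for (x, y), (x_hat, y_hat) in zip(src_pts, dst_pts):
        # Each correspondence contributes the two rows derived above.
        A.append([x, y, 1, 0, 0, 0, -x * x_hat, -y * x_hat])
        A.append([0, 0, 0, x, y, 1, -x * y_hat, -y * y_hat])
        b.extend([x_hat, y_hat])
    # Least-squares solution of the (over)determined system A h = b.
    h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    # Append the assumed h_33 = 1 and reshape into the 3x3 matrix H.
    return np.append(h, 1.0).reshape(3, 3)
```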

I recovered homography matrices in both directions for each of my images, going from view 1 to view 2 and from view 2 to view 1. I also confirmed my results with OpenCV's getPerspectiveTransform function, and the values matched!

Part 1.3: Image Warping


The homography matrices that were calculated in the previous part were now used to warp the images into each other's perspective.

I first padded the destination image so that all of the pixels would be visible after the projective transformation. Then, I composed the homography matrix with a translation to make sure all warped pixel coordinates would be positive. Lastly, I warped the image by taking all possible pixel indices, multiplying them by the adjusted homography matrix, and using OpenCV's remap function to interpolate the colors from the old image into the new one. Below are my pairs of images warped:

My Bedroom, View 2 warped
My Living Room, View 2 warped
My Piano, View 2 warped

For the website, I chose to only display the second views warped to match the correspondence points of the first views. Additionally, just as I did with the homography matrix calculations, I compared my results to OpenCV's warpPerspective function and confirmed that my values did indeed match.
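For reference, a minimal sketch of this kind of inverse warping with cv2.remap might look like the following, assuming the padding and translation offset have already been folded into H (warp_image is just an illustrative name):

```python
import cv2
import numpy as np

def warp_image(img, H, out_shape):
    """Warp img by the homography H into a canvas of out_shape = (h, w)."""
    h_out, w_out = out_shape
    # All output pixel coordinates, as homogeneous column vectors.
    xs, ys = np.meshgrid(np.arange(w_out), np.arange(h_out))
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    # Inverse warping: map each output pixel back into the source image.
    src = np.linalg.inv(H) @ coords
    src /= src[2]  # normalize the homogeneous coordinates
    map_x = src[0].reshape(h_out, w_out).astype(np.float32)
    map_y = src[1].reshape(h_out, w_out).astype(np.float32)
    # remap bilinearly samples img at (map_x, map_y) for every output pixel.
    return cv2.remap(img, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```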

Part 1.4: Image Rectification


Using our warping method, we can actually change the perspective of our images from the ones at which they were shot. For this part of the project, I chose to warp a few images so that the planes in them become fronto-parallel.

To do this, I simply selected four correspondences at the four corners of a rectangular object within each of my images. Then, I used the Euclidean distance formula to determine the height and width of the objects and subsequently used these values to set the coordinates of the transformed and rectified image. Lastly, I used the warp function that I previously wrote to perform the transformation.
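A minimal sketch of this rectification procedure, reusing the hypothetical compute_homography and warp_image helpers from the earlier sketches and assuming the corners are clicked in top-left, top-right, bottom-right, bottom-left order:

```python
import numpy as np

def rectify(img, corners):
    """Warp a rectangular object, given its 4 corners, to be fronto-parallel."""
    tl, tr, br, bl = np.asarray(corners, dtype=float)
    # Euclidean distances between corners give the target width and height.
    width = int(np.linalg.norm(tr - tl))
    height = int(np.linalg.norm(bl - tl))
    # Destination coordinates: an axis-aligned width x height rectangle.
    dst = np.array([[0, 0], [width, 0], [width, height], [0, height]])
    H = compute_homography(np.asarray(corners, dtype=float), dst)
    return warp_image(img, H, (height, width))
```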

This technique allowed me to properly see the art piece displayed on the board in my high school French class.

Original picture of Whiteboard
Art Piece Rectified

I was also able to fully admire this painting that my sister made for me because of image warping.

Original picture of Wall
Painting Rectified

Before, we couldn't see the artist's name properly, but after warping, we can clearly read my sister's signature!

However, it is worth noting that the pictures are a bit distorted. Rectifying does not really produce sharp images, since the original pixels need to be stretched, making certain parts appear blurry. We can't magically create pixel data that wasn't captured in the original picture.

Part 1.5: Blend the Images into a Mosaic


Using the two lined up pictures that I obtained by applying the calculated homography matrices, I can now create image mosaics!

Simply adding together the pixels in the images wasn't an option because of the overlapping regions. Thus, I tried playing around with weighted averaging using OpenCV's addWeighted function and a sigmoid function to determine the alpha and beta values. However, I found that simply overlaying the images on top of each other produced the best results for my set of images. Below are the image mosaics I created for each of the three pairs of images that I shot:

Room Mosaic
Living Room Mosaic
Piano Mosaic

Because of the change in color and slight misalignments, we can tell where the images were blended. In the next part of the project, we'll see if we can fix these!
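For reference, a minimal sketch of the simple overlay compositing I settled on, assuming both images have already been warped onto a shared canvas where empty pixels are zero (overlay_mosaic is just an illustrative name):

```python
import numpy as np

def overlay_mosaic(base, warped):
    """Composite two aligned color images by overlaying warped onto base."""
    mosaic = base.copy()
    # Wherever the warped image has content, let it overwrite the base.
    mask = warped.sum(axis=2) > 0
    mosaic[mask] = warped[mask]
    return mosaic
```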

Part 1.6: What I Learned


I think the coolest thing about this part of the project was that I was able to rectify images. That will be extremely useful for reading words in pictures that weren't shot head-on, for instance.


Part 2: Feature Matching for Autostitching


Enough with the clicking! For the second part of the project, we were tasked with creating a system for automatically stitching images into a mosaic. I worked with most of the images that I shot in the previous part of the project, namely the pictures of the piano and of my bedroom; however, since I also wanted to create a mosaic out of images of the outdoors, I borrowed some pictures of the Cornell campus from this old project website. This part of the project also borrowed some theory from the research paper “Multi-Image Matching using Multi-Scale Oriented Patches” by Brown et al.

Part 2.1: Detecting Corner Features in an Image


First, I ran the provided Harris corner detector algorithm on all of my images, using a padding of 20 pixels along the edges. Below is the output that I received for each of the pairs of images:

Piano, View 1 Harris Corners
Piano, View 2 Harris Corners
Room, View 1 Harris Corners
Room, View 2 Harris Corners
Campus, View 1 Harris Corners
Campus, View 2 Harris Corners

It is worth noting that while the images of the campus only had around 6,000 Harris corners detected, the images that I shot of the piano and the bedroom had more than 300,000 Harris corners, perhaps because those scenes are much busier; this is also why those plots appear much redder. To help with the runtime of the next algorithm that I ran, I chose to keep only the top 200,000 Harris corners for each image.

Next, I implemented and ran the Adaptive Non-maximal Suppression (ANMS) algorithm. I ranked the Harris corners in each image by their suppression radius, defined as the distance to the nearest point whose corner strength is greater than 1/c_robust times the current point's strength, with c_robust set to 0.9, and I kept only the top 250 points based on this ranking. Below are the corners that were identified by the ANMS algorithm:

Piano, View 1 ANMS Corners
Piano, View 2 ANMS Corners
Room, View 1 ANMS Corners
Room, View 2 ANMS Corners
Campus, View 1 ANMS Corners
Campus, View 2 ANMS Corners

We can see that while there are still some random points, most of the corners remaining are actually along straight edges or sharp features.
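A minimal sketch of this ANMS ranking might look like the following straightforward O(N²) version, assuming the coordinates and strengths come from the Harris detector:

```python
import numpy as np

def anms(coords, strengths, n_keep=250, c_robust=0.9):
    """Keep the n_keep corners with the largest suppression radii.

    coords: (N, 2) corner locations; strengths: (N,) Harris responses.
    """
    radii = np.full(len(coords), np.inf)
    for i in range(len(coords)):
        # Corners that still dominate corner i after discounting by c_robust.
        stronger = strengths[i] < c_robust * strengths
        if stronger.any():
            dists = np.linalg.norm(coords[stronger] - coords[i], axis=1)
            radii[i] = dists.min()  # suppression radius of corner i
    # Rank by suppression radius, descending, and keep the top n_keep.
    return coords[np.argsort(-radii)[:n_keep]]
```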

Part 2.2: Extracting a Feature Descriptor for Each Feature Point


Using each of the 250 feature points that I identified in the previous part, I now extracted feature descriptors. Around each point, I created a 40 by 40 pixel patch and resized it to be 8 by 8 pixels. I then normalized the pixel values of each patch by subtracting the mean and dividing the values by the standard deviation. Below are some of the feature descriptors that I obtained for each of my images:

Piano, View 1 Descriptor
Piano, View 2 Descriptor
Room, View 1 Descriptor
Room, View 2 Descriptor
Campus, View 1 Descriptor
Campus, View 2 Descriptor

Each of these descriptors corresponds to one of the top feature points identified in its image.
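A minimal sketch of this descriptor extraction, assuming a grayscale image and integer corner coordinates at least 20 pixels from the border:

```python
import cv2
import numpy as np

def extract_descriptors(img_gray, points, patch=40, out=8):
    """Build normalized out x out descriptors from patch x patch windows."""
    half = patch // 2
    descriptors = []
    for x, y in points:
        window = img_gray[y - half:y + half, x - half:x + half]
        # Downsample the 40x40 window to 8x8.
        small = cv2.resize(window, (out, out)).astype(float)
        # Bias/gain normalization: subtract the mean, divide by the std.
        small = (small - small.mean()) / small.std()
        descriptors.append(small.ravel())
    return np.array(descriptors)
```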

Part 2.3: Matching the Feature Descriptors between the Pairs of Images


The feature descriptors that were extracted in the previous step for each of our pairs of images were useful in finding the best points of correspondence. I first computed the Euclidean distance between all pairs of features in the two views, and I ranked the pairs based on these values. Below are some examples of feature descriptors that matched:

Piano, View 1 Descriptor Match
Piano, View 2 Descriptor Match
Room, View 1 Descriptor Match
Room, View 2 Descriptor Match
Campus, View 1 Descriptor Match
Campus, View 2 Descriptor Match

As we can see, these descriptors are extremely similar to each other.

To narrow down the points, I used the "Russian granny" trick, also known as Lowe's trick. I kept a point only if the ratio between the distances to its first-best and second-best feature matches was below a threshold of 0.1. Below are the features that remain after I performed feature matching:

Piano, View 1 Matching Features
Piano, View 2 Matching Features
Room, View 1 Matching Features
Room, View 2 Matching Features
Campus, View 1 Matching Features
Campus, View 2 Matching Features

As desired, the features that remain appear in the same part of the scene for both images. Our feature matching worked!
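A minimal sketch of this distance computation and ratio test (match_features is just an illustrative name):

```python
import numpy as np

def match_features(desc1, desc2, ratio=0.1):
    """Return index pairs (i, j) of descriptors that pass Lowe's ratio test."""
    matches = []
    for i, d in enumerate(desc1):
        # Squared Euclidean distance from descriptor i to every descriptor j.
        dists = np.sum((desc2 - d) ** 2, axis=1)
        best, second = np.argsort(dists)[:2]
        # Keep the pair only if the best match is much closer than the runner-up.
        if dists[best] / dists[second] < ratio:
            matches.append((i, best))
    return matches
```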

Part 2.4: Using RANSAC to Compute a Homography


Finally, I used the RANSAC algorithm and the correspondences found in the feature matching step to compute stable homographies between the images. I ran the algorithm for 1000 iterations; in each one, I randomly sampled 4 of the correspondences and used them to compute a homography matrix. Then, with this H matrix, I performed a perspective transformation on all of the feature points in one of the images and used the SSD to check whether they landed close to the matching points in the other image. In each iteration, I kept the set of inliers, i.e. the points whose SSD error was below an epsilon of 5, and at the end of the loop, I returned the largest such set. Lastly, I used the returned set to compute a final homography matrix. Below are the features I used to compute the final homographies:

Piano, View 1 Points
Piano, View 2 Points
Room, View 1 Points
Room, View 2 Points
Campus, View 1 Points
Campus, View 2 Points
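A minimal sketch of this RANSAC loop, reusing the hypothetical compute_homography helper sketched in Part 1.2:

```python
import numpy as np

def ransac_homography(pts1, pts2, n_iters=1000, eps=5.0):
    """Fit a homography to the largest consensus set among matched points."""
    n = len(pts1)
    best_inliers = np.zeros(n, dtype=bool)
    pts1_h = np.hstack([pts1, np.ones((n, 1))])  # homogeneous coordinates
    for _ in range(n_iters):
        # Fit an exact homography to 4 randomly sampled correspondences.
        sample = np.random.choice(n, 4, replace=False)
        H = compute_homography(pts1[sample], pts2[sample])
        # Project all points from image 1 and compare against image 2.
        proj = H @ pts1_h.T
        proj = (proj[:2] / proj[2]).T
        inliers = np.sum((proj - pts2) ** 2, axis=1) < eps  # SSD threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on the largest inlier set for the final, stable homography.
    return compute_homography(pts1[best_inliers], pts2[best_inliers])
```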

Part 2.5: Producing Mosaics


I used the automatically computed homography matrices for each pair of input images to warp and mosaic them. Both automatically produced and manual results are shown below:

Automatic Piano Mosaic
Manual Piano Mosaic
Automatic Room Mosaic
Manual Room Mosaic
Automatic Campus Mosaic
Manual Campus Mosaic

As we can see, the automatic and manual stitching methods produce similar results! For the room and piano scenes, I might have chosen slightly better correspondences with the tedious clicking method, but we still got pretty good results with the much easier automatic matching approach.

Part 2.6: What I Learned


I think the coolest thing about this part of the project was automatically generating correspondences because I really hated manually clicking. I want to go back and redo the previous projects using this technique of selecting correspondences and see if I get better results.