[Auto]Stitching Photo Mosaics

CS 194-26 Project 4    Shivam Singhal    October 2021


Part 1: Image Warping and Mosaicing


For the first part of the project, we were tasked with creating image mosaics by registering, projective warping, resampling, and compositing multiple pictures.


Part 1.1: Shooting Pictures


Modern technology has surely made capturing pictures as easy as clicking a button. For this part of the project, I used my iPhone 7 to shoot pairs of images of my bedroom, living room, and piano. Each pair has a perspective transform between its two images because I shot them from the same point of view but with different view directions and overlapping fields of view. Below are the pairs of images:

My Bedroom, View 1
My Bedroom, View 2
My Living Room, View 1
My Living Room, View 2
My Piano, View 1
My Piano, View 2

Part 1.2: Choosing Correspondences and Recovering Homographies


Before warping the pairs of images together, I needed to select correspondences between the pictures. I chose the points by manually clicking on 15 matching locations in each image. Below are the points overlaid on the pictures:

My Bedroom Points, View 1
My Bedroom Points, View 2
My Living Room Points, View 1
My Living Room Points, View 2
My Piano Points, View 1
My Piano Points, View 2

Using these points, we can set up a system of linear equations to solve for the homographies between the images. These H matrices will allow us to best align the matching points between the pictures. I used this Towards Data Science article to better understand how to solve for these H matrices.

To get the final points of our warped image, we need to multiply the H matrix by the original points as follows:

$$
\begin{bmatrix} \hat{x} \\ \hat{y} \\ \hat{w} \end{bmatrix}
=
\begin{bmatrix}
h_{11} & h_{12} & h_{13} \\
h_{21} & h_{22} & h_{23} \\
h_{31} & h_{32} & 1
\end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
$$

where the hat variables are the translated points in homogeneous coordinates (dividing $\hat{x}$ and $\hat{y}$ by $\hat{w}$ gives the final pixel coordinates), and the 3×3 matrix is the homography matrix, H.

To recover H, we will rely on the correspondences that we manually defined previously. The matrix has 8 unknowns because we assume that the corner value (h_33) is equal to 1. Each pair of corresponding points contributes two equations, so 4 pairs are enough to solve for H exactly; with 15 pairs, we have more than enough equations. This overdetermined system can be solved using least squares:

Each correspondence $(x, y) \leftrightarrow (\hat{x}, \hat{y})$ contributes two rows to a linear system $Ah = b$:

$$
\begin{bmatrix}
x & y & 1 & 0 & 0 & 0 & -x\hat{x} & -y\hat{x} \\
0 & 0 & 0 & x & y & 1 & -x\hat{y} & -y\hat{y}
\end{bmatrix}
\begin{bmatrix} h_{11} \\ h_{12} \\ h_{13} \\ h_{21} \\ h_{22} \\ h_{23} \\ h_{31} \\ h_{32} \end{bmatrix}
=
\begin{bmatrix} \hat{x} \\ \hat{y} \end{bmatrix}
$$

With exactly 4 pairs of points, stacking these rows gives a determined 8×8 system; with more pairs, least squares finds the best-fit h.
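In code, this least-squares setup might look like the following minimal NumPy sketch (compute_homography is just an illustrative helper name):

```python
import numpy as np

def compute_homography(src_pts, dst_pts):
    """Estimate the 3x3 homography mapping src_pts onto dst_pts.

    src_pts, dst_pts: (N, 2) arrays of corresponding (x, y) points, N >= 4.
    """
    A, b = [], []
    for (x, y), (x_hat, y_hat) in zip(src_pts, dst_pts):
        # Each correspondence contributes the two rows derived above.
        A.append([x, y, 1, 0, 0, 0, -x * x_hat, -y * x_hat])
        A.append([0, 0, 0, x, y, 1, -x * y_hat, -y * y_hat])
        b.extend([x_hat, y_hat])
    # Least-squares solution of the (over)determined system A h = b.
    h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    # Append the assumed h_33 = 1 and reshape into the 3x3 matrix H.
    return np.append(h, 1.0).reshape(3, 3)
```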

I recovered homography matrices in both directions for each of my images, going from view 1 to view 2 and from view 2 to view 1. I also confirmed my results with OpenCV's getPerspectiveTransform function, and the values matched!

Part 1.3: Image Warping


The homography matrices that were calculated in the previous part were now used to warp the images into each other's perspective.

I first padded the destination image so that all of the pixels would be visible after the projective transformation. Then, I composed the homography matrix with a translation to make sure all warped pixel coordinates would be positive. Lastly, I warped the image by taking all possible pixel indices, multiplying them by the adjusted homography matrix, and using OpenCV's remap function to interpolate the colors from the old image into the new one. Below are my pairs of images warped:

My Bedroom, View 2 warped
My Living Room, View 2 warped
My Piano, View 2 warped

For the website, I chose to only display the second views warped to match the correspondence points of the first views. Additionally, just as I did with the homography matrix calculations, I compared my results to OpenCV's warpPerspective function and confirmed that my values did indeed match.
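For reference, a minimal sketch of this kind of inverse warping with cv2.remap might look like the following, assuming the padding and translation offset have already been folded into H (warp_image is just an illustrative name):

```python
import cv2
import numpy as np

def warp_image(img, H, out_shape):
    """Warp img by the homography H into a canvas of out_shape = (h, w)."""
    h_out, w_out = out_shape
    # All output pixel coordinates, as homogeneous column vectors.
    xs, ys = np.meshgrid(np.arange(w_out), np.arange(h_out))
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    # Inverse warping: map each output pixel back into the source image.
    src = np.linalg.inv(H) @ coords
    src /= src[2]  # normalize the homogeneous coordinates
    map_x = src[0].reshape(h_out, w_out).astype(np.float32)
    map_y = src[1].reshape(h_out, w_out).astype(np.float32)
    # remap bilinearly samples img at (map_x, map_y) for every output pixel.
    return cv2.remap(img, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```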

Part 1.4: Image Rectification


Using our warping method, we can actually change the perspective of our images from the ones at which they were shot. For this part of the project, I chose to warp a few images so that the planes in them become fronto-parallel.

To do this, I simply selected four correspondences at the four corners of a rectangular object within each of my images. Then, I used the Euclidean distance formula to determine the height and width of the objects and subsequently used these values to set the coordinates of the transformed and rectified image. Lastly, I used the warp function that I previously wrote to perform the transformation.
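A minimal sketch of this rectification procedure, reusing the hypothetical compute_homography and warp_image helpers from the earlier sketches and assuming the corners are clicked in top-left, top-right, bottom-right, bottom-left order:

```python
import numpy as np

def rectify(img, corners):
    """Warp a rectangular object, given its 4 corners, to be fronto-parallel."""
    tl, tr, br, bl = np.asarray(corners, dtype=float)
    # Euclidean distances between corners give the target width and height.
    width = int(np.linalg.norm(tr - tl))
    height = int(np.linalg.norm(bl - tl))
    # Destination coordinates: an axis-aligned width x height rectangle.
    dst = np.array([[0, 0], [width, 0], [width, height], [0, height]])
    H = compute_homography(np.asarray(corners, dtype=float), dst)
    return warp_image(img, H, (height, width))
```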

This technique allowed me to properly see the art piece displayed on the board in my high school French class.

Original picture of Whiteboard
Art Piece Rectified

I was also able to fully admire this painting that my sister made for me because of image warping.

Original picture of Wall
Painting Rectified

Before, we couldn't see the artist's name properly, but after warping, we can clearly read my sister's signature!

However, it is worth noting that the pictures are a bit distorted. Rectifying does not really produce sharp images, since the original pixels need to be stretched, making certain parts appear blurry. We can't magically create pixel data that wasn't captured in the original picture.

Part 1.5: Blend the Images into a Mosaic


Using the two lined up pictures that I obtained by applying the calculated homography matrices, I can now create image mosaics!

Simply adding together the pixels in the images wasn't an option because of the overlapping regions. Thus, I tried playing around with weighted averaging using OpenCV's addWeighted function and a sigmoid function to determine the alpha and beta values. However, I found that simply overlaying the images on top of each other produced the best results for my set of images. Below are the image mosaics I created for each of the three pairs of images that I shot:

Room Mosaic
Living Room Mosaic
Piano Mosaic

Because of the change in color and slight misalignments, we can tell where the images were blended. In the next part of the project, we'll see if we can fix these!
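For reference, a minimal sketch of the simple overlay compositing I settled on, assuming both images have already been warped onto a shared canvas where empty pixels are zero (overlay_mosaic is just an illustrative name):

```python
import numpy as np

def overlay_mosaic(base, warped):
    """Composite two aligned color images by overlaying warped onto base."""
    mosaic = base.copy()
    # Wherever the warped image has content, let it overwrite the base.
    mask = warped.sum(axis=2) > 0
    mosaic[mask] = warped[mask]
    return mosaic
```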

Part 1.6: What I Learned


I think the coolest thing about this part of the project was that I was able to rectify images. That will be extremely useful for reading words in pictures that weren't shot head-on, for instance.


Part 2: Feature Matching for Autostitching


Enough with the clicking! For the second part of the project, we were tasked with creating a system for automatically stitching images into a mosaic. I worked with most of the images that I shot in the previous part of the project, namely the pictures of the piano and of my bedroom; however, since I also wanted to create a mosaic out of images of the outdoors, I borrowed some pictures of the Cornell campus from this old project website. This part of the project also borrowed some theory from the research paper “Multi-Image Matching using Multi-Scale Oriented Patches” by Brown et al.

Part 2.1: Detecting Corner Features in an Image


First, I ran the provided Harris corner detector algorithm on all of my images, using a padding of 20 pixels along the edges. Below is the output that I received for each of the pairs of images:

Piano, View 1 Harris Corners
Piano, View 2 Harris Corners
Room, View 1 Harris Corners
Room, View 2 Harris Corners
Campus, View 1 Harris Corners
Campus, View 2 Harris Corners

It is worth noting that while the images of the campus only had around 6,000 Harris corners detected, the images that I shot of the piano and the bedroom had more than 300,000 Harris corners, perhaps because those scenes are much busier; this is also why those plots appear much redder. To help with the runtime of the next algorithm that I ran, I chose to keep only the top 200,000 Harris corners for each image.

Next, I implemented and ran the Adaptive Non-maximal Suppression (ANMS) algorithm. I ranked the Harris corners in each image by their suppression radius, defined as the distance to the nearest point whose corner strength is greater than 1/c_robust times the current point's strength, with c_robust set to 0.9, and I kept only the top 250 points based on this ranking. Below are the corners that were identified by the ANMS algorithm:

Piano, View 1 ANMS Corners
Piano, View 2 ANMS Corners
Room, View 1 ANMS Corners
Room, View 2 ANMS Corners
Campus, View 1 ANMS Corners
Campus, View 2 ANMS Corners

We can see that while there are still some random points, most of the corners remaining are actually along straight edges or sharp features.
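A minimal sketch of this ANMS ranking might look like the following straightforward O(N²) version, assuming the coordinates and strengths come from the Harris detector:

```python
import numpy as np

def anms(coords, strengths, n_keep=250, c_robust=0.9):
    """Keep the n_keep corners with the largest suppression radii.

    coords: (N, 2) corner locations; strengths: (N,) Harris responses.
    """
    radii = np.full(len(coords), np.inf)
    for i in range(len(coords)):
        # Corners that still dominate corner i after discounting by c_robust.
        stronger = strengths[i] < c_robust * strengths
        if stronger.any():
            dists = np.linalg.norm(coords[stronger] - coords[i], axis=1)
            radii[i] = dists.min()  # suppression radius of corner i
    # Rank by suppression radius, descending, and keep the top n_keep.
    return coords[np.argsort(-radii)[:n_keep]]
```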

Part 2.2: Extracting a Feature Descriptor for Each Feature Point


Using each of the 250 feature points that I identified in the previous part, I now extracted feature descriptors. Around each point, I created a 40 by 40 pixel patch and resized it to be 8 by 8 pixels. I then normalized the pixel values of each patch by subtracting the mean and dividing the values by the standard deviation. Below are some of the feature descriptors that I obtained for each of my images:

Piano, View 1 Descriptor
Piano, View 2 Descriptor
Room, View 1 Descriptor
Room, View 2 Descriptor
Campus, View 1 Descriptor
Campus, View 2 Descriptor

Each of these descriptors corresponds to one of the top feature points identified in its image.
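A minimal sketch of this descriptor extraction, assuming a grayscale image and integer corner coordinates at least 20 pixels from the border:

```python
import cv2
import numpy as np

def extract_descriptors(img_gray, points, patch=40, out=8):
    """Build normalized out x out descriptors from patch x patch windows."""
    half = patch // 2
    descriptors = []
    for x, y in points:
        window = img_gray[y - half:y + half, x - half:x + half]
        # Downsample the 40x40 window to 8x8.
        small = cv2.resize(window, (out, out)).astype(float)
        # Bias/gain normalization: subtract the mean, divide by the std.
        small = (small - small.mean()) / small.std()
        descriptors.append(small.ravel())
    return np.array(descriptors)
```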

Part 2.3: Matching the Feature Descriptors between the Pairs of Images


The feature descriptors that were extracted in the previous step for each of our pairs of images were useful in finding the best points of correspondence. I first computed the Euclidean distance between all pairs of features in the two views, and I ranked the pairs based on these values. Below are some examples of feature descriptors that matched:

Piano, View 1 Descriptor Match
Piano, View 2 Descriptor Match
Room, View 1 Descriptor Match
Room, View 2 Descriptor Match
Campus, View 1 Descriptor Match
Campus, View 2 Descriptor Match

As we can see, these descriptors are extremely similar to each other.

To narrow down the points, I used the "Russian granny" trick, also known as Lowe's trick. I kept a point only if the ratio between the distances to its first-best and second-best feature matches was below a threshold of 0.1. Below are the features that remain after I performed feature matching:

Piano, View 1 Matching Features
Piano, View 2 Matching Features
Room, View 1 Matching Features
Room, View 2 Matching Features
Campus, View 1 Matching Features
Campus, View 2 Matching Features

As desired, the features that remain appear in the same part of the scene for both images. Our feature matching worked!
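A minimal sketch of this distance computation and ratio test (match_features is just an illustrative name):

```python
import numpy as np

def match_features(desc1, desc2, ratio=0.1):
    """Return index pairs (i, j) of descriptors that pass Lowe's ratio test."""
    matches = []
    for i, d in enumerate(desc1):
        # Squared Euclidean distance from descriptor i to every descriptor j.
        dists = np.sum((desc2 - d) ** 2, axis=1)
        best, second = np.argsort(dists)[:2]
        # Keep the pair only if the best match is much closer than the runner-up.
        if dists[best] / dists[second] < ratio:
            matches.append((i, best))
    return matches
```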

Part 2.4: Using RANSAC to Compute a Homography


Finally, I used the RANSAC algorithm and the correspondences found in the feature matching step to compute stable homographies between the images. I ran the algorithm for 1000 iterations; in each one, I randomly sampled 4 of the correspondences and used them to compute a homography matrix. Then, with this H matrix, I performed a perspective transformation on all of the feature points in one of the images and used the SSD to check whether they landed close to the matching points in the other image. In each iteration, I kept the set of inliers, i.e. the points whose SSD error was below an epsilon of 5, and at the end of the loop, I returned the largest such set. Lastly, I used the returned set to compute a final homography matrix. Below are the features I used to compute the final homographies:

Piano, View 1 Points
Piano, View 2 Points
Room, View 1 Points
Room, View 2 Points
Campus, View 1 Points
Campus, View 2 Points
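A minimal sketch of this RANSAC loop, reusing the hypothetical compute_homography helper sketched in Part 1.2:

```python
import numpy as np

def ransac_homography(pts1, pts2, n_iters=1000, eps=5.0):
    """Fit a homography to the largest consensus set among matched points."""
    n = len(pts1)
    best_inliers = np.zeros(n, dtype=bool)
    pts1_h = np.hstack([pts1, np.ones((n, 1))])  # homogeneous coordinates
    for _ in range(n_iters):
        # Fit an exact homography to 4 randomly sampled correspondences.
        sample = np.random.choice(n, 4, replace=False)
        H = compute_homography(pts1[sample], pts2[sample])
        # Project all points from image 1 and compare against image 2.
        proj = H @ pts1_h.T
        proj = (proj[:2] / proj[2]).T
        inliers = np.sum((proj - pts2) ** 2, axis=1) < eps  # SSD threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on the largest inlier set for the final, stable homography.
    return compute_homography(pts1[best_inliers], pts2[best_inliers])
```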

Part 2.5: Producing Mosaics


I used the automatically computed homography matrices for each pair of input images to warp and mosaic them. Both automatically produced and manual results are shown below:

Automatic Piano Mosaic
Manual Piano Mosaic
Automatic Room Mosaic
Manual Room Mosaic
Automatic Campus Mosaic
Manual Campus Mosaic

As we can see, the automatic and manual stitching methods produce similar results! For the room and piano scenes, I might have chosen slightly better correspondences with the tedious clicking method, but we still got pretty good results with the much easier automatic matching approach.

Part 2.6: What I Learned


I think the coolest thing about this part of the project was automatically generating correspondences because I really hated manually clicking. I want to go back and redo the previous projects using this technique of selecting correspondences and see if I get better results.