CS 194-26 Fall 2020, Project 5: [Auto]Stitching Photo Mosaics

By: Vincent Lyau


Shoot the Pictures

Here is the first pair of photos I took. (I shot more photos; they, along with more mosaic results, are displayed later on this page).

The images are taken of my Berkeley apartment from the same location. They differ only in the angle; as you can see, the notable features that cross both images are the whiteboard, beanbag, boxes, and red cart.

Recover Homographies

First, I defined the correspondences manually. Here are the images from before, but this time with labeled correspondences.

Obtaining the homographies once I had the correspondences was relatively easy: it involved solving a linear system (with the help of np.linalg.lstsq) for H in the equation p' = Hp.
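Concretely, each correspondence (x, y) → (x', y') contributes two rows to a linear system in the eight unknowns of H (fixing the bottom-right entry to 1). A minimal sketch of that setup — the helper name is just for illustration:

```python
import numpy as np

def compute_homography(src, dst):
    """Solve for H in p' = Hp via least squares.

    src, dst: (N, 2) arrays of corresponding (x, y) points, N >= 4.
    H[2, 2] is fixed to 1, leaving 8 unknowns.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        # Two rows per correspondence, from xp = (h1.p)/(h3.p), yp = (h2.p)/(h3.p)
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        b.append(xp)
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.append(yp)
    h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return np.append(h, 1).reshape(3, 3)
```

With more than four correspondences the system is overdetermined, and lstsq gives the least-squares fit, which is why extra (slightly noisy) clicked points still help.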

Warp the Images

Warping the images required computing some important information first. For example, I needed to know roughly where the corners would end up, how much padding I needed, and other details. I implemented this by multiplying the homography matrix with a matrix holding just the corners of my input to see where things "end up".
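That corner-tracking step can be sketched as follows (the function name is illustrative): push the four corners through H in homogeneous coordinates, divide out the last component, and take the bounding box:

```python
import numpy as np

def warped_extent(H, h, w):
    """Push the four image corners through H to find the output bounding box."""
    corners = np.array([[0, 0, 1],
                        [w - 1, 0, 1],
                        [w - 1, h - 1, 1],
                        [0, h - 1, 1]], dtype=float).T  # columns are (x, y, 1)
    warped = H @ corners
    warped = warped[:2] / warped[2]  # divide by the homogeneous coordinate
    xmin, ymin = warped.min(axis=1)
    xmax, ymax = warped.max(axis=1)
    return xmin, ymin, xmax, ymax
```

Negative xmin/ymin tell you how much padding (or translation) the output canvas needs.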

With the above information in hand, I was able to perform a warp on my image. Below is the result of warping.

Perhaps indicative of how well the warp went is that the ceiling/wall joint and the floorboards are roughly parallel between the warped image and the image that did not undergo warping.
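The warp itself can be done by inverse mapping: for each output pixel, apply H⁻¹ to find where it came from in the source. A rough numpy-only sketch — nearest-neighbor sampling keeps it short, though interpolation would look smoother, and my actual implementation details may differ:

```python
import numpy as np

def inverse_warp(img, H, out_h, out_w):
    """Warp img by H using inverse mapping with nearest-neighbor sampling."""
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = np.linalg.inv(H) @ pts          # map output pixels back to the source
    src = src[:2] / src[2]
    sx = np.round(src[0]).astype(int)
    sy = np.round(src[1]).astype(int)
    # Keep only lookups that land inside the source image.
    valid = (0 <= sx) & (sx < img.shape[1]) & (0 <= sy) & (sy < img.shape[0])
    out = np.zeros((out_h, out_w) + img.shape[2:], dtype=img.dtype)
    out.reshape(out_h * out_w, -1)[valid] = \
        img[sy[valid], sx[valid]].reshape(valid.sum(), -1)
    return out
```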

Image Rectification

Before I proceeded with the rest of the project, I first followed the spec to check that my homography/warping was actually functional. Specifically, we take a look at rectifying an image. In this case, my input image is of an iPad on my computer desk.

Here, I proceeded to rectify the iPad image so that we get a top-down view of the iPad.

Here, you can see the iPad is oriented correctly and appears to be viewed from above. There are minor artifacts, such as at the bottom right corner of the iPad, but this is to be expected: we aren't generating information out of nothing, after all.

For this particular image, my methodology was to define correspondences on the iPad image twice: the first time, using points where the iPad actually was, and the second time, using points to indicate where I wanted the iPad to go, roughly.
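In code, that two-sets-of-points trick boils down to an exact 8×8 solve, since four correspondences pin down the homography exactly. The coordinates below are made-up placeholders, not the points I actually clicked:

```python
import numpy as np

# Hypothetical corners: where the iPad is, and where I want it to go.
ipad = np.array([[320, 210], [610, 250], [590, 520], [300, 480]], float)
target = np.array([[300, 200], [600, 200], [600, 500], [300, 500]], float)

# With exactly four correspondences the 8x8 system is square, so solve()
# (rather than lstsq) recovers the rectifying homography exactly.
A = np.zeros((8, 8))
b = np.zeros(8)
for i, ((x, y), (xp, yp)) in enumerate(zip(ipad, target)):
    A[2 * i] = [x, y, 1, 0, 0, 0, -x * xp, -y * xp]
    A[2 * i + 1] = [0, 0, 0, x, y, 1, -x * yp, -y * yp]
    b[2 * i], b[2 * i + 1] = xp, yp
H = np.append(np.linalg.solve(A, b), 1).reshape(3, 3)
```

Warping the photo by this H is what produces the top-down view.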

Below is another example.

As you may notice, the distortion of the parts of the image not being rectified is more extreme than before.

Blend the Images into a Mosaic

First, for the sake of convenience, I apply a simple modification to the image that did not undergo warping in order for it to fit appropriately within the mosaic image space.
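That "simple modification" amounts to dropping the unwarped image into an all-zero canvas of the mosaic's size at the right offset; a sketch with a hypothetical helper:

```python
import numpy as np

def place_in_canvas(im, offset_r, offset_c, canvas_h, canvas_w):
    """Pad/translate an image so it sits at (offset_r, offset_c) inside
    the shared mosaic canvas."""
    canvas = np.zeros((canvas_h, canvas_w) + im.shape[2:], dtype=im.dtype)
    canvas[offset_r:offset_r + im.shape[0],
           offset_c:offset_c + im.shape[1]] = im
    return canvas
```

The offsets come from the corner bounding box computed earlier: the unwarped image is shifted by however far the warped image's corners extend into negative coordinates.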

Finally, the result of blending the images together (with the originals in front for reference):

I am honestly very impressed by how aligned the floorboards and the ceiling/wall joins are. There is some level of blurriness, coming from slight inaccuracies in setting the correspondences and also from the operations done to each of the input images along the way.
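One simple way to blend two padded images like this is a per-pixel weighted average over the overlap; a sketch, assuming all-zero pixels mark empty canvas (my actual blend may differ in details):

```python
import numpy as np

def alpha_blend(im1, im2):
    """Blend two equal-shape padded images, averaging where they overlap.

    Black (all-zero) pixels are treated as empty space; averaging in the
    overlap softens the seam between the two images.
    """
    m1 = (im1.sum(axis=-1, keepdims=True) > 0).astype(float)
    m2 = (im2.sum(axis=-1, keepdims=True) > 0).astype(float)
    weight = m1 + m2
    weight[weight == 0] = 1  # avoid division by zero outside both images
    return (im1 * m1 + im2 * m2) / weight
```

Feathering the masks (ramping the weights toward each image's border) would hide the seam further at the cost of a little more blur.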

Here are two more examples.

Tell Us What You've Learned

The coolest, if not necessarily most important, thing that I learned from this project so far is probably that even without extrapolating or generating "fake"/"extra" information, we can change the perspective of an image pretty convincingly.


Detecting Corner Features in Images

For this section, I used the provided harris.py starter code. However, I did make the following modification: instead of using peak_local_max, I used corner_peaks, as suggested by various Anons on Piazza. This change was necessary because otherwise hundreds of thousands of corner points were generated, far too many for my algorithm to handle efficiently.
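For readers without the starter code, here is a numpy-only sketch of the two pieces involved: the Harris response itself, and a corner_peaks-style greedy peak picker that enforces a minimum spacing. This is an illustration, not the harris.py code I actually used:

```python
import numpy as np

def harris_response(im, k=0.05):
    """Harris corner response for a grayscale float image."""
    Iy, Ix = np.gradient(im)  # finite-difference gradients
    def window_sum(a):
        # 3x3 box sum, standing in for the usual Gaussian window.
        p = np.pad(a, 1)
        return sum(p[i:i + a.shape[0], j:j + a.shape[1]]
                   for i in range(3) for j in range(3))
    Sxx = window_sum(Ix * Ix)
    Syy = window_sum(Iy * Iy)
    Sxy = window_sum(Ix * Iy)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det - k * trace ** 2

def corner_peaks_np(R, min_distance=5, threshold_rel=0.01):
    """Greedy peak picking with a spacing constraint, mimicking corner_peaks:
    strong peaks suppress weaker ones within min_distance."""
    rows, cols = np.where(R > threshold_rel * R.max())
    order = np.argsort(R[rows, cols])[::-1]  # strongest first
    kept = []
    for i in order:
        r, c = rows[i], cols[i]
        if all((r - kr) ** 2 + (c - kc) ** 2 >= min_distance ** 2
               for kr, kc in kept):
            kept.append((r, c))
    return np.array(kept)
```

The spacing constraint is exactly what keeps the corner count manageable compared to raw local maxima.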

I've included an example image with its Harris corners overlaid on top of it.

For the most part, the corners landed where I expected. Some, however, were less expected than others: I didn't realize that all the folds of the large beanbag chair would count as corners, although, in retrospect, that makes a lot of sense considering how we define corners.

In this particular example image, the number of Harris corners is relatively low already. However, in other images, this is not the case, and we need to apply Adaptive Non-Maximal Suppression to reduce the number of corners to an acceptable number. I've included below images of the corners once they've been reduced by ANMS (although, bear in mind that for this particular example there should be little difference).
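ANMS keeps the corners with the largest "suppression radius": the distance to the nearest significantly stronger corner. That favors a spatially well-spread subset rather than clusters of strong responses. A sketch (the robustness constant 0.9 follows the MOPS paper; other details are a simplification):

```python
import numpy as np

def anms(coords, strengths, n_keep=500, c_robust=0.9):
    """Adaptive Non-Maximal Suppression.

    coords: (N, 2) corner positions; strengths: (N,) Harris responses.
    Keeps the n_keep corners with the largest suppression radius.
    """
    radii = np.full(len(coords), np.inf)
    for i in range(len(coords)):
        # Corners "sufficiently stronger" than corner i.
        stronger = strengths > strengths[i] / c_robust
        if stronger.any():
            d2 = np.sum((coords[stronger] - coords[i]) ** 2, axis=1)
            radii[i] = d2.min()
    keep = np.argsort(radii)[::-1][:n_keep]
    return coords[keep]
```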

Extracting Feature Descriptors

This was a relatively straightforward task, although the implementation was a bit of a slog.
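For reference, the standard approach for this step is an axis-aligned MOPS-style descriptor: sample a 40×40 window around each corner, downsample to 8×8, and normalize for bias and gain. A sketch (the exact sampling details here are illustrative, not necessarily what my code does):

```python
import numpy as np

def extract_descriptor(im, r, c, patch=40, out=8):
    """Axis-aligned MOPS-style descriptor: 40x40 window around (r, c),
    downsampled to 8x8, then bias/gain normalized."""
    half = patch // 2
    window = im[r - half:r + half, c - half:c + half]
    step = patch // out
    # Subsample every 5th pixel (a blur before subsampling would be better).
    small = window[step // 2::step, step // 2::step][:out, :out]
    small = small - small.mean()          # remove bias
    std = small.std()
    return (small / std if std > 0 else small).ravel()  # remove gain
```

The bias/gain normalization is what makes matching robust to exposure differences between the two photos.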

Matching Feature Descriptors between Images

Again, this was a relatively straightforward task. Implementing this took a while to get right, but once done, it produced these results:

I'm very happy with how these turned out. At a glance, you can immediately see that the features do indeed match up! Moreover, even the precise arrangement of the folds on the beanbag chair is matched correctly. I was surprised there was such a heavy emphasis on the objects in the cart, but I suppose it makes sense in retrospect.
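A sketch of a standard way to implement this step: nearest-neighbor matching with Lowe's ratio test, which accepts a match only when the best distance is much smaller than the second-best (the exact threshold here is illustrative):

```python
import numpy as np

def match_features(desc1, desc2, ratio=0.6):
    """Match descriptor sets via nearest neighbour + Lowe's ratio test.

    desc1: (N, D), desc2: (M, D). Returns a list of index pairs (i, j).
    """
    # Pairwise squared distances between all descriptors.
    d2 = ((desc1[:, None, :] - desc2[None, :, :]) ** 2).sum(-1)
    matches = []
    for i in range(len(desc1)):
        order = np.argsort(d2[i])
        # Accept only if the best match clearly beats the runner-up.
        if len(order) > 1 and d2[i, order[0]] < ratio * d2[i, order[1]]:
            matches.append((i, order[0]))
    return matches
```

The ratio test is what discards ambiguous features (like repeated texture) that have several near-identical candidates.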

Computing Homographies with RANSAC & Creating Mosaics

Here, we randomly select four correspondences over hundreds of iterations in order to find a set of inlier points we can use to compute the homography. The results are quite good! I was very surprised at how good the final product is. Below, the first row shows the original images; in the second row, the left image is the hand-annotated mosaic and the right image is the automosaic formed from the above steps.

In fact, the automosaic seems to be aligned even better than my hand-annotated version! You can see that on the left (my hand-annotated version), the ceiling seam is not perfectly aligned, and the cart area and the text on the cardboard box are slightly blurry. In the automosaic, the ceiling seam is aligned better, and the cart/box are sharper and clearer (because they have been aligned better as well). Most noticeably, the beanbag chair is sharp and clear! I was very surprised.
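The RANSAC loop described above can be sketched as follows (helper names and the 2-pixel inlier threshold are illustrative):

```python
import numpy as np

def fit_homography(src, dst):
    """Exact homography from 4 correspondences (None if degenerate)."""
    A = np.zeros((8, 8)); b = np.zeros(8)
    for i, ((x, y), (xp, yp)) in enumerate(zip(src, dst)):
        A[2 * i] = [x, y, 1, 0, 0, 0, -x * xp, -y * xp]
        A[2 * i + 1] = [0, 0, 0, x, y, 1, -x * yp, -y * yp]
        b[2 * i], b[2 * i + 1] = xp, yp
    try:
        return np.append(np.linalg.solve(A, b), 1).reshape(3, 3)
    except np.linalg.LinAlgError:
        return None

def ransac_homography(pts1, pts2, iters=1000, eps=2.0, rng=None):
    """Repeatedly fit H to 4 random correspondences; keep the largest
    set of inliers (reprojection error under eps pixels)."""
    rng = np.random.default_rng(rng)
    best = np.empty(0, dtype=int)
    P1 = np.hstack([pts1, np.ones((len(pts1), 1))])  # homogeneous points
    for _ in range(iters):
        idx = rng.choice(len(pts1), 4, replace=False)
        H = fit_homography(pts1[idx], pts2[idx])
        if H is None:
            continue
        proj = P1 @ H.T
        proj = proj[:, :2] / proj[:, 2:]
        err = np.linalg.norm(proj - pts2, axis=1)
        inliers = np.where(err < eps)[0]
        if len(inliers) > len(best):
            best = inliers
    return best  # refit H on these inliers with least squares afterwards
```

The final homography is then recomputed from all the inliers at once, which is what makes the automosaic so well aligned despite the noisy automatic matches.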

I include below, again in the same format (left: hand-annotated, right: automosaic), two more examples (using the same images from part 1 of this project).

As you can see, the automosaic does quite well! It performs almost as well as my hand annotations in these two examples (although my hand annotations produce a slightly less blurry output in the case of the kitchen mosaic).

What have you learned?

The coolest thing about this project, and what I really enjoyed, was seeing how it was possible to select good feature points and create a good homography from them without using things like machine learning to do so. To me, this implies that a surprisingly vast amount of information is already encoded in relatively simple features of an image (in this case, corners!).