Arjun's CS294-26 Proj4a; Scroll down for 4b!

Part 1 Rectified Images

I took images of tilted planar objects. Labeled the four corners and warped those out to the corners of a rectangle who's aspect ratio seemed to correspond roughly with the true objects. The results shown below seem pretty nice. One comment is with the student flex pass banner it was a little wrinkled on the right side. As a result when we rectify these wrinkles persist and are responsible for the interesting effect on that side of the image.

Part 2 Mosaics

For each of three scenes on campus I took a series of 4 photos rotating my phone camera around a vertical axis. For each pair of consecutive images I labeled 10 correspondences and used least squares to come up with the homography. (Math and example for this below.) I started from the outside of the sequence and moved inward each warp+blend leaving one image unchanged and warping the other into its perspective. Ultimately one of the middle frames decides the final perspective as a result. For blending I used an enhanced version of the laplacian pyramid strategy from Project 2. To find the initial binary mask, I considered the centroids of the two images and segmented points based on which centroid they were closer to. After using the laplacian stack algorithm I exclusively applied the gaussian masks to the areas where both images were available obviously only using the single image that covered other parts of the joint canvas. I blurred the initial boundary quite a bit prior to this process as I found that resulted in smoother transitions. Finally for all scenes I also present an automatically trimmed versions similar to what a smartphone may present when you take a panoramic image.

Zellerbach One interesting note about this mosaic is that there was a person walking below while I was capturing the images. This suprisingly worked out alright, we simply have (atleast) 2 distinct versions of the person visible!

VLSB One interesting note about this mosaic is I seem to have done a better job rotating purely around the vertical axis resulting in the untrimmed version closer to the trimmed one. With Zellerbach above there's is quite the fan out effect near the left and right sides.

Trees One interesting note about this mosaic is finding correspondences was more tricky due to the natural scenery. Nevertheless it worked out pretty well, meaning with enough correspondences a little noise isn't too bad.

Homography Math

Example of 10 correspondences.

For every pair (xa, ya) <-> (xb, yb) we have the equation:
[a, b, c]^T [xa ya 1] = w * xb
[d, e, f]^T [xa ya 1] = w * yb
[g, h, i]^T [xa ya 1] = w
Which implies the following two eqs once we eliminate w
a * xa + b * ya + c = (g * xa + h * ya + i) * xb
d * xa + e * ya + f = (g * xa + h * ya + i) * yb
Since we may simply set the scale factor i to 1 we get
a * xa + b * ya + c + g * (-xa * xb) + h * (-ya * xb) = xb
d * xa + e * ya + f + g * (-xa * xb) + h * (-ya * xb) = yb
That's two linear equations with 8 coefficients. Collecting 10 correspondences gives us 20 more than enough to apply least squares and get a reliable solution.

Overall one thing I thought I was pretty cool was how well we could blend together the different images to really make it feel like it came from a single capture. Granted some of the mosaics are better than others, but I felt I wasn't extremely careful taking photos making sure the camera didn't move so thought it was pretty cool that it came out as well as it did.

Arjun's CS294-26 Proj4b

Part 1: Detecting, Extracting, & Matching Features

For this part we'll use a running example of two images from the Zellerbach collection. For detection and extraction of course we process the two images independently, it's only in the matching phase that we compare the two. Consider the below two images:

The red dots represent the corners as identified by the Harris Corner Detector (min_distance=1, edge_discard=25). ~8000 of these are detected for our 521 x 694 image. The blue dots are the top 500 resulting from the adaptive non maximum supression algorithm (largest non-dominated radius). The ~50 green dots are the points with a matched neighbor in the other image (1 - nn_1_diff/nn_2_diff > thresh=0.5; Feature visualizations in Part 3). We go through the matching process iteratively in the order of largest non-dominated radius in the first image and delete candidates from the second image as they're claimed as partners. The ~20 yellow dots are the inlier points found by RANSAC and ultimately used to compute the homography (Standard 8 degree of freedom least squares). The graph next to the images above describes the size of the best inlier set (tolerance 1.5px) that RANSAC found over iterations (Randomly try subsets of 4 points, compute homography using math above, keep the one which maximizes how many green dots map within tolerance to their correspondences).

Part 2: Mosaics

Below we show mosaics for the same three scenarios from before. For each scenario we first show the untrimmed versions then the trimmed versions. Left are copies of the previous manual correspondence results; Right uses auto matching.

What I find really impressive is that the auto matching actually does better than the manual correspondences. I think part of this is that I provided only 6 manual correspondences per pair, here we end up with about 20 per pair. Furthermore, I definitely would have clicked a pixel or two off meaning my correspondences are not much better quality than what RANSAC selects. For qualitative evidence of the above claim checkout the following zoom ins. Again left is manual right is auto match. Moral of the story seems to be with least squares a greater quantity of potentially a little more noisy data is superior.

Part 3: Rotational Invariance

As a bell and whistle I implemented rotational invariance for features. The above examples were without this feature, now we'll evaluate the benefit of this features. First to develop intuition for what's going on consider the pictures below. In the rotated feature patch the gradient is always in the pure x direction. The feature distance metric I used was simple euclidean distance, which is obviously affected by rotating either image.

Below are the new locations we match and select via RANSAC. And below that a comparison of the mosaic generated for Zellerbach without rotational invariance (Left) and with (Right).

It actually performs a good bit worse mixing up most of the correspondences in the upper half of the image. I think the reason for this is actually quite explainable. I roughly maintain orientation between images. Meaning an upper right corner of concrete ought to match with an upper right corner of concrete. When we make the features rotationally invariant it's possible it matches a lower right corner of concrete because when rotated they appear the same. This makes the matching stage more confusing and thus RANSAC will through out these inconsistent pairings. I conclude that for our use case rotational invariance is not a good idea, but in panaoramas where we roll the camera it may be useful (unclear why you would do that though).

I think the coolest thing I learned from this project was how well auto matching based on these features could work. Overall we have a pretty complex multi step process for image stitching but the results in the end are fairly impressive given it's fully automated. Blending artifacts remain the most challenging (even if we blend a lighter and darker photo it still looks weird that the image randomly starts getting darker at some point). This last point is something I'm interested in learning more about how to resolve.