CS 194-26: Project 4

Recovering Homographies

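To calculate the homography between two images, at least 4 point correspondences (8 points in total, 4 from each image) are needed, since H has 8 unknowns once its scale is fixed and each correspondence contributes two equations. To solve for the H matrix, I constructed this matrix: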

Image courtesy of Yalda Zadeh

These are just the linear equations obtained by multiplying out the equation that applies the unknown H matrix to the input points to produce the output points. If there are more than 4 correspondences, the system is overdetermined, so I use least squares to solve for the best approximation of H.
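As a rough illustration of that linear system, here is a minimal NumPy sketch. It assumes points are given as (x, y) pairs and that the bottom-right entry of H is fixed to 1; the function name `compute_homography` is hypothetical and not taken from the project code.

```python
import numpy as np

def compute_homography(src, dst):
    """Estimate H (3x3, with H[2, 2] fixed to 1) such that dst ~ H @ src.

    src, dst: (N, 2) arrays of matching (x, y) points, N >= 4.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]
    A = np.zeros((2 * n, 8))
    b = np.zeros(2 * n)
    for i, ((x, y), (u, v)) in enumerate(zip(src, dst)):
        # u = (h1 x + h2 y + h3) / (h7 x + h8 y + 1), multiplied out
        A[2 * i] = [x, y, 1, 0, 0, 0, -u * x, -u * y]
        # v = (h4 x + h5 y + h6) / (h7 x + h8 y + 1), multiplied out
        A[2 * i + 1] = [0, 0, 0, x, y, 1, -v * x, -v * y]
        b[2 * i] = u
        b[2 * i + 1] = v
    # Exact solve for N == 4, least-squares fit for N > 4
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.append(h, 1.0).reshape(3, 3)
```

Each correspondence adds two rows, so 4 pairs give a square 8x8 system and any extra pairs simply make the least-squares fit better constrained.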

Warp the Images

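Similar to how it was done in Project 3, once we create the homography matrix H, we can apply that transformation using inverse sampling: for every pixel in the output image, we map its coordinates back through the inverse of H and sample the source image there!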
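A minimal sketch of inverse sampling, assuming NumPy, nearest-neighbour lookup (the project may well have used interpolation instead), and an H that maps source (x, y) to output (x, y). The name `warp_image` is hypothetical.

```python
import numpy as np

def warp_image(img, H, out_shape):
    """Inverse-warp img by H; returns the warped image and a mask of filled pixels."""
    out_h, out_w = out_shape
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    # Homogeneous coordinates of every output pixel, shape (3, out_h * out_w)
    out_pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    # Map output pixels back into the source image
    src_pts = np.linalg.inv(H) @ out_pts
    sx = np.rint(src_pts[0] / src_pts[2]).astype(int)
    sy = np.rint(src_pts[1] / src_pts[2]).astype(int)
    # Only keep samples that land inside the source image
    valid = (sx >= 0) & (sx < img.shape[1]) & (sy >= 0) & (sy < img.shape[0])
    out = np.zeros((out_h * out_w,) + img.shape[2:], dtype=img.dtype)
    out[valid] = img[sy[valid], sx[valid]]
    return out.reshape((out_h, out_w) + img.shape[2:]), valid.reshape(out_h, out_w)
```

Looping over output pixels rather than source pixels is what avoids holes in the result: every output pixel gets exactly one lookup.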

Rectification

A sticker on the doorway of Lewis Hall:

The side of a box with a heart drawn on it:

Now on to mosaics:

Blend the images into a mosaic

Creating mosaics is similar to rectification, except we can create any new coordinate system. The general idea I used is to first calculate the transformation from the first image into the 'canvas basis'. This was usually simple: either rectifying something in the first image or just translating it to the middle of the canvas. Then, every subsequent image is added by:

Converting the correspondences between A and B into correspondences between A' and B, where A' is the first image mapped into the canvas basis.
Calculating the homography between A' and B.
Calculating the sample indices in B for every canvas pixel that B covers (inverse sampling).
Applying the result to the canvas with a feathered mask.
Storing H(A' to B), so it can be reused for future mosaic additions that use B as the corresponding image.
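Two of the steps above can be sketched in a few lines: moving A's points into the canvas basis, and feathered blending. This is only an illustration assuming NumPy, grayscale images, and a linear falloff for the feather; the names `apply_homography`, `feather_mask`, and `blend` are hypothetical.

```python
import numpy as np

def apply_homography(H, pts):
    """Map (N, 2) points through H, e.g. A's correspondences into the canvas basis."""
    p = np.hstack([np.asarray(pts, dtype=float), np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]

def feather_mask(h, w):
    """Weight of 1 at the image centre, falling linearly towards the border."""
    ys = np.minimum(np.arange(h), np.arange(h)[::-1]) + 1
    xs = np.minimum(np.arange(w), np.arange(w)[::-1]) + 1
    mask = np.minimum.outer(ys, xs).astype(float)
    return mask / mask.max()

def blend(canvas, canvas_w, layer, layer_w):
    """Weighted average of two aligned images; returns the blend and the summed weights."""
    total = canvas_w + layer_w
    # Avoid dividing by zero where neither image covers the canvas
    safe = np.where(total > 0, total, 1.0)
    out = (canvas * canvas_w + layer * layer_w) / safe
    return out, total
```

In this sketch the feather mask would be warped with the same homography as its image, so the weights line up with the pixels they belong to, and the summed weights are carried forward so later images blend against the running average.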

Soda Breeze Way

Result:

Andronico's Grocery

Result:

My Messy Room:

Result:

Bells and Whistles: Video!

I thought it would be cool to see if I could add a mosaic / rectification to a static video. Here are my two results of that:

Note: the quality looks fairly bad on these because of compression to fit them on the website.

Here is a trippy result I got from animating over different canvas coordinate systems!

Overall, this was a really interesting part of the project! I loved the fact that this was all done with a couple of matrix multiplications. Applying it to video was also extremely exciting. I'm sure that with automatic feature detection it would go a lot smoother and I wouldn't be restricted to mostly static video!

Part B

The previous part relied heavily on human-picked keypoints to compute the homographies, but this can be done algorithmically as well.

Harris Point Detection

The Harris Point detector picks out corners of an image, where the intensity changes significantly in two directions. Corners make good keypoints between images: shifting a corner in any direction produces an easily detectable change, whereas sliding a line along its own direction does not.
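To make the "change in two directions" idea concrete, here is a minimal sketch of a Harris-style corner response. It assumes NumPy, a plain box window instead of a Gaussian, and the det/trace form of the score; all of these are illustrative choices, and the names `box_sum` and `harris_response` are hypothetical.

```python
import numpy as np

def box_sum(a, r):
    """Sum of a over a (2r+1) x (2r+1) window, zero-padded at the border."""
    p = np.pad(a, r)
    out = np.zeros_like(a)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += p[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out

def harris_response(img, r=2):
    """Corner strength det(M) / trace(M) of the windowed gradient matrix M."""
    iy, ix = np.gradient(img.astype(float))
    sxx = box_sum(ix * ix, r)
    syy = box_sum(iy * iy, r)
    sxy = box_sum(ix * iy, r)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    # Large only when gradients are strong in two independent directions
    return det / (trace + 1e-12)
```

Along a straight edge only one of the two gradient sums is non-zero, so the determinant (and the response) collapses to zero, while at a corner both are large.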

Adaptive Non-Maximal Suppression

Given the points above, we want to pick strong yet spread-out points to narrow down our search:
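A minimal sketch of adaptive non-maximal suppression, assuming NumPy, corner coordinates with one strength per point, and a robustness constant of 0.9; the constant, the function name `anms`, and its parameters are assumptions rather than the values used in this project.

```python
import numpy as np

def anms(coords, strengths, n_keep, c_robust=0.9):
    """Return indices of the n_keep points with the largest suppression radius."""
    coords = np.asarray(coords, dtype=float)
    strengths = np.asarray(strengths, dtype=float)
    # Squared distance between every pair of points, shape (N, N)
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    # stronger[i, j] is True when point j is sufficiently stronger than point i
    stronger = strengths[:, None] < c_robust * strengths[None, :]
    # Radius of i: distance to the nearest sufficiently stronger point (inf if none)
    radii = np.where(stronger, d2, np.inf).min(axis=1)
    return np.argsort(-radii)[:n_keep]
```

Sorting by radius rather than by raw strength is what spreads the points out: a fairly weak corner far from everything else beats a strong corner sitting next to an even stronger one.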

Feature Extraction

At each of these points, we extract a normalized feature from the surrounding 40x40 square. This patch is blended and resized to an 8x8 image like so:
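A minimal sketch of that descriptor, assuming NumPy, a grayscale image, block averaging as a stand-in for the blend-and-resize step, and bias/gain normalization (zero mean, unit standard deviation). The name `extract_descriptor` is hypothetical.

```python
import numpy as np

def extract_descriptor(img, y, x, patch=40, out=8):
    """Return a flattened, normalized out x out descriptor centred at (y, x), or None near the border."""
    half = patch // 2
    if y < half or x < half or y + half > img.shape[0] or x + half > img.shape[1]:
        return None
    window = img[y - half:y + half, x - half:x + half].astype(float)
    step = patch // out
    # Average each step x step block: a box blur and downsample in one pass
    small = window.reshape(out, step, out, step).mean(axis=(1, 3))
    # Bias/gain normalization makes the descriptor insensitive to brightness and contrast
    small = (small - small.mean()) / (small.std() + 1e-8)
    return small.ravel()
```

Sampling a coarse 8x8 summary from a larger window keeps the descriptor tolerant of small localization errors in the corner position.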

Feature Matching

These feature descriptors are then used to perform nearest-neighbor matching. Lowe's thresholding is also applied, meaning a point is thrown out if its best and second-best matches in the other image have scores that are too similar. I've also implemented a symmetry check, where a match is only accepted if it's the best for both sides. Here are the left and middle images above with their feature match points shown.
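The matching logic can be sketched as follows, assuming NumPy, descriptors stacked as rows, squared Euclidean distance as the score, and a ratio threshold of 0.7; the threshold and the name `match_features` are assumptions, not the project's actual settings.

```python
import numpy as np

def match_features(desc_a, desc_b, ratio=0.7):
    """Return (i, j) index pairs passing Lowe's ratio test and a mutual nearest-neighbour check."""
    desc_a = np.asarray(desc_a, dtype=float)
    desc_b = np.asarray(desc_b, dtype=float)
    diff = desc_a[:, None, :] - desc_b[None, :, :]
    dist = (diff ** 2).sum(axis=2)       # squared distances, shape (Na, Nb), needs Nb >= 2
    order = np.argsort(dist, axis=1)
    best_b = order[:, 0]                 # nearest neighbour in B for each descriptor in A
    second_b = order[:, 1]               # second-nearest neighbour in B
    best_a = dist.argmin(axis=0)         # nearest neighbour in A for each descriptor in B
    matches = []
    for i, j in enumerate(best_b):
        passes_ratio = dist[i, j] < ratio * dist[i, second_b[i]]
        is_mutual = best_a[j] == i
        if passes_ratio and is_mutual:
            matches.append((i, int(j)))
    return matches
```

The ratio test rejects ambiguous descriptors (for example, repeated texture), while the mutual check rejects cases where two points in one image compete for the same point in the other.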

Even after these checks, some invalid matches remain, but they will be removed in the next step:

RANSAC

The RANSAC algorithm chooses 4 pairs at random, computes the homography between them, and checks which of the other pairs agree with that homography. The more pairs that agree, the more likely it is that the sampled pairs are valid. We run the algorithm for multiple iterations and return the set of pairs from the iteration with the most agreeing points.
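A minimal, self-contained sketch of that loop, assuming NumPy and matched points as (N, 2) arrays. The iteration count, the 2-pixel tolerance, the final refit on all inliers, and the names `fit_homography`, `project`, and `ransac_homography` are illustrative assumptions.

```python
import numpy as np

def fit_homography(src, dst):
    """Least-squares homography with H[2, 2] = 1 from (N, 2) point arrays, N >= 4."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.extend([u, v])
    h, *_ = np.linalg.lstsq(np.array(rows, dtype=float), np.array(rhs, dtype=float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def project(H, pts):
    """Apply H to (N, 2) points and divide out the homogeneous coordinate."""
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]

def ransac_homography(src, dst, n_iters=500, tol=2.0, seed=0):
    """Return (H, inlier_mask), with H refit on the largest consensus set found."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        # Fit an exact homography to 4 randomly chosen pairs
        idx = rng.choice(len(src), size=4, replace=False)
        H = fit_homography(src[idx], dst[idx])
        # Degenerate samples can produce inf/nan errors; those never count as inliers
        with np.errstate(all="ignore"):
            err = np.linalg.norm(project(H, src) - dst, axis=1)
        inliers = err < tol
        if inliers.sum() > best.sum():
            best = inliers
    return fit_homography(src[best], dst[best]), best
```

Because a single bad pair can badly skew a least-squares fit, sampling minimal sets of 4 and counting agreement is far more robust than fitting all matches at once.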

A perfect match!

Results

Auto Picked Room: pretty much equivalent

Overall, this section was extremely exciting! I was not able to get consistent results with fixed hyperparameters, so most of my images required fine-tuning of thresholds etc. But given all of that, I was amazed at how simple and fast all of the algorithms are!