Project 5: (Auto)Stitching Photo Mosaics

Part 1: Image Warping and Mosaics

Computing Homography

This derivation was really helpful for understanding not only what system of equations to solve, but also how to set it up.
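Concretely, each correspondence (x, y) → (x', y') contributes two rows to the system, and with the last entry of H fixed to 1 the eight unknowns can be recovered by least squares over four or more points. A minimal sketch of that setup (my own code, not necessarily the exact form of the linked derivation):

    import numpy as np

    def compute_homography(src, dst):
        """Solve for H such that dst ~ H @ src, given n >= 4 (x, y) point pairs."""
        A, b = [], []
        for (x, y), (xp, yp) in zip(src, dst):
            # each pair contributes one row for x' and one row for y'
            A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
            A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
            b.extend([xp, yp])
        # least-squares solve for the 8 unknowns (h9 is fixed to 1)
        h, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float), rcond=None)
        return np.append(h, 1).reshape(3, 3)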

Image Rectification

After computing the homography, we can warp the images (much as we did in Project 3). One key issue I had was initially using forward warping instead of inverse warping, which left black dots/patches scattered throughout the output.
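Inverse warping avoids those holes because every output pixel looks up where it came from in the source, so every output pixel gets a value. A sketch of the idea (a hypothetical helper of mine, using nearest-neighbor sampling for brevity; interpolation would be smoother):

    import numpy as np

    def inverse_warp(img, H, out_shape):
        """Fill an out_shape canvas by sampling img at H^-1 of each output pixel."""
        H_inv = np.linalg.inv(H)
        rows, cols = out_shape[:2]
        ys, xs = np.mgrid[0:rows, 0:cols]
        pts = np.stack([xs.ravel(), ys.ravel(), np.ones(rows * cols)])  # homogeneous
        src = H_inv @ pts
        src /= src[2]                        # dehomogenize
        sx = np.round(src[0]).astype(int)    # nearest-neighbor sample locations
        sy = np.round(src[1]).astype(int)
        ok = (sx >= 0) & (sx < img.shape[1]) & (sy >= 0) & (sy < img.shape[0])
        out = np.zeros(out_shape, dtype=img.dtype)
        out[ys.ravel()[ok], xs.ravel()[ok]] = img[sy[ok], sx[ok]]
        return out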

I defined one set of correspondence points to be the corners of the rectangular surface in the original image, and the other set to be the corners of a “frontal” rectangle/square, as appropriate.
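As a concrete (made-up) example using the two sketches above, with clicked corners of a box face mapped to a frontal square:

    import numpy as np

    src = np.array([[120, 300], [480, 280], [500, 620], [110, 640]])  # clicked corners (x, y)
    dst = np.array([[0, 0], [400, 0], [400, 400], [0, 400]])          # frontal 400x400 square
    H = compute_homography(src, dst)
    rectified = inverse_warp(img, H, (400, 400, 3))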

I did have issues figuring out how to shift the pixels after switching to inverse warping, which is why (I think) the cardboard box image gets cut off on the left side.

[Images: Original | Frontal-planar]

Mosaics

Shot indoors because of COVID (^:

I originally had 7 sets of images; I chose the ones with the easiest point correspondences to show for the manual mosaics.

I used np.maximum to combine the two images, so the edges are pretty obvious in this output.
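For reference, the compositing step is just a pixelwise maximum over the two aligned canvases (warped_a and warped_b are hypothetical names for images already warped onto the same padded canvas, with zeros where they have no data):

    import numpy as np

    mosaic = np.maximum(warped_a, warped_b)  # seams show where exposures differ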

Set 3

Warped 3B to 3A’s perspective by mapping the rocks in the center of the images and a few points on the mountains in the background.

[Images: Image A | Image B | Image B warped | Mosaic]

Set 4

Warped 4B to 4A’s perspective by mapping various points on the mountains in the background.

[Images: Image A | Image B | Image B warped | Mosaic]

Set 7

Warped 7A to 7B’s perspective by mapping the door, the painting, and the edge of the TV as correspondence points.

[Images: Image A | Image B | Image A warped | Mosaic]

Part 2: Feature Matching and Auto-stitching

Harris Point Detection

For this sub-part, I just used the given starter code implementation, which detects interest points and returns the Harris response matrix together with a list of coordinates of the detected corners.
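That pipeline looks roughly like the sketch below, assuming skimage’s corner_harris and peak_local_max (the actual starter code may differ in its exact parameters and return format):

    import numpy as np
    from skimage.feature import corner_harris, peak_local_max

    def get_harris_corners(im, edge_discard=20):
        """Harris response matrix plus (row, col) coordinates of corner peaks."""
        h = corner_harris(im, method='eps', sigma=1)
        coords = peak_local_max(h, min_distance=1)
        # drop corners too close to the border to fit a descriptor window
        ok = ((coords[:, 0] > edge_discard) & (coords[:, 0] < im.shape[0] - edge_discard)
              & (coords[:, 1] > edge_discard) & (coords[:, 1] < im.shape[1] - edge_discard))
        return h, coords[ok]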

[Image: 3A, resized to 15% of original size]

Adaptive Non-maximal Suppression

The paper spends a lot of words on this, but the idea is: for each point, calculate its suppression radius, i.e. the distance to the nearest point that is sufficiently stronger (using the formula given in the paper). Next, keep the points whose radii exceed some threshold.
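A sketch of that computation (my own O(n²) implementation; coords and strengths come from the Harris step, and c_robust = 0.9 is the paper’s robustness factor):

    import numpy as np

    def anms(coords, strengths, radius_thresh, c_robust=0.9):
        """Keep points whose suppression radius (distance to the nearest
        sufficiently-stronger point) exceeds radius_thresh."""
        radii = np.full(len(coords), np.inf)
        for i in range(len(coords)):
            stronger = strengths > strengths[i] / c_robust   # robust comparison
            if stronger.any():
                d = np.linalg.norm(coords[stronger] - coords[i], axis=1)
                radii[i] = d.min()
        return coords[radii > radius_thresh]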

Getting feature descriptors

For every ‘good’ coordinate output by ANMS, we want to get the actual feature descriptor, which just involves sampling an 8x8 patch from the 40x40 window centered at that pixel. At this point a relatively insignificant design decision needs to be made: since an even-sized window can’t be centered exactly on a pixel, you have to choose which of top/bottom and left/right to favor when extracting the descriptor.
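A sketch of the extraction (my slicing favors the top-left; I also added the paper’s bias/gain normalization, which the writeup doesn’t mention):

    import numpy as np

    def get_descriptor(im, r, c):
        """8x8 descriptor: every 5th pixel of the 40x40 window centered at (r, c)."""
        window = im[r - 20:r + 20, c - 20:c + 20]          # 40x40; favors top-left
        desc = window[::5, ::5]                            # subsample to 8x8
        desc = (desc - desc.mean()) / (desc.std() + 1e-8)  # bias/gain normalize
        return desc.ravel()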

Feature matching

I used the provided dist2 function to calculate the nearest neighbors. Once I had, for each feature, a record of image 1 coordinate | image 2 coordinate | nearest-neighbor error | second-nearest-neighbor error, I calculated the average of the second-nearest-neighbor errors. Finally, I iterated over the features one more time and filtered by thresholding at 0.5 (i.e. I accepted a match if nearest / average second-nearest < 0.5).
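A sketch of that matching rule (dist2 here is my stand-in for the provided function; the 0.5 threshold matches the writeup):

    import numpy as np

    def dist2(x, c):
        """Pairwise squared Euclidean distances between rows of x and rows of c."""
        return ((x[:, None, :] - c[None, :, :]) ** 2).sum(-1)

    def match_features(desc1, desc2, thresh=0.5):
        d = dist2(desc1, desc2)
        order = np.argsort(d, axis=1)
        nn1 = d[np.arange(len(d)), order[:, 0]]   # nearest-neighbor error
        nn2 = d[np.arange(len(d)), order[:, 1]]   # second-nearest error
        keep = nn1 / nn2.mean() < thresh          # threshold against avg 2nd-NN error
        return np.stack([np.nonzero(keep)[0], order[keep, 0]], axis=1)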

RANSAC

Implemented following the paper’s instructions. Notably, I used np.delete to discard outliers, and limited how many points could be eliminated (I wanted a minimum of 10 points for the homography), for better or for worse. I also sampled 6 random points instead of 4, because the homographies produced from 4 were pretty terrible.
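For comparison, here is a standard inlier-counting RANSAC sketch (not the writeup’s np.delete-based elimination; sample_size=6 mirrors the 6-point choice, and compute_homography is the Part 1 sketch):

    import numpy as np

    def ransac_homography(pts1, pts2, n_iters=1000, eps=2.0, sample_size=6):
        rng = np.random.default_rng()
        best = np.zeros(len(pts1), dtype=bool)
        for _ in range(n_iters):
            idx = rng.choice(len(pts1), sample_size, replace=False)
            H = compute_homography(pts1[idx], pts2[idx])
            proj = np.column_stack([pts1, np.ones(len(pts1))]) @ H.T
            proj = proj[:, :2] / proj[:, 2:3]                 # dehomogenize
            inliers = np.linalg.norm(proj - pts2, axis=1) < eps
            if inliers.sum() > best.sum():
                best = inliers
        # refit on all inliers of the best model
        return compute_homography(pts1[best], pts2[best]), best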

Warping with the auto-computed H

I modified my warp function from Part 1 to handle grayscale photos, but evidently the auto-computed H and the output point mappings were off enough that the warping failed.

Failures / bugs

Part 1

I created the new image with np.zeros((rows, columns, 3)) and attempted to pad by increasing rows and columns by a certain shift, calculated by finding the difference between the image size and the point coordinates returned by polygon(). To my surprise, this changed the point mapping in the resulting image, so the positioning would be off and stacking would no longer work correctly ):

I had a lot of issues using numpy to add padding to the image, and couldn’t really figure out the bug. I’ll fix it up by the time the second part is due.

For example, I warped image A to image B’s perspective by mapping the door, the painting, and the edge of the TV as correspondence points.

[Images: Image A | Image B | No padding (position preserved) | Padding by adjusting shape]

Chosen image (position is preserved, but due to the lack of padding in the warped image, the interesting part is cut off)

Stacked version of images A and B. As before, I used np.maximum to combine the two images, so the edges are pretty obvious in this output.

Example with attempt to pad and then stack (looks as if no warping happened)

I later figured out this bug while working on the second part of the project: I was not only adding the shift when creating the new warped image, I was also (mistakenly) shifting the output coordinates back by the same shift. What I needed to do was add the shift to get the dimensions of the new image, but avoid shifting the output coordinates back.
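One way to apply the shift exactly once is to fold it into the homography as a translation, so the canvas grows but the output coordinates are never touched again (a sketch reusing the hypothetical compute_homography / inverse_warp helpers from above):

    import numpy as np

    h, w = img.shape[:2]
    corners = np.array([[0, 0, 1], [w, 0, 1], [w, h, 1], [0, h, 1]], float).T
    warped = H @ corners
    warped = warped[:2] / warped[2]
    shift_x = int(np.ceil(max(0, -warped[0].min())))   # spill past the left edge
    shift_y = int(np.ceil(max(0, -warped[1].min())))   # spill past the top edge

    # Fold the shift into H so it is applied exactly once; the buggy version
    # also subtracted the shift from the output coordinates afterwards.
    T = np.array([[1, 0, shift_x], [0, 1, shift_y], [0, 0, 1]], float)
    rows = int(np.ceil(warped[1].max())) + shift_y + 1
    cols = int(np.ceil(warped[0].max())) + shift_x + 1
    out = inverse_warp(img, T @ H, (rows, cols, 3))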