Project 5 Part A: Image Warping and Mosaicing

Thanakul Wattanawong

For this part of the project I learned how to recover homographies from pairs of images, and use that knowledge to rectify images and blend them into a mosaic.

Shooting Pictures

A couple of other students from the class and I went out to take some photos. You will see more below, but here is a pair of images that are connected by a projective transform, along with their correspondence points:

Sather Gate

Recover Homographies

Using the technique mentioned here, I was able to compute the homography matrix H that maps points p to points p’. For example, for the Sather Gate set above I was able to recover the following H matrix:

[[ 2.32919277e-03  8.71539605e-05  6.84444894e-01]

 [ 5.44256211e-04  8.47839003e-04 -7.29059278e-01]

 [ 2.46026308e-07  9.56601540e-09  1.14403012e-03]]
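A minimal sketch of how such an H can be recovered from correspondence points, not the exact project code. It assumes a direct linear transform solved with an SVD; the function names `compute_homography` and `apply_homography` are made up for illustration. The matrix printed above has roughly unit norm, which would be consistent with this kind of solve, but that is a guess.

```python
import numpy as np

def compute_homography(src, dst):
    """Estimate a 3x3 H with dst ~ H @ src in homogeneous coordinates.

    src, dst: (N, 2) arrays of matching (x, y) points, N >= 4.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear equations in the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows)
    # The least-squares solution of A h = 0 with ||h|| = 1 is the right
    # singular vector belonging to the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    return vt[-1].reshape(3, 3)

def apply_homography(H, pts):
    """Map (N, 2) points through H and divide out the homogeneous coordinate."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

With more than four points the system is overdetermined, so the SVD gives a least-squares fit, which helps average out small clicking errors in hand-picked points.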

Image Warping

Once I was able to reliably compute H, I moved on to image warping, and used the H matrix I found above to inverse warp the entire left image onto the right image. Here are the results.
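Inverse warping can be sketched roughly as follows, assuming a grayscale image, an H that maps source (x, y) to output (x, y), and bilinear sampling via SciPy's `map_coordinates`. The function name `inverse_warp` and the fixed output shape are illustrative assumptions, not the project's actual interface.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def inverse_warp(image, H, out_shape):
    """Warp a grayscale image by H using inverse mapping.

    For every output pixel we ask where it came from in the source image
    (via H^-1) and sample there, so the output has no holes.
    """
    h_out, w_out = out_shape
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    out_pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = np.linalg.inv(H) @ out_pts
    src_x = src[0] / src[2]
    src_y = src[1] / src[2]
    # map_coordinates expects (row, col) order; pixels that fall outside
    # the source image are filled with 0.
    warped = map_coordinates(image, [src_y, src_x], order=1,
                             mode="constant", cval=0.0)
    return warped.reshape(h_out, w_out)
```

For a color image the same sampling could be run once per channel.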



Image Rectification

Having built all the building blocks, I moved on to image rectification, using this image along with others in my set. I did this by mapping a square region of the original image onto a square region on an empty canvas. Here are the images, their initial correspondence points, and the rectified images.

Seal on Sproul

Image Stitching into Mosaic

Given a set of images, I used the techniques developed above to warp them all onto one canvas and then displayed them on top of each other, using a weighted average across the intersection only. For the average, I computed the intersection polygon and feathered the weights linearly from left to right across that polygon, which gives decent enough results.

Blending across the overlap region only ensures a smooth transition. One disadvantage is that imperfect homographies show up as a blur, similar to a shaky camera hand.
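The left-to-right feathering can be sketched like this. It is a simplification: it assumes two grayscale images already warped onto the same canvas with boolean validity masks, and it ramps across the bounding columns of the overlap rather than the exact intersection polygon. The name `feather_blend` is hypothetical.

```python
import numpy as np

def feather_blend(left, right, left_mask, right_mask):
    """Blend two grayscale images that were already warped onto one canvas.

    left_mask / right_mask are boolean arrays marking where each image has data.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    overlap = left_mask & right_mask
    # Outside the overlap, copy whichever image has data there.
    out = np.where(left_mask, left, np.where(right_mask, right, 0.0))
    cols = np.flatnonzero(overlap.any(axis=0))
    if cols.size == 0:
        return out
    x0, x1 = cols[0], cols[-1]
    # Weight of the left image: 1 at the left edge of the overlap, 0 at the right edge.
    xs = np.arange(left.shape[1])
    alpha = np.clip((x1 - xs) / max(x1 - x0, 1), 0.0, 1.0)
    blended = alpha * left + (1.0 - alpha) * right  # alpha broadcasts across rows
    return np.where(overlap, blended, out)
```

Because both images contribute in the overlap, any misalignment between them is averaged rather than hidden, which is where the blur comes from.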

Here are three sets of mosaics:


After Mosaic

Cesar Chavez Student Center

There is some blurring in the middle, which I believe is caused by an imperfect homography, either from moving the camera laterally or from badly selected points.


This turned out pretty well.

The building that was burning around this time

As you can see, this one did not warp too well on the right side. I think I actually rotated my body while taking this image instead of rotating the camera, so the two views weren’t related by a perfect homography. This might be a good test case to map onto a sphere with a bigger radius.

Bells & Whistles Part A:

Video Mosaic

In order to do this I went down to University at California (Ave. and St. respectively, heh) to capture a busy intersection. Unfortunately, I only have one camera (I tried), so I wasn’t able to capture simultaneous video, and this ends up being a bunch of teleporting cars. To compute this I took the first frame from each video, then used a single homography to produce the video mosaic below:

Clearly the result is very imperfect since my hands are not stable, and it drifts over time. Furthermore, the overlap region can clip due to overlapping colors and my alpha blending strategy. In part B I have this done with automatic stitching to fix the drift.

Project 5 Part B: Feature Matching for Autostitching

Detecting corner features in an image

In order to do this I used the staff starter code. However, I had to change the corner peak function to corner_peaks with min_distance=5 in order to reduce the number of points. Here’s an example of an image with its corner points detected:

Adaptive Non-Maximal Suppression

The paper here was really confusing but I wrote out the code to solve the optimization problem sort of blindly and it managed to give good results. For hyperparameters I used c_robust=0.9 and 300 highest points as in the paper. Here’s an example of the above scene after ANMS:

As you can see the points have been spread out very well.
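For reference, a brute-force sketch of ANMS with the hyperparameters above. It assumes each corner comes with a scalar strength value and uses a full pairwise distance matrix, which is fine for a few thousand points but grows quadratically. The name `anms` and its arguments are illustrative, not the project's code.

```python
import numpy as np

def anms(coords, strengths, n_keep=300, c_robust=0.9):
    """Return indices of the n_keep points with the largest suppression radius.

    coords: (N, 2) corner locations; strengths: (N,) corner responses.
    """
    coords = np.asarray(coords, dtype=float)
    strengths = np.asarray(strengths, dtype=float)
    # Pairwise squared distances, shape (N, N).
    diff = coords[:, None, :] - coords[None, :, :]
    dist2 = (diff ** 2).sum(axis=2)
    # stronger[i, j] is True when point j is sufficiently stronger than point i.
    stronger = strengths[:, None] < c_robust * strengths[None, :]
    # Suppression radius: distance to the nearest sufficiently stronger point.
    # The globally strongest point has no such neighbor and gets radius inf.
    radii = np.where(stronger, dist2, np.inf).min(axis=1)
    order = np.argsort(-radii, kind="stable")
    return order[:n_keep]
```

Sorting by radius instead of raw strength is what spreads the points out: a strong corner sitting right next to an even stronger one gets a small radius and is dropped first.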

Extracting a Feature Descriptor for each feature point

After that, in order to extract a feature descriptor, I implemented the paper with the exception of rotational invariance (which I did later). This meant blurring the image, for which I used SciPy’s gaussian_filter with a sigma of 5. Then I cropped a 40x40 patch around each point, downsampled it to 8x8, and normalized it. Here’s an example of the 300th patch from the above image. The arrow is there because I also computed the gradient, which we will use later for rotational invariance. Everything before extra credit was done with axis-aligned patches.
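A rough sketch of the axis-aligned descriptor step. The assumptions here are mine: points are given as (row, col), downsampling is done by taking every 5th pixel of the blurred patch, normalization means zero mean and unit standard deviation, and points too close to the border are skipped. `extract_descriptors` is a hypothetical name.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def extract_descriptors(image, points, patch=40, out=8, sigma=5):
    """Return (descriptors, kept_points) for a grayscale image.

    Each descriptor is a flattened, normalized out x out patch.
    """
    blurred = gaussian_filter(image.astype(float), sigma=sigma)
    half, step = patch // 2, patch // out
    descs, kept = [], []
    for r, c in points:
        r, c = int(r), int(c)
        # Skip points whose patch would fall off the image.
        if (r - half < 0 or c - half < 0 or
                r + half > image.shape[0] or c + half > image.shape[1]):
            continue
        window = blurred[r - half:r + half, c - half:c + half]
        # Blurring first makes plain subsampling a reasonable downsample.
        small = window[step // 2::step, step // 2::step]
        small = (small - small.mean()) / (small.std() + 1e-8)
        descs.append(small.ravel())
        kept.append((r, c))
    return np.array(descs), np.array(kept)
```

The normalization removes brightness and contrast differences between the two photos, so the later SSD comparison mostly measures structure.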

100x100 patch pre-downsample

I picked a bigger patch to show more context but the crop was 40x40 in the center.

After being downsampled

Matching these feature descriptors between two images

Afterwards, to match these patches I computed a 300x300 SSD matrix holding the error between every pair of points. Then, to match the points, I used the 1-NN/2-NN ratio described in lecture with a threshold of 0.2 to determine which points made it. Here’s an example with Cesar Chavez and the associated histogram of 1-NN/2-NN errors. Numerical indices are plotted to show matched points.
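The SSD matrix and ratio test can be sketched as below, assuming the descriptors are stored as rows of two arrays. The name `match_descriptors` and the returned (index in image 1, index in image 2) layout are assumptions for illustration.

```python
import numpy as np

def match_descriptors(desc1, desc2, ratio=0.2):
    """Match rows of desc1 to rows of desc2 with the 1-NN/2-NN ratio test."""
    desc1 = np.asarray(desc1, dtype=float)
    desc2 = np.asarray(desc2, dtype=float)
    # ssd[i, j] = sum of squared differences between descriptor i and descriptor j.
    ssd = ((desc1[:, None, :] - desc2[None, :, :]) ** 2).sum(axis=2)
    order = np.argsort(ssd, axis=1)
    rows = np.arange(len(desc1))
    best, second = order[:, 0], order[:, 1]
    # A match is kept only if the best candidate is much better than the runner-up.
    error_ratio = ssd[rows, best] / (ssd[rows, second] + 1e-12)
    keep = error_ratio < ratio
    return np.stack([rows[keep], best[keep]], axis=1)
```

With 300 descriptors per image, `ssd` is exactly the 300x300 matrix described above. A low threshold such as 0.2 throws away ambiguous matches, for example repeated windows on a building, at the cost of keeping fewer points.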

Using RANSAC to compute a homography

Then, I used RANSAC to compute a robust homography. As per lecture, this involved selecting 4 random points, computing an exact H, then counting inliers and keeping the H that produced the most inliers. I used an epsilon of 3 and 20000 iterations since it was cheap to compute. Here’s an example of the warped second image using manual points vs RANSAC:

Manually labeled


As you can see, the results are very similar, which suggests the automatic pipeline is doing well.
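The RANSAC loop described in this section can be sketched as follows. Several things are assumptions on my part rather than the project's code: the helper names `fit_h` and `ransac_homography`, measuring error as Euclidean distance in the target image, the fixed random seed, and the final refit on all inliers (a common last step that the text above does not mention).

```python
import numpy as np

def fit_h(src, dst):
    """Homography with dst ~ H @ src, via the SVD null vector."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.array(rows, dtype=float))
    return vt[-1].reshape(3, 3)

def ransac_homography(src, dst, eps=3.0, iters=20000, seed=0):
    """Return (H, inlier_mask) for matched (N, 2) point arrays."""
    rng = np.random.default_rng(seed)
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)
    homog = np.hstack([src, np.ones((n, 1))])
    best = np.zeros(n, dtype=bool)
    for _ in range(iters):
        # Four correspondences determine H exactly.
        idx = rng.choice(n, size=4, replace=False)
        H = fit_h(src[idx], dst[idx])
        proj = homog @ H.T
        with np.errstate(divide="ignore", invalid="ignore"):
            proj = proj[:, :2] / proj[:, 2:3]
        err = np.linalg.norm(proj - dst, axis=1)
        inliers = err < eps  # NaN errors from degenerate samples count as outliers
        if inliers.sum() > best.sum():
            best = inliers
    # Assumed final step: least-squares refit on the largest inlier set.
    return fit_h(src[best], dst[best]), best
```

Since each iteration only solves an 8x9 system and projects a few hundred points, running many thousands of iterations stays cheap.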

Mosaics produced using the technique above

Here I reuse a couple sets from part A so we can see the improvement. There are also others I’ve computed thrown in:


Manually labeled


Clearly this one has a much better homography. There is no blurring at the middle region.

This was pretty good in both cases.

This one didn’t actually turn out as well as I hoped, even after adjusting hyperparameters. You can see the building is clearly mismatched, causing a blur, and I think this is again because I moved the camera too much. Another reason is that the overlap is not too great, and from the debug points I see that the right side of the building wasn’t matched at all.

Here are a couple that I don’t have manually labeled examples to compare with:



Kimi no Na Wa aka an excuse to watch anime while doing a project

This feels like cheating since it’s a pan on a planar surface and this is trivial to mosaic. You can see the overlap mask I computed:

Hearst Mining at Sunset

This one is also a little blurry. I will have to play with hyperparameters.

A fishway somewhere (I forgot where I took this)

This turned out pretty well except for the trees, which were probably moving with the wind.

Berkeley from Lawrence Hall

This turned out very well which was probably due to the number of corners.

What I learned

This was a really nice contrast to simply trusting deep learning for project 4. It definitely helped me learn how to become more fluent in manipulating images.

Part B Bells and Whistles:

Rotational invariance for descriptors

This was surprisingly straightforward to implement. I used np.gradient to calculate the gradient, and rotated a larger patch before cropping it to 40x40 for processing. For reference here are the points chosen post-ANMS along with indices:

Then, to give an example, we see that point 2 matches pretty well with point 15 (not guaranteed to be chosen since it might be suppressed), so we can illustrate their pre-rotation patches and gradients, then the result afterwards.

100x100 Patches pre-rotation:

Post rotation, cropping, and downsampling, you can see the white edge is now oriented upwards as the arrow now points right:

Final points selected from RANSAC:

The final point selection, using the same parameters for everything else, found 16 points for RANSAC, compared to 55 without rotation. That’s a pretty major regression. The final image still came out fine since 16 points is plenty, but I have a few theories about the drop: the Gaussian blur may be more effective when rotation is very minor, as it is in this case; the gradient could be unstable; or the rotation may cause patches to match against other similar but rotated patches more often, causing higher 1-NN/2-NN error ratios. It could be that this is more robust in general, but compared to axis-aligned patches it didn’t perform too well here.
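The rotation step described in this section can be sketched roughly like this. It assumes the orientation is taken from the gradient of the blurred image at the point itself, that a 100x100 window is rotated with SciPy's `rotate` so the gradient ends up pointing right, and that the point is at least 50 pixels from the border. `oriented_patch` is a made-up name.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, rotate

def oriented_patch(image, r, c, patch=40, big=100, sigma=5):
    """Return (patch x patch window with the gradient pointing right, angle in degrees)."""
    blurred = gaussian_filter(image.astype(float), sigma=sigma)
    gy, gx = np.gradient(blurred)  # derivatives along rows (y) and columns (x)
    angle = np.degrees(np.arctan2(gy[r, c], gx[r, c]))
    hb = big // 2
    window = blurred[r - hb:r + hb, c - hb:c + hb]
    # A positive angle rotates the content counter-clockwise as displayed,
    # which brings a gradient at `angle` (measured with y pointing down) onto +x.
    rotated = rotate(window, angle, reshape=False, order=1, mode="nearest")
    lo = hb - patch // 2
    # The 40x40 center crop stays inside valid data for any rotation,
    # because its diagonal (about 57 px) is shorter than the window side.
    return rotated[lo:lo + patch, lo:lo + patch], angle
```

The returned patch could then go through the same downsample-and-normalize step as the axis-aligned descriptors. Estimating the angle from a single pixel's gradient is a simple choice, and it may be part of why the orientation is unstable.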

Video Mosaic with automatic recognition

I applied my video mosaic technology to produce one automatically. In terms of runtime this increased the per-frame computational cost by about 16x, but the mapping is much more robust. Along the way I had to solve some minor issues like padding but in the end I produced the following video mosaic: 

I’m not actually sure this is better for viewing due to the shakiness. My initial thought for fixing it is to perform the mapping while weighting points that have already been used (perhaps with SSD).