CS194 - Computer Vision

Project 4: Stitching Photo Mosaics

Nick Lai



Overview

In this project, we aim to create algorithms that allow us to stitch smaller pictures together to form a larger panorama or mosaic, performing warping and other operations to generate a composite image. The overall project is divided into two parts; this is the second part.

Feature Matching for Autostitching

In the second part of the project, the goal is to create an algorithm that can automatically identify feature points in two images of a panorama, and then compute a homography matrix that permits auto-mosaicing of the two images into a panorama.

To achieve that, we first obtain a set of feature points automatically, filter for the most evident ones within a certain radius, and then identify their closest matches in the other image. We then apply a RANSAC algorithm over all the points to eventually obtain the most correct-looking transformation, identified as the one under which the largest number of best feature matches remain valid within a margin of error.

Harris Interest Point Detector

Perhaps the easiest part of the project, this part simply requires us to use the provided get_harris_corners method, with some minor adjustments just for my ease of use.

Do I know how it works? Only conceptually. Do I know why it works? Nope. However, it gives me a nice little set of points which I can plot on my images to give the impression that I know what I'm doing.

lks sample=100.jpg

I noticed that if I put a threshold on the distance between Harris points, increasing that threshold drastically reduces the number of corners sampled (and, to a certain extent, maybe even the quality).
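For reference, here is a minimal sketch of what this step boils down to, using skimage's built-in corner functions in place of the provided get_harris_corners (which presumably does something similar internally); min_distance plays the role of the distance threshold mentioned above, and the names are my own rather than the project code's.

import numpy as np
from skimage.feature import corner_harris, corner_peaks

def harris_points(im_gray, min_distance=10):
    """Return (N, 2) (row, col) Harris corners plus their strengths,
    enforcing a minimum pixel distance between any two detected corners."""
    h = corner_harris(im_gray)                            # Harris response map
    coords = corner_peaks(h, min_distance=min_distance)   # local maxima, spaced apart
    strengths = h[coords[:, 0], coords[:, 1]]             # corner strength at each point
    return coords, strengths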

lks distance=10.jpg
lks distance=30.jpg


Adaptive Non-Maximal Suppression

Now, we want to reduce the quantity of points we have, but we want to get better points as well. For that purpose we would like to apply the Adaptive Non-Maximal Suppression algorithm, or ANMS for short.

How ANMS works is that we take all the Harris corners and sort them in descending order of their suppression radius, i.e. the distance from each corner to the nearest corner of higher strength. Keeping only the top points means that every point remaining on the image is the strongest corner within some radius of itself, so the surviving points end up well spread across the image.
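Here is a rough sketch of that idea in numpy; the variable names and the c_robust constant are my own assumptions rather than the exact project code.

import numpy as np

def anms(coords, strengths, n_keep=100, c_robust=0.9):
    """Keep the n_keep corners with the largest suppression radii."""
    radii = np.full(len(coords), np.inf)
    for i in range(len(coords)):
        # corners whose response is (sufficiently) stronger than corner i
        stronger = strengths > strengths[i] / c_robust
        if np.any(stronger):
            d = np.linalg.norm(coords[stronger] - coords[i], axis=1)
            radii[i] = d.min()          # distance to the nearest stronger corner
    keep = np.argsort(-radii)[:n_keep]  # largest suppression radii first
    return coords[keep]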

lks ANMS sample=100 distance=2.jpg
lks ANMS sample=100 distance=4.jpg
lks ANMS sample=100 distance=10.jpg

Interestingly, the above three images tell us absolutely nothing on their own. So I have generated a single image with some more useful datapoints. The colours correspond to the ANMS samples with distances 2, 4, and 10 as before, while the yellow points are the points which overlap between all 3. This goes to show that a surprisingly large number of points are shared between the 3 different sets of ANMS points despite the different distances.

lks ANMS sample=100 distance=2.jpg


Feature Descriptor Extraction

After obtaining the corner points, we want to obtain little 8x8 matrices that represent the 40x40 patch of pixels surrounding each corner pixel. This was simply a matter of slicing out the pixels surrounding each corner, resizing the patch down to 8x8, and then normalizing the values.
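Roughly speaking, the extraction step looks something like the sketch below, assuming a grayscale image, corners at least 20 pixels from the border, and skimage's resize for the downsampling (the function name is my own).

import numpy as np
from skimage.transform import resize

def extract_descriptor(im, r, c):
    """Build a bias/gain-normalised 8x8 descriptor for the corner at (r, c)."""
    patch = im[r - 20:r + 20, c - 20:c + 20]            # 40x40 window around the corner
    small = resize(patch, (8, 8), anti_aliasing=True)   # downsample to 8x8
    return (small - small.mean()) / small.std()         # zero mean, unit variance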

Here are some example windows:

Window1 40x40.jpg
Window1 8x8.jpg
Window1 8x8 normalised.jpg
Window2 40x40.jpg
Window2 8x8.jpg
Window2 8x8 normalised.jpg
Window3 40x40.jpg
Window3 8x8.jpg
Window3 8x8 normalised.jpg

You may have noticed that the regular and normalised patches look almost identical in the plots of these windows. You might also realise that because matplotlib.pyplot tends to normalise plotted numpy arrays for better viewing, the plotted images come out looking normalised regardless of their actual values. If you spotted that, you are already much brighter than I am, since I spent a significant portion of my time figuring out why my normalisation "wasn't working". Until finally, I decided to take a look at the magnitude of each pixel and facepalmed hard enough to set myself 20 minutes back from holding a bloody nose.



Feature Matching

Feature matching was simply a matter of taking all the patches in one image and matching them against all the patches in the second image. This section was surprisingly straightforward, as it was simply two nested for loops in which we subtracted one patch from the other elementwise and squared the differences. Thankfully, numpy allowed for some pretty quick operations. Then, for each patch in the first image, we summed the squared differences against every patch in the second image and kept the patches with the smallest and second-smallest difference.

Implementing the Babushka trick (essentially Lowe's ratio test), feature selection was simply a matter of comparing the smallest and second-smallest difference magnitudes, dividing the best by the second best to obtain a ratio between 0 and 1. This unitless ratio was stored along with the point of best match, which allowed us to filter for the top points by keeping only matches with a ratio below a threshold n, which we defaulted to 0.4.
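Put together, the matching plus ratio test amounts to something like the following sketch; the names and the vectorised SSD are my own (the project code used the nested loops described above).

import numpy as np

def match_features(desc1, desc2, ratio_thresh=0.4):
    """desc1: (N1, 64), desc2: (N2, 64) flattened descriptors.
    Returns a list of (i, j) index pairs that pass the ratio test."""
    matches = []
    for i, d1 in enumerate(desc1):
        ssd = np.sum((desc2 - d1) ** 2, axis=1)   # SSD to every descriptor in image 2
        order = np.argsort(ssd)
        best, second = ssd[order[0]], ssd[order[1]]
        if best / second < ratio_thresh:          # keep only unambiguous matches
            matches.append((i, order[0]))
    return matches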

Here is an image where all the red points are the points found by ANMS, and the remaining blue points are the points whose match ratio falls below the threshold.

lks ANMS sample=100 distance=2.jpg


RANSAC

Now, RANSAC was perhaps the quickest, yet jankiest, part of the project overall. From the best matches generated by the feature matching algorithm in the last section, we randomly select 4 points from which we compute a homography, and then pray that it's half-good. We then simply repeat this process, keeping the best homography found so far.

We measure the "goodness" of a homography by applying the transformation to the feature points of the first image and checking whether they land within a certain radius of their corresponding matching points. We then simply count the number of points that do, and keep the homography that results in the most such inliers.
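A bare-bones version of that loop might look like this; compute_homography is assumed to be the 4-point homography solver from part 1 of the project (a hypothetical name here), pts1/pts2 are the matched (x, y) points, and eps is the margin-of-error radius.

import numpy as np

def ransac_homography(pts1, pts2, n_iters=1000, eps=2.0):
    best_H, best_inliers = None, 0
    pts1_h = np.hstack([pts1, np.ones((len(pts1), 1))])   # homogeneous coordinates
    for _ in range(n_iters):
        idx = np.random.choice(len(pts1), 4, replace=False)
        H = compute_homography(pts1[idx], pts2[idx])       # exact fit to 4 random matches
        proj = pts1_h @ H.T
        proj = proj[:, :2] / proj[:, 2:3]                  # back to (x, y)
        errs = np.linalg.norm(proj - pts2, axis=1)
        inliers = np.sum(errs < eps)                       # matches that land close enough
        if inliers > best_inliers:
            best_H, best_inliers = H, inliers
    return best_H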

library1+2 ransac points.jpg
library1+2 mosaic.jpg
lks3+4 ransac points.jpg
lks+2 mosaic.jpg

Aye, they look pretty good :D



What did I learn?

I felt that most of the exercises were just implementations of my existing understanding of these processes. One small surprise was how effective janky-looking datasets and odd points were at generating pretty good homographies.

On a more personal, self-exploratory note, it was interesting how differently I have to think about these problems in computer vision: how to use matrices and manipulations that let the computer "blindly" adapt and produce some pretty amazing morphs and computations. I guess I need to recalibrate my understanding of how computer vision works :P