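The homography matrix is used to warp points from one image to another. Each point is written in homogeneous coordinates as (x, y, 1) and multiplied by the matrix. To get the warped point, we must then divide the first and second coordinates of the result by the third coordinate, z, to obtain the new x and y. The following is the homography matrix, which has 8 degrees of freedom because it is only defined up to scale.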
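One common way to write this out, assuming the usual normalization that fixes the bottom-right entry to 1 (the entry names a through h are placeholders, not necessarily the notation used elsewhere in this writeup):

```latex
\begin{bmatrix} z\hat{x} \\ z\hat{y} \\ z \end{bmatrix}
=
\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix},
\qquad
\hat{x} = \frac{a x + b y + c}{g x + h y + 1},
\qquad
\hat{y} = \frac{d x + e y + f}{g x + h y + 1}
```

The eight free entries a through h are the 8 degrees of freedom.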
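The values in the homography matrix can be computed by solving the following system of equations by least squares. Here, x_hat are the destination points, while x are the source points. Each correspondence contributes two equations, so at least 4 correspondences are needed to solve for the 8 unknowns, and additional correspondences make the least-squares estimate less sensitive to noise in the selected points. The goal is for the homography matrix H to warp the points in the source image onto the points in the destination image.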
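A minimal sketch of this least-squares setup, assuming NumPy and assuming the bottom-right entry of H is fixed to 1. The function name compute_homography is hypothetical, and this is one standard formulation rather than necessarily the exact one I used:

```python
import numpy as np

def compute_homography(src, dst):
    """Estimate a 3x3 homography from (N, 2) source and destination points, N >= 4."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows, rhs = [], []
    for (x, y), (xh, yh) in zip(src, dst):
        # Two linear equations per correspondence, with unknowns a..h.
        rows.append([x, y, 1, 0, 0, 0, -x * xh, -y * xh])
        rows.append([0, 0, 0, x, y, 1, -x * yh, -y * yh])
        rhs.extend([xh, yh])
    h, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    # Append the fixed entry (assumed to be 1) and reshape to 3x3.
    return np.append(h, 1.0).reshape(3, 3)
```

With exactly 4 correspondences the system is square; with more, np.linalg.lstsq returns the solution that minimizes the squared residual.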
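To warp an image, a set of correspondences was selected by hand. Then the homography matrix was computed using these points. Inverse warping was implemented using the computed homography matrix: each pixel in the output image is mapped back through the inverse of H to find where to sample the source image, which avoids leaving holes in the output.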
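A rough sketch of inverse warping, assuming NumPy and nearest-neighbor sampling (the interpolation scheme here is an assumption for brevity, and inverse_warp is a hypothetical name):

```python
import numpy as np

def inverse_warp(image, H, out_shape):
    """Warp `image` by H (source -> destination) into an output of shape (rows, cols)."""
    out_h, out_w = out_shape
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    # Homogeneous coordinates of every output pixel, as a 3 x N array.
    dest = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    # Map output pixels back into the source image.
    src = np.linalg.inv(H) @ dest
    sx = np.rint(src[0] / src[2]).astype(int)
    sy = np.rint(src[1] / src[2]).astype(int)
    # Only copy pixels whose source location falls inside the image.
    valid = (sx >= 0) & (sx < image.shape[1]) & (sy >= 0) & (sy < image.shape[0])
    out = np.zeros((out_h * out_w,) + image.shape[2:], dtype=image.dtype)
    out[valid] = image[sy[valid], sx[valid]]
    return out.reshape((out_h, out_w) + image.shape[2:])
```

Pixels that map outside the source image are left at zero, which is what produces the black borders around a warped image.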
These images were warped so that they were fronto-parallel.
To form a mosaic, I selected key points on two images and warped one image so that its key points would match those of the other image. Then I stitched the two together and blended the boundaries using Laplacian pyramids.
I used the given starter code to detect the Harris corners in images. I increased the min_distance parameter to 5, which reduces the number of Harris corners selected and makes the visualization less cluttered.
Following the description in the paper, I narrowed down the number of corners by selecting Harris corners that are spaced out evenly around the image, picking the strongest point within a given radius. I started with a radius about equal to a quarter of the image and steadily decreased it until I got more than 100 corners. The corners that are selected have the highest score within their vicinity, so they are the best corners for that area of the image. The following images show the result after Non-Maximal Suppression was applied to the initial Harris corners.
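The radius-shrinking procedure described above can be sketched as follows, assuming NumPy. The function names, the 0.9 shrink factor, and the use of the smaller image dimension for the starting radius are all assumptions for illustration. It builds an N x N distance matrix, so it only suits a modest number of corners:

```python
import numpy as np

def suppress_by_radius(coords, scores, radius):
    """Return indices of corners that are the strongest within `radius` pixels."""
    coords = np.asarray(coords, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Pairwise distances between all corners (N x N).
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    # stronger[i, j] is True when corner j is strictly stronger than corner i.
    stronger = scores[None, :] > scores[:, None]
    # A corner is dropped if any stronger corner lies within the radius.
    dominated = np.any((dists < radius) & stronger, axis=1)
    return np.flatnonzero(~dominated)

def select_corners(coords, scores, image_shape, target=100, shrink=0.9):
    """Shrink the radius from a quarter of the image until more than `target` corners survive."""
    radius = min(image_shape[:2]) / 4.0
    keep = suppress_by_radius(coords, scores, radius)
    while len(keep) <= target and radius > 1.0:
        radius *= shrink
        keep = suppress_by_radius(coords, scores, radius)
    return keep, radius
```

Here coords would be the (row, col) Harris corner locations and scores the Harris response at each one.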
A feature descriptor was generated for each of the corners selected from the previous part. The feature descriptor consists of 8x8 pixels that are downsampled from a 40x40 patch around the corner. These feature descriptors were used to pair up corners that are good matches. A pair is determined to be a good match if the ratio between the SSD scores of the best match and the second-best match is sufficiently small. In my implementation I required the ratio 1-NN / 2-NN (the SSD to the nearest neighbor divided by the SSD to the second-nearest neighbor) to be less than 0.2. This is known as Lowe's Trick. The following images show the corners that have found a correspondence using Lowe's Trick with a threshold of 0.2.
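A minimal sketch of the descriptor and the ratio test, assuming NumPy and a grayscale image. The function names are hypothetical, and downsampling by averaging 5x5 blocks is an assumption (blurring and then subsampling would be another reasonable choice):

```python
import numpy as np

def extract_descriptor(image, row, col):
    """Return a flattened 8x8 descriptor from the 40x40 patch centered near (row, col)."""
    if row < 20 or col < 20:
        raise ValueError("corner is too close to the image border")
    patch = image[row - 20:row + 20, col - 20:col + 20].astype(float)
    if patch.shape != (40, 40):
        raise ValueError("corner is too close to the image border")
    # Average each 5x5 block to go from 40x40 down to 8x8.
    return patch.reshape(8, 5, 8, 5).mean(axis=(1, 3)).ravel()

def match_features(desc_a, desc_b, ratio=0.2):
    """Return (index_a, index_b) pairs that pass the 1-NN / 2-NN ratio test."""
    desc_a = np.asarray(desc_a, dtype=float)
    desc_b = np.asarray(desc_b, dtype=float)
    # SSD between every descriptor in A and every descriptor in B.
    ssd = ((desc_a[:, None, :] - desc_b[None, :, :]) ** 2).sum(axis=2)
    order = np.argsort(ssd, axis=1)
    rows = np.arange(len(desc_a))
    best = ssd[rows, order[:, 0]]
    second = ssd[rows, order[:, 1]]
    # Keep a match only when the best SSD is much smaller than the second best.
    good = best < ratio * second
    return np.column_stack([rows[good], order[good, 0]])
```

match_features needs at least two descriptors in desc_b, since the test compares the best and second-best candidates.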
The RANSAC algorithm consists of repeatedly selecting 4 random pairs from the feature matching step above and computing the homography matrix H using those pairs as the correspondences. After using that H to warp the rest of the source points, we count the number of inliers, which are the points that land close enough to their expected correspondences after the warp. Once the RANSAC loop finishes, the largest set of inliers found is used to compute the final homography matrix, which is then used to execute the perspective warp. The following images show the correspondences selected after running RANSAC.
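The loop above can be sketched like this, assuming NumPy. The function names, the 2-pixel inlier tolerance, and the fixed random seed are assumptions for illustration, not the exact settings I used:

```python
import numpy as np

def fit_homography(src, dst):
    """Least-squares homography with the bottom-right entry fixed to 1."""
    rows, rhs = [], []
    for (x, y), (xh, yh) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -x * xh, -y * xh])
        rows.append([0, 0, 0, x, y, 1, -x * yh, -y * yh])
        rhs.extend([xh, yh])
    h, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def project(H, pts):
    """Apply H to (N, 2) points and divide by the third coordinate."""
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]

def ransac_homography(src, dst, n_iters=1000, tol=2.0, seed=0):
    """Return (H, inlier_mask) for the largest inlier set found."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        # Fit an exact homography to 4 randomly chosen pairs.
        sample = rng.choice(len(src), size=4, replace=False)
        H = fit_homography(src[sample], dst[sample])
        # Degenerate samples can divide by zero; those simply score no inliers.
        with np.errstate(divide="ignore", invalid="ignore"):
            err = np.linalg.norm(project(H, src) - dst, axis=1)
        inliers = err < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 4:
        raise ValueError("RANSAC did not find enough inliers")
    # Refit on every inlier of the best sample for the final estimate.
    return fit_homography(src[best_inliers], dst[best_inliers]), best_inliers
```

Refitting on all of the inliers at the end uses more correspondences than any single 4-point sample, so the final homography is less sensitive to noise in individual matches.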
The following are the resulting mosaics stitched using automatic feature detection, shown next to the mosaics that resulted from manually selecting points.
The coolest thing I've learned from this project was the RANSAC algorithm, because it is so simple and elegant and it produces really nice results. It's really interesting that algorithms that rely on random chance can eventually get the right answer if repeated enough times. I also think it's a really elegant solution to count the number of inliers after the homography is computed from randomly selected correspondences. Normally guessing and checking is discouraged and tends to take an unreasonable amount of time, but 10000 loops of RANSAC ran in just a couple of seconds and successfully found correspondences that work well for the homography.
As a bonus mosaic, I took images on the firetrails (I forgot to lock the exposure and focus, so the edge did not blend very well). I thought it was really cool that this mosaic could be autostitched pretty well, because manually picking accurate correspondences for images like this would be difficult, as there aren't corners that are obvious to the human eye.