CS194-26 Project 6 Writeup

Project Overview

For part A of this two-part project, I explored homographies and transforming images. I experimented with "rectifying" images by making them appear planar, as well as producing visually pleasing mosaics of multiple images. One of the mosaic pairs (of downtown Berkeley) was taken off the Berkeley campus.

I detail the theoretical musings, approaches taken, and results for various experiments in these domains below.

Part A: Image Warping and Mosaic-ing

Shoot the Pictures

I shot most of the pictures with a Nikon D3300, and one of them with an iPhone 6S. For all the mosaics, I took the pictures in pairs from the same position (same center of projection) but at different viewing angles.

Recover Homographies

I used Matlab's cpselect tool to manually generate correspondences between images. I played with the cpcorr tool, which uses local neighborhood matching to improve correspondences, but I found that skipping the tool and just being very careful worked best for generating good correspondences. As a modification to this scheme, for image rectification I used the vertices of a simple unit square as one of the correspondence point sets, which I scaled and shifted later as appropriate.
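With the correspondences in hand, the homography can be recovered by solving an overdetermined linear system. A minimal sketch of this step (the function name and the choice of fixing H[2,2] = 1 are mine, not from the original writeup):

```python
import numpy as np

def compute_homography(src, dst):
    """Estimate the 3x3 homography H mapping src -> dst.

    src, dst: (N, 2) arrays of corresponding (x, y) points, N >= 4.
    Each correspondence contributes two rows, derived from
    u = (h11*x + h12*y + h13) / (h31*x + h32*y + 1), with H[2,2] fixed to 1.
    Solved by least squares so extra correspondences average out noise.
    """
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)
```

With exactly four non-degenerate correspondences the system is square and the recovered homography is exact; additional points make the fit more robust to marking error.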

Warp the Images

I used skimage.transform's warp, which was stated to be permitted on Piazza. I experimented with different kinds of interpolation, but found that the default bilinear interpolation was fast and usually very good.

Image Rectification

Approach:

In this section, I "rectified" a chosen surface that I knew intuitively was flat, projecting it onto a plane so that the rectified image appears to view the surface from directly above. I marked correspondences between the corners of the chosen surface and those of a unit square, which I later scaled and shifted (tuning by eye). I then transformed the surface to the overhead planar view by finding the homography matrix and warping appropriately.

Results:

Original roof image:

Rectified roof image:

Original scones image:

Rectified scones image:

Discussion / Failure Cases:

The rectified images look very nice! I noticed a small amount of aliasing in the images, which is expected since these warps interpolate more detail than the original images contain. The biggest failure case occurs (as is visible in the roof rectification) when the original surface has some depth perspective of its own, since that will not disappear after the warp as it should.

Blend the Images into a Mosaic

Approach:

In this section, I projected one image taken at the same center of projection (but a different viewing angle) onto the plane of another, forming a mosaic. I manually inputted correspondences between the images and calculated the homography matrix. I then warped one image into the other and generated a mosaic by taking a weighted sum of the two final images. The result had some lighting inconsistencies, so I applied bilinear gradient masks to each section of the mosaic (the region covered only by image 1, the region covered only by image 2, and the overlap region) and tuned the mask parameters until I obtained a visually pleasing mosaic.
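The gradient-mask idea can be sketched in one dimension as a linear alpha ramp across the overlap (a simplified stand-in for the bilinear masks described above; the function and its column-bound parameters are illustrative assumptions):

```python
import numpy as np

def linear_blend(im1, im2, x0, x1):
    """Blend two aligned, same-size images with a horizontal linear ramp.

    im1 contributes fully left of column x0, im2 fully right of x1; in
    between, im1's weight falls linearly from 1 to 0. x0 < x1 are assumed
    to bracket the overlap region (here tuned by hand, as in the writeup).
    """
    h, w = im1.shape[:2]
    alpha = np.ones(w)
    alpha[x1:] = 0.0
    alpha[x0:x1] = np.linspace(1.0, 0.0, x1 - x0)
    # Reshape so the per-column weight broadcasts over rows (and channels).
    alpha = alpha.reshape(1, w, *([1] * (im1.ndim - 2)))
    return alpha * im1 + (1 - alpha) * im2
```

A full bilinear mask applies the same ramp vertically as well, which is what smooths the lighting seams in both directions.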

Results:

Original campus images:

Campus mosaic:

Original downtown images:

Downtown mosaic:

Original rooftop images:

Rooftop mosaic:

Discussion / Failure Cases:

The mosaics seem very nice! The only issues are small lighting-difference edges and some small aliasing effects. Ideally, I could have done multiresolution blending to smooth out these artifacts, but they are already very small, so I decided it was not worth the time. I found in this section that very accurate correspondences (and many of them!) are required: correspondences that are off by even a little can completely throw off the homography.

Part A Summary

I learned a lot about actually implementing homography mosaic-ing! The most important and surprising thing was that correspondences that are just a little bit off can completely screw up the resulting homography. Since the most tedious part of the project is selecting these precise correspondences, perhaps the most important avenue to look into moving forward is ameliorating this process.

Part B: Feature Matching and Auto-stitching

Step 1: Detecting Corner Features

The class staff provided a utility for detecting Harris corners. To obtain a nice spatial spread of feature candidates, I followed the paper and implemented Adaptive Non-Maximal Suppression (ANMS). The idea was to define a suppression radius for each candidate point: the distance from that point to its nearest neighbor with a sufficiently stronger corner response (this robustness threshold was tuned). I then selected the points with the largest such radii, which ensured that my selected candidates were nicely spread throughout the images.
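An O(N^2) sketch of ANMS under these assumptions (the function name and defaults are mine; the robustness factor c plays the role of the tuned threshold above):

```python
import numpy as np

def anms(coords, strengths, n_points=500, c_robust=0.9):
    """Adaptive Non-Maximal Suppression over Harris corner candidates.

    coords: (N, 2) corner positions; strengths: (N,) corner responses.
    A point's suppression radius is its distance to the nearest point
    whose response dominates it after robustification (f_i < c * f_j);
    the n_points candidates with the largest radii are kept, which
    spreads the survivors spatially. Brute-force, fine for a few
    thousand candidates.
    """
    n = len(coords)
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    radii = np.full(n, np.inf)
    for i in range(n):
        dominating = strengths > strengths[i] / c_robust
        if dominating.any():
            radii[i] = np.sqrt(d2[i, dominating].min())
    keep = np.argsort(-radii)[:n_points]
    return coords[keep]
```

The globally strongest corner gets an infinite radius and is always retained; weak corners crowded next to strong ones get tiny radii and drop out first.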

This image illustrates the selected corner candidates during the processing of one of my images:

Step 2: Extracting Feature Descriptors

Since ultimately I wanted to match feature points, I had to extract a feature descriptor for each point. To extract a feature from each point, I took a 40x40 pixel patch from the grayscale image around the appropriate point, and then downsampled it to an 8x8 array (intuitively, to generalize the descriptor), finally biasing the descriptor to zero-mean and normalizing its standard deviation.

Step 3: Matching Feature Descriptors

Given the features, I matched them with a nearest-neighbor scheme. I used an efficient structure to find the nearest-neighbor feature for each feature and, following the paper's guidance, selected this neighbor as a match candidate. As the paper details, however, this approach alone yields too many incorrect matches. Rejecting candidates whose nearest and second-nearest neighbors are close in distance (intuitively, an ambiguous and uncertain match) by tuning a ratio threshold eliminated many of these spurious matches. A benchmark threshold value was taken from the paper's results, but image-specific tuning was still needed.
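A brute-force sketch of the ratio-test matching (in practice a k-d tree such as scipy.spatial.cKDTree makes the neighbor lookups efficient; the function name and default ratio here are illustrative):

```python
import numpy as np

def match_features(desc1, desc2, ratio=0.6):
    """Match two descriptor sets with Lowe's ratio test.

    desc1: (N, D), desc2: (M, D). For each descriptor in desc1, find its
    nearest and second-nearest neighbors in desc2 and accept the match
    only if the nearest is decisively closer (1-NN/2-NN distance ratio
    below `ratio`). Returns (i, j) index pairs into desc1/desc2.
    """
    d2 = ((desc1[:, None, :] - desc2[None, :, :]) ** 2).sum(-1)
    matches = []
    for i, row in enumerate(d2):
        j1, j2 = np.argsort(row)[:2]
        # Compare squared distances, so the ratio threshold is squared too.
        if row[j1] < (ratio ** 2) * row[j2]:
            matches.append((i, j1))
    return matches
```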

This image illustrates the labeled feature matchings after Lowe-based outlier rejection:

Step 4: RANSAC Outlier Processing

The feature matchings still contained too many bad matches. I ran iterations of the RANSAC algorithm to find the most tenable subset of matchings, rejecting the outliers. At each iteration, I generated a homography from a random choice of four of the matches. Using this approximate homography, I projected each point; matches whose projections fell too far from their counterparts (this error threshold was tuned appropriately) were considered outliers. Collecting all inliers, I checked whether the iteration's batch of inliers was the largest obtained so far, and if so, kept that set as my best guess at the good feature matchings.
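The RANSAC loop can be sketched as follows (function names, iteration count, and the pixel threshold are my own illustrative choices; the inner least-squares fit mirrors the homography recovery from Part A):

```python
import numpy as np

def fit_homography(src, dst):
    """Least-squares homography (H[2,2] = 1) from >= 4 point pairs."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def ransac_homography(src, dst, n_iters=1000, thresh=2.0, seed=0):
    """RANSAC over point matches: keep the largest inlier set found.

    Each iteration fits H to 4 randomly chosen matches, projects every
    src point through H, and counts matches whose reprojection error is
    below `thresh` pixels. Returns the indices of the best inlier set.
    """
    rng = np.random.default_rng(seed)
    best = np.array([], dtype=int)
    for _ in range(n_iters):
        sample = rng.choice(len(src), 4, replace=False)
        H = fit_homography(src[sample], dst[sample])
        proj = (H @ np.column_stack([src, np.ones(len(src))]).T).T
        proj = proj[:, :2] / proj[:, 2:3]   # dehomogenize
        err = np.linalg.norm(proj - dst, axis=1)
        inliers = np.flatnonzero(err < thresh)
        if len(inliers) > len(best):
            best = inliers
    return best
```

The final homography is then refit by least squares over the entire winning inlier set, which is what makes the resulting stitch robust to the remaining bad matches.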

This image illustrates the RANSAC-filtered feature matching inliers:

Discussion / Results:

The automated process works extremely well and is very robust! However, a few words of caution: as can be seen in the last mosaic below (of the restaurant interior), given the shortcuts we took relative to the paper's methods (rotationally non-invariant features, no wavelet-transform processing, and no multiscale pyramid), some sets of images are difficult to stitch without some mismatch. In particular, that mosaic only admitted correspondence points within a small region (regardless of tuning), so while the stitch is accurate in that locality, outside of it the stitch is quite mismatched. In addition, to avoid spurious RANSAC inlier subsets, a very tight Lowe outlier-rejection threshold sometimes had to be enforced to reject as many incorrect matches as possible. Nonetheless, the process for the most part worked far better and more automatically than expected!

Campus mosaic (manual stitching):

Campus mosaic (automatic stitching):

Downtown mosaic (manual stitching):

Downtown mosaic (automatic stitching):

Rooftop mosaic (manual stitching):

Rooftop mosaic (automatic stitching):

Restaurant mosaic (automatic stitching):

Part B Summary

I learned a lot about how to read and extract the necessary information from a research paper! I found the design principles most instructive: I enjoyed reading how the authors tried to make their method versatile and to cover as many sources of error as possible, so that it comes across as cleanly automatic. In addition, I thought it was cool how much of an image's information is captured simply by its corners.