Project 5

In this project we will take advantage of the fact that two different views taken by a camera can be linked together by a homography under certain conditions. By calculating the homography, we will be able to go from one view to another, and even stitch together different views into a mosaic.

Recovering Homographies

Corresponding points in images are linked by a homography if the images are simply rotated views where the camera's center of projection doesn't change. If the camera's center of projection does change, we can still have a homography if the images are of an approximately planar surface.

Assuming that we have a homography, how do we solve for it? The homography we are trying to solve for transforms homogeneous coordinates in the following manner:

$$\begin{bmatrix} a & b & c\\ d & e & f\\ g & h & i \end{bmatrix} \begin{bmatrix} x\\ y\\ 1 \end{bmatrix} = \begin{bmatrix} x'\\ y'\\ 1\\ \end{bmatrix} $$

Since multiples of a point are equivalent under homogeneous coordinates, the homography is only defined up to scale, leaving us with only 8 unknowns to solve for. This means that 4 corresponding pairs of points should be sufficient to solve for the homography. Dividing through by the ninth variable $i$ and fixing $i=1$ yields an inhomogeneous system of linear equations that we can solve, but it assumes that $i\neq 0$. Instead, we will use a more general method that involves solving a homogeneous system and doesn't require any particular variable to be nonzero.

Each correspondence gives the following equations, up to an unknown scale factor $w$: $$ax+by+c=wx'\\ dx+ey+f=wy'\\ gx+hy+i=w $$ Substituting the third equation into the first two yields: $$ax+by+c=x'(gx+hy+i)\\ dx+ey+f=y'(gx+hy+i) $$ Then: $$ax+by+c-gx'x-hx'y-ix'=0\\ dx+ey+f-gy'x-hy'y-iy'=0 $$ Using $j\geq 4$ correspondences, we can write: $$A= \begin{bmatrix} x_1 & y_1 & 1 & 0 & 0 & 0 & -x_1'x_1 & -x_1'y_1 & -x_1'\\ 0 & 0 & 0 & x_1 & y_1 & 1 & -y_1'x_1 & -y_1'y_1 & -y_1'\\ &&&&&\vdots\\ x_j & y_j & 1 & 0 & 0 & 0 & -x_j'x_j & -x_j'y_j & -x_j'\\ 0 & 0 & 0 & x_j & y_j & 1 & -y_j'x_j & -y_j'y_j & -y_j' \end{bmatrix} $$ $$h= \begin{bmatrix} a\\ b\\ c\\ d\\ e\\ f\\ g\\ h\\ i \end{bmatrix} $$ $$Ah = 0 $$ A least squares solution subject to $\|h\|=1$ is the eigenvector of $A^TA$ associated with its smallest eigenvalue, which is the right singular vector of $A$ with the smallest singular value. This vector can be read off from the singular value decomposition of $A$, and contains the parameters of the estimated homography.
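As a concrete sketch, building $A$ and taking the SVD might look like this in Python with NumPy (the function name and point format are my own):

```python
import numpy as np

def compute_homography(pts1, pts2):
    """Estimate the homography mapping pts1 -> pts2 from j >= 4 pairs.

    pts1, pts2: (j, 2) arrays of (x, y) coordinates.
    Returns a 3x3 homography matrix, defined up to scale.
    """
    rows = []
    for (x, y), (xp, yp) in zip(pts1, pts2):
        # the two rows of A contributed by one correspondence
        rows.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y, -xp])
        rows.append([0, 0, 0, x, y, 1, -yp * x, -yp * y, -yp])
    A = np.array(rows, dtype=float)
    # The right singular vector with the smallest singular value is the
    # eigenvector of A^T A with the smallest eigenvalue.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 3)
```

Since the solution is only defined up to scale, comparisons against a known homography should be done after normalizing both matrices.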

Image Rectification

With the power of homographies, we can take a part of the image that is known to be planar and view it from the perspective of a plane that is parallel to the camera. For example, we can obtain a "frontal" view of this building, though the other nonplanar parts are somewhat distorted.

Here is another example. The regular tile pattern makes it straightforward to define the necessary manual correspondences. We rectify the camera view into a top-down perspective of the tiles. Low-lying debris looks decent enough, though the weeds look much taller than normal.

Image Mosaics

Stitching together multiple rotated camera views into a single image is now possible, provided that we have correspondences between the points of the images that come from the same light rays in 3D space.

Establishing Correspondences

Just as we did for image rectification, we can manually define corresponding pairs of points, only this time between two different images instead of between an image and a frontal-parallel plane. However, this takes a lot of careful mouse clicking. Surely there is a way to get the computer to do the work instead.

To do this, the computer needs to identify corresponding features of the images and match them up automatically. The method we will use first performs Harris corner detection on both images. This algorithm uses information about the gradients of an image to identify points with high corner strength, which are likely to be true corners and therefore useful for extracting features.

The problem is that many of these detected corners are spurious, and there are too many of them to process quickly even if they were all valid. One way to cut down their number is thresholding by corner strength, but a better method is Adaptive Non-Maximal Suppression (ANMS). This works by prioritizing points with high corner strength while suppressing/discarding points whose corner strengths are sufficiently overshadowed by a neighbor within some radius, until we have the desired number of points.

In the example below we go from 5603 corners down to just 500 with ANMS. They are mostly evenly distributed throughout the image, which is a good thing since not all clusters of corners will be shared between images that we want to match.
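A minimal sketch of ANMS in NumPy, assuming we already have the corner coordinates and strengths from the Harris detector (the function name and the robustness factor of 0.9 are my own choices):

```python
import numpy as np

def anms(coords, strengths, n_keep=500, c_robust=0.9):
    """Adaptive Non-Maximal Suppression.

    coords: (n, 2) corner coordinates; strengths: (n,) corner strengths.
    Each point gets a suppression radius: the distance to the nearest
    point that sufficiently overshadows it (strength comparison scaled
    by c_robust). Keeping the n_keep points with the largest radii
    yields corners that are both strong and well spread out.
    """
    order = np.argsort(-strengths)          # strongest first
    coords = coords[order]
    strengths = strengths[order]
    n = len(coords)
    radii = np.full(n, np.inf)              # global maximum keeps radius inf
    for i in range(1, n):
        # only the stronger points 0..i-1 can suppress point i
        mask = strengths[i] < c_robust * strengths[:i]
        if mask.any():
            d = np.linalg.norm(coords[:i][mask] - coords[i], axis=1)
            radii[i] = d.min()
    keep = np.argsort(-radii)[:n_keep]
    return coords[keep]
```

This O(n²) loop is fine for a few thousand corners; the strongest corner is always retained since nothing can suppress it.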

In the image below, points selected by ANMS are colored yellow.

Extracting Feature Descriptors

With these 500 points, we can derive features by simply taking 40x40 patches of the image (converted to grayscale) and downsampling them to 8x8, taking care to blur them beforehand to avoid aliasing. Once these patches are bias/gain-normalized, we are ready to start matching them.
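A minimal sketch of the descriptor extraction, where I fold the anti-aliasing blur and the downsample into a single 5×5 average-pooling step (the function name is my own; a separate Gaussian blur before sampling works too):

```python
import numpy as np

def extract_descriptor(gray, x, y):
    """8x8 descriptor around (x, y), bias/gain normalized.

    gray: 2D grayscale image. Takes the 40x40 patch centered on the
    point and averages each 5x5 block down to one value, giving an
    8x8 patch; the averaging doubles as a low-pass filter against
    aliasing. Then subtract the mean (bias) and divide by the
    standard deviation (gain).
    """
    patch = gray[y - 20:y + 20, x - 20:x + 20]            # 40x40 window
    pooled = patch.reshape(8, 5, 8, 5).mean(axis=(1, 3))  # -> 8x8
    pooled = pooled - pooled.mean()                       # remove bias
    return (pooled / (pooled.std() + 1e-8)).ravel()       # normalize gain
```

The result is a 64-dimensional vector with zero mean and unit variance, so descriptors from differently lit images remain comparable.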

Feature Matching

Simply pairing up feature descriptors by smallest SSD will not result in good matches, since a large fraction of features likely have no true counterpart in the other image. To resolve this, we only keep features that pass a threshold on the ratio of SSDs between a feature's closest match and its second-closest match. The reasoning is that a genuinely matching pair of features will match each other much better than any other feature would. Finally, the surviving features are paired with their closest matches.
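The ratio test can be sketched as follows (function name and threshold value are my own; the threshold is a tunable parameter):

```python
import numpy as np

def match_features(desc1, desc2, ratio_thresh=0.6):
    """Match descriptors using the nearest-neighbor ratio test.

    desc1: (n1, d), desc2: (n2, d) descriptor arrays. For each
    descriptor in desc1, find its two nearest neighbors in desc2 by
    SSD; accept the match only when the best SSD is well below the
    second-best SSD. Returns a list of (index1, index2) pairs.
    """
    matches = []
    for i, d in enumerate(desc1):
        ssd = np.sum((desc2 - d) ** 2, axis=1)
        nn = np.argsort(ssd)[:2]            # two closest candidates
        if ssd[nn[0]] < ratio_thresh * ssd[nn[1]]:
            matches.append((i, nn[0]))
    return matches
```

An unambiguous match has a tiny best SSD relative to its runner-up, while a feature with no true counterpart tends to be roughly equidistant from several candidates and gets discarded.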

Below we have two images where this has been done. Points that made it through feature matching are colored red. Only 128 points remain, and there are still a few spurious matches left. The next step will take care of that.

Robust Homography Estimation

With our remaining points we will now run the Random Sample Consensus (RANSAC) algorithm. First, we randomly sample 4 point pairs and compute a homography from them. We apply the homography to all of our points and measure how close the transformed points land to their corresponding points in the other image. Those that land within a distance threshold are called inliers. The process is repeated many times, and the sample of 4 points that results in the most inliers wins. The idea is that correct correspondences will be consistent with each other, while erroneous ones will be wrong in different ways. Once we have our best 4, the final homography is computed from those 4 points together with their set of inliers.
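The RANSAC loop can be sketched as below; it is self-contained, bundling a compact least-squares homography fit (function names, iteration count, and the 2-pixel threshold are my own choices):

```python
import numpy as np

def fit_homography(pts1, pts2):
    # Least-squares homography fit via SVD, as derived earlier.
    rows = []
    for (x, y), (xp, yp) in zip(pts1, pts2):
        rows.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y, -xp])
        rows.append([0, 0, 0, x, y, 1, -yp * x, -yp * y, -yp])
    _, _, Vt = np.linalg.svd(np.array(rows, dtype=float))
    return Vt[-1].reshape(3, 3)

def ransac_homography(pts1, pts2, n_iters=1000, thresh=2.0, rng=None):
    """Sample 4 pairs, fit, count inliers; keep the best sample."""
    rng = np.random.default_rng(rng)
    n = len(pts1)
    ones = np.ones((n, 1))
    best_inliers = np.zeros(n, dtype=bool)
    for _ in range(n_iters):
        idx = rng.choice(n, size=4, replace=False)
        H = fit_homography(pts1[idx], pts2[idx])
        with np.errstate(divide="ignore", invalid="ignore"):
            mapped = (H @ np.hstack([pts1, ones]).T).T
            mapped = mapped[:, :2] / mapped[:, 2:3]
            errs = np.linalg.norm(mapped - pts2, axis=1)
        # an inlier lands within `thresh` pixels of its partner
        inliers = errs < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # final homography from the best sample's full inlier set
    return fit_homography(pts1[best_inliers], pts2[best_inliers]), best_inliers
```

Refitting on the full inlier set at the end averages out noise that a 4-point fit alone would retain.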

And then there were 19. As we can see, these final points look fairly well-matched.

Stitching Images

Whether we got our correspondences manually or automatically, the time has come to combine the images. We compute a homography that transforms points in image 2 to those in image 1. The bounds of the mosaic can be computed by doing min/max with the corners of image 1 and the corners of image 2 transformed to image 1 coordinates. Inverse mapping and interpolation brings image 2 pixels into the mosaic, while image 1 pixels can simply be shifted over. To deal with the overlap between images, feathering with an alpha channel can be performed. However, this can lead to some blur at the boundary between the two images, especially if the alignment isn't perfect. I opted instead to blend with a two-level Laplacian pyramid at the edges where image 1 runs out in the middle of image 2.
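The min/max bounds computation can be sketched as follows (function name mine; $H$ is assumed to map image-2 coordinates into image-1 coordinates):

```python
import numpy as np

def mosaic_bounds(h1, w1, h2, w2, H):
    """Bounding box of the mosaic in image-1 coordinates.

    Transform image 2's corners through H, then take min/max over
    them together with image 1's corners.
    Returns (x_min, y_min, x_max, y_max).
    """
    corners2 = np.array([[0, 0, 1], [w2, 0, 1],
                         [0, h2, 1], [w2, h2, 1]], dtype=float)
    warped = (H @ corners2.T).T
    warped = warped[:, :2] / warped[:, 2:3]   # back to inhomogeneous coords
    xs = np.concatenate([[0, w1], warped[:, 0]])
    ys = np.concatenate([[0, h1], warped[:, 1]])
    return xs.min(), ys.min(), xs.max(), ys.max()
```

Negative minima simply mean image 2 extends left of or above image 1, so the whole mosaic is shifted by that offset before pixels are filled in.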

This preserves quality and blends high frequencies well, but minor differences in color can remain visible as an abrupt change across the border. This is most likely the result of slight changes in lighting conditions between shots and the small depth of the Laplacian pyramid.

Results

Here are the fruits of all these algorithms. The original shots are displayed first. Then a mosaic created from automatic correspondences. Afterwards, a mosaic created from manual correspondences is shown for comparison.

Here my manual correspondences are slightly off compared to the automatic correspondences, which can be seen in the background fence.

For the rest of these mosaics, automatic correspondences would probably show a more noticeable advantage if I had used alpha blending, since that depends more heavily on precise alignment.

Conclusion

This was another fun, albeit time consuming project. The most interesting thing that I learned from the first part of this project was that the mathematics of going from one rotated camera view to another is actually much simpler than I thought, and that four pairs of points are enough to solve for the transformation. It seems useful for recovering planar patterns taken at extreme angles, and it doesn't require any special information about the camera, not even the rotation of the view.

From the project overall, I found it very cool that intuitive heuristics such as looking for the largest number of inliers or looking for matches that stand out from the rest can work so well in practice. It's nice when pithy anecdotes or references to Russian literature can give guidance on technical problems. Human insight isn't always easily implementable in a computer program, but it's quite convenient when it is. And in the case of image stitching, it helps automate a tedious task away.