HW 5: Photo Mosaics and Auto-stitching

Andrew Lee

Part 1: Manual Rectification & Stitching

We seek to stitch images together into photo mosaics, producing results similar to the "panorama" function found in many smartphone camera applications.

Recovering Homographies

In order to rectify an image or stitch two images together, we need to recover a homography: a coordinate transform relating pairs of keypoint correspondences \(\big\{(p_i, p_i')\big\}_{i=1}^{n}\), where \(p_i = [x_i ~~ y_i~~ 1]^T\) and \(p_i' = [x_i'~~ y_i'~~ 1]^T\) are the coordinates of the same scene keypoint in the two images.

A homography \(H \in \mathbb{R}^{3 \times 3}\) is a projective map that transforms coordinates in one image into the corresponding coordinates in the other.

The homography \(H\) has eight degrees of freedom, as it is only defined up to a scale factor, so we fix its bottom-right entry to 1: \[H = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & 1\end{bmatrix}\]

and applying \(H\) to a keypoint coordinate in the first image (using homogeneous coordinates) yields the corresponding keypoint coordinate in the second image:

\[w \begin{bmatrix}x_i' \\ y_i' \\ 1\end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & 1\end{bmatrix}\begin{bmatrix}x_i \\ y_i \\ 1\end{bmatrix} = H \begin{bmatrix}x_i \\ y_i \\ 1\end{bmatrix}\]

where \(w\) is a scale factor left unknown because homogeneous coordinates are only defined up to scale (we do not know the true depth of the keypoint).

We want \(H\) to satisfy all \(n\) point correspondences:

\[ \begin{align} wp_1' &= Hp_1 \\ &\vdots \\ wp_n' &= Hp_n \\ \end{align} \]

and we can rewrite these relationships as the following matrix equation: \[ Ah = \begin{bmatrix} x_{1} & y_{1} & 1 & 0 & 0 & 0 & -x_{1} x_{1}' & -y_{1} x_{1}' \\ 0 & 0 & 0 & x_{1} & y_{1} & 1 & -x_{1} y_{1}' & -y_{1} y_{1}' \\ & & & & \vdots & & & \\ x_{n} & y_{n} & 1 & 0 & 0 & 0 & -x_{n} x_{n}' & -y_{n} x_{n}' \\ 0 & 0 & 0 & x_{n} & y_{n} & 1 & -x_{n} y_{n}' & -y_{n} y_{n}' \\ \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \\ g \\ h \end{bmatrix}= \begin{bmatrix} x_{1}' \\ y_{1}' \\ \vdots \\ x_{n}' \\ y_{n}' \end{bmatrix} = b \]

for \(A \in \mathbb{R}^{2n \times 8}\), \(h \in \mathbb{R}^{8}\), and \(b \in \mathbb{R}^{2n}\). For \(n=4\) correspondences (no three collinear) the system has a unique solution, but we will often use more than four correspondences for stability, leaving an overdetermined system of equations. However, because we are students at Berkeley, we are now salivating at the opportunity to deploy the most useful tool in an engineer's toolbox, ordinary least squares, to obtain an optimal homography \(h^*\):

\[h^* = \arg\min_{h} \|Ah - b\|_2^2\]
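A sketch of this least-squares setup in NumPy (function and variable names are my own, not the project's actual code):

```python
import numpy as np

def compute_homography(pts1, pts2):
    """Least-squares homography mapping pts1 -> pts2.

    pts1, pts2: (n, 2) arrays of (x, y) correspondences, n >= 4.
    Builds the A h = b system above row by row and solves it.
    """
    n = pts1.shape[0]
    A = np.zeros((2 * n, 8))
    b = np.zeros(2 * n)
    for i, ((x, y), (xp, yp)) in enumerate(zip(pts1, pts2)):
        A[2 * i]     = [x, y, 1, 0, 0, 0, -x * xp, -y * xp]
        A[2 * i + 1] = [0, 0, 0, x, y, 1, -x * yp, -y * yp]
        b[2 * i], b[2 * i + 1] = xp, yp
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.append(h, 1.0).reshape(3, 3)  # fix the scale: H[2, 2] = 1
```

With exactly four correspondences in general position this recovers the unique homography; with more, it returns the least-squares optimum.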

Warping

To warp images using a homography \(H\), we use inverse warping and interpolation instead of forward warping, which allows us to sample an interpolated pixel value in the source image for each pixel location in the warped image. This process is described in Project 3.
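A simplified sketch of inverse warping for a single-channel image, assuming SciPy for bilinear interpolation (real code must also account for the bounding box and offset of the warped output canvas):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def inverse_warp(src, H, out_shape):
    """Warp src by homography H using inverse warping.

    For each pixel (x, y) in the output, sample src at H^{-1}(x, y)
    with bilinear interpolation; out-of-bounds samples are filled with 0.
    """
    h, w = out_shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # homogeneous
    src_coords = np.linalg.inv(H) @ coords
    src_coords /= src_coords[2]                   # dehomogenize
    # map_coordinates expects (row, col) = (y, x) order
    sampled = map_coordinates(src, [src_coords[1], src_coords[0]],
                              order=1, cval=0.0)
    return sampled.reshape(h, w)
```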

Rectification

We can rectify images by providing four vertices of a rectangular object in an image, even if it does not appear rectangular, and applying our method with the four corners of a rectangle as the destination coordinates. This produces the following results:

Mosaics

We can now create panoramas or mosaics by manually defining correspondences between two images, which we click by hand via matplotlib's ginput function. To counteract lighting changes that would otherwise leave visible seams between images, we use alpha blending via masks to transition smoothly between the two images where they overlap.
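As a simplified illustration of the blending step, here is a linear alpha ramp across a column-aligned overlap (in the real mosaics the masks follow the warped image boundaries, so this is only a sketch with names of my own):

```python
import numpy as np

def linear_blend(im1, im2, overlap_start, overlap_end):
    """Alpha-blend two aligned single-channel images whose overlap
    spans columns [overlap_start, overlap_end).

    alpha is 1.0 where im1 should dominate, 0.0 where im2 should,
    with a linear ramp across the overlap to hide the seam.
    """
    h, w = im1.shape
    alpha = np.ones(w)
    alpha[overlap_end:] = 0.0
    alpha[overlap_start:overlap_end] = np.linspace(1.0, 0.0,
                                                   overlap_end - overlap_start)
    return alpha[None, :] * im1 + (1.0 - alpha[None, :]) * im2
```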


We present the following panoramas of Hearst Mining Circle, Doe Library, and Soda Hall Breezeway, because everyone misses campus.







Part 2: Automatic Feature Matching & Robust Homography Computation

Instead of manually clicking correspondences, we can automate correspondence matching using a method inspired by Multi-Image Matching using Multi-Scale Oriented Patches by Brown et al.

Harris Corner Detection & Adaptive Non-Maximal Suppression

We use a provided implementation of the Harris corner detector, which identifies corners in images from their gradient directions. However, this detector surfaces many corners, and it is not computationally feasible to use all of them in matching and homography computation. We therefore filter the detected corners with Adaptive Non-Maximal Suppression (ANMS), which selects the top \(n\) points ranked by the suppression radius \(r_i\): the distance to the nearest point whose corner strength, scaled by a constant \(c_{\text{robust}}\), exceeds that of the current point:

\[r_i = \min_{j} \| x_i - x_j \|, \quad \text{s.t. } f(x_i) < c_{\text{robust}} \, f(x_j), ~ x_j \in \mathcal{I}\]
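A vectorized sketch of this selection rule (assuming NumPy; names are my own):

```python
import numpy as np

def anms(coords, strengths, n_keep=500, c_robust=0.6):
    """Adaptive non-maximal suppression.

    coords: (m, 2) corner coordinates; strengths: (m,) Harris responses.
    For each point i, r_i is the distance to the nearest point j that
    sufficiently dominates it (f(x_i) < c_robust * f(x_j)); we keep the
    n_keep points with the largest radii.
    """
    dists = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    # mask[i, j] is True when point j "suppresses" point i
    mask = strengths[:, None] < c_robust * strengths[None, :]
    radii = np.where(mask, dists, np.inf).min(axis=1)  # inf if unsuppressed
    keep = np.argsort(-radii)[:n_keep]
    return coords[keep]
```

The strongest corner is never suppressed (its radius is infinite), so it always survives; the pairwise distance matrix makes this O(m²) in memory, which is fine after Harris thresholding.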

We display the results of ANMS on the detected corners in our Hearst Mining Circle images, using \(c_{robust} = 0.6\) and \(n=500\):



We are now left with sparser, more prominent corners, though a number of spurious "corners" remain in the sky, likely due to sensor noise.

Feature Extraction and Matching

For each corner we extract a feature descriptor: a \(40 \times 40\) pixel crop centered at the corner, downsampled to an \(8 \times 8\) patch and normalized to zero mean and unit standard deviation. We then unroll each descriptor into a vector \(v \in \mathbb{R}^{64}\) and compute pairwise \(L_2\) distances between all descriptors across the two images.
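The extraction step can be sketched as follows (a simplification assuming the corner lies at least 20 px from the image border; the names are my own):

```python
import numpy as np

def extract_descriptor(img, x, y):
    """Normalized 8x8 descriptor from a 40x40 patch centered at (x, y)."""
    patch = img[y - 20:y + 20, x - 20:x + 20]            # 40x40 crop (row = y)
    small = patch.reshape(8, 5, 8, 5).mean(axis=(1, 3))  # block-mean downsample by 5
    small = (small - small.mean()) / small.std()         # bias/gain normalization
    return small.ravel()                                 # vector in R^64
```

Normalization makes the descriptor invariant to affine changes in intensity, which is exactly what we need to compare patches across the two differently-exposed photos.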

The paper notes that naively matching each feature to its nearest neighbor in the other image is unreliable because it is difficult to find a consistent threshold: the \(L_2\) distance between corresponding feature pairs depends on lighting, color, and other factors. We instead use the Lowe ratio test, which scores each feature by the ratio of its distances to its two nearest neighbors:

\[r = \frac{\| v - v_{1NN}\|_2^2}{\|v - v_{2NN}\|_2^2}\]

and discards any pair whose ratio exceeds a chosen threshold. The logic is that a prominent feature useful as a correspondence should have one and only one good match in the other image: if it appears too visually similar to multiple features, there is a good chance that any match is incorrect (the "Russian grandmother's advice" from Professor Efros).

We also enforce cyclic matching: if a feature in image 1 is closest to a feature in image 2, that feature in image 2 must in turn be closest to the original feature in image 1 (the "mutual best friend condition" from Professor Efros).
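Both checks can be combined in one pass over the distance matrix (a sketch in NumPy; the ratio threshold of 0.5 is a hypothetical value, not necessarily the one used for the results below):

```python
import numpy as np

def match_features(desc1, desc2, ratio_thresh=0.5):
    """Lowe ratio test + mutual-nearest-neighbor matching.

    desc1: (m, 64), desc2: (k, 64), k >= 2. Returns index pairs (i, j).
    """
    d = np.linalg.norm(desc1[:, None] - desc2[None, :], axis=-1)
    nn = np.argsort(d, axis=1)                  # neighbors of each desc1 row
    matches = []
    for i in range(len(desc1)):
        j1, j2 = nn[i, 0], nn[i, 1]
        ratio = d[i, j1] ** 2 / d[i, j2] ** 2   # squared ratio, as above
        mutual = np.argmin(d[:, j1]) == i       # "mutual best friend"
        if ratio < ratio_thresh and mutual:
            matches.append((i, j1))
    return matches
```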

Robust Homography Computation

To better guarantee a good homography, we use RANdom SAmple Consensus (RANSAC). On each iteration we select four correspondences at random, compute a candidate homography \(H\), and count its inliers: points whose warped coordinates (after normalizing the homogeneous coordinate) lie close to their correspondence in the other image:

\[\|p_{i}' - H p_{i}\|_2 < \epsilon\]

We then recompute \(H\) by least squares on the largest inlier set found, and use this to warp our images together.
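The loop can be sketched as below, reusing the least-squares solver from Part 1 (included here so the sketch is self-contained; iteration count, \(\epsilon\), and names are illustrative assumptions):

```python
import numpy as np

def compute_homography(pts1, pts2):
    """Least-squares homography (the A h = b system from Part 1)."""
    n = len(pts1)
    A = np.zeros((2 * n, 8))
    b = np.zeros(2 * n)
    for i, ((x, y), (xp, yp)) in enumerate(zip(pts1, pts2)):
        A[2 * i]     = [x, y, 1, 0, 0, 0, -x * xp, -y * xp]
        A[2 * i + 1] = [0, 0, 0, x, y, 1, -x * yp, -y * yp]
        b[2 * i], b[2 * i + 1] = xp, yp
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def ransac_homography(pts1, pts2, n_iters=1000, eps=2.0, seed=0):
    """4-point RANSAC over correspondences pts1 -> pts2 (both (n, 2))."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(pts1), dtype=bool)
    homog = np.column_stack([pts1, np.ones(len(pts1))]).T  # (3, n)
    for _ in range(n_iters):
        idx = rng.choice(len(pts1), size=4, replace=False)
        H = compute_homography(pts1[idx], pts2[idx])
        proj = H @ homog
        proj = (proj[:2] / proj[2]).T                      # dehomogenize
        inliers = np.linalg.norm(proj - pts2, axis=1) < eps
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # refit on the largest inlier set found
    return compute_homography(pts1[best_inliers], pts2[best_inliers])
```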

Automatic Panoramas

(Note: I took these photos before Professor Efros warned us to watch out for parallax effects, and I'm pretty sure they messed with my automatic panoramas. Additional discussion below. Please don't take too many points off, I swear I implemented the methods correctly ty)

We compare the panoramas with manual correspondences (top) against those with automatically-generated correspondences (bottom).







What have I learned?

I learned how sensitive homography computation is to small deviations in correspondences, as evidenced by numerous failure cases and by some artifacts in my automatic panoramas (for example, Doe's "University Library" text in the automatic case). I suspect that my automatic panoramas didn't turn out as well because of the parallax effects Professor Efros discussed in lecture; he noted that parallax would not be a problem for points on a planar surface. When we compare the correspondences I manually selected in the Doe Library images (top) against those selected by the automatic method (bottom), we see that my correspondences mainly stuck to planar surfaces on the library's facade and doorway, while the automatic method selected a hodgepodge of points scattered throughout the image that did not lie on a consistent plane.