Image Warping and Mosaicing

Perspective Projection

Panoramic images, a popular feature on phone cameras, are created by taking overlapping photos and stitching them together using projection.

Here we use perspective projection (homographies) to stitch images together. We recover the homography with least squares: we find the matrix that best maps the correspondence points of one image onto those of the other.

The math:

We are trying to find $H$ such that $$H\textbf{p} = w\textbf{p'}$$ where $\textbf{p}$ and $\textbf{p'}$ are pairs of correspondence points in homogeneous coordinates and $w$ is an unknown scale factor.

A homography is only defined up to scale, so we fix its bottom-right entry $i$ to be $1$. Let $\textbf{h} = [a, b, c, d, e, f, g, h]^\intercal$ hold the remaining eight unknowns. Then: $$\begin{align*} H \textbf{p} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ 1 \end{bmatrix} &= \begin{bmatrix} wx_1' \\ wx_2' \\ w \end{bmatrix} = w\textbf{p'} \\ \end{align*}$$

Eliminating $w$, we get the equations: $$\begin{align*} ax_1 + bx_2 + c &= wx_1' & & \dfrac{ax_1 + bx_2 + c}{gx_1 + hx_2 + 1} = x_1'& &ax_1 + bx_2 + c - gx_1x_1' - hx_2 x_1' = x_1'\\ dx_1 + ex_2 + f &= wx_2' & \implies & & \implies &\\ gx_1 + hx_2 + 1 &= w & & \dfrac{dx_1 + ex_2 + f}{gx_1 + hx_2 + 1} = x_2' && dx_1 + ex_2 + f - gx_1x_2' - hx_2 x_2' = x_2' \end{align*}$$

Thus, in our least squares problem, we are trying to minimize $$|| A\textbf{h} - b||^2 $$ where $$A = \begin{bmatrix} p_{1, x_1} & p_{1, x_2} & 1 & 0 & 0 & 0 & - p_{1, x_1} p_{1, x_1}' & - p_{1, x_2} p_{1, x_1}' \\ 0 & 0 & 0 & p_{1, x_1} & p_{1, x_2} & 1 & -p_{1, x_1}p_{1, x_2}' & -p_{1, x_2}p_{1, x_2}' \\ \vdots & & & & & & & \vdots \\ p_{n, x_1} & p_{n, x_2} & 1 & 0 & 0 & 0 & - p_{n, x_1} p_{n, x_1}' & - p_{n, x_2} p_{n, x_1}' \\ 0 & 0 & 0 & p_{n, x_1} & p_{n, x_2} & 1 & -p_{n, x_1}p_{n, x_2}' & -p_{n, x_2}p_{n, x_2}' \\ \end{bmatrix}$$ and $$b = [p_{1, x_1}', p_{1, x_2}', \dots p_{n, x_1}', p_{n, x_2}']^\intercal$$
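As a rough illustration of the least squares setup above, here is a minimal NumPy sketch. The function name `compute_homography` and the `(n, 2)` array layout are my own choices for this example, not necessarily what the project code uses.

```python
import numpy as np

def compute_homography(src, dst):
    """Least-squares homography mapping src points onto dst points.

    src, dst: arrays of shape (n, 2) with n >= 4; each row is (x1, x2).
    Returns a 3x3 matrix whose bottom-right entry is fixed to 1.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]
    A = np.zeros((2 * n, 8))
    b = np.zeros(2 * n)
    for k in range(n):
        x1, x2 = src[k]
        y1, y2 = dst[k]  # the primed coordinates x1', x2'
        # Two rows of A per correspondence, matching the matrix above.
        A[2 * k] = [x1, x2, 1, 0, 0, 0, -x1 * y1, -x2 * y1]
        A[2 * k + 1] = [0, 0, 0, x1, x2, 1, -x1 * y2, -x2 * y2]
        b[2 * k] = y1
        b[2 * k + 1] = y2
    # Minimize ||A h - b||^2 for the eight unknowns a..h.
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.append(h, 1.0).reshape(3, 3)
```

With exactly 4 pairs the system is square and the fit is exact; with more pairs, `np.linalg.lstsq` returns the minimizer of the residual.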

Though technically we only need 4 pairs of points to produce a unique homography, it is desirable to have many more. The two images are not necessarily related by an exact rectilinear projection, and defining correspondence points by hand can introduce a lot of error; fitting to more points mitigates both problems.

Rectification (aligning plane to plane)

Aligned laptop screen to frontal plane

Aligned front of vanity to frontal plane

Auto-Stitching

To improve on the panorama stitching, we introduce auto-stitching, which takes two images and computes a homography between them without needing to define correspondence points by hand.

The algorithm is simple. First, using a built-in library we detect Harris corners for each image.

Then, we use adaptive non-maximal suppression to keep a uniformly distributed subset of Harris corners over the image.

ANMS works by finding, for each feature, the nearest neighbor that is sufficiently stronger (in this case, ~1.1x stronger). Points whose nearest stronger neighbor is far away are strong relative to their surroundings, while points with a close stronger neighbor are not local maxima. We select the 500 points with the furthest such neighbors.
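The selection step can be sketched in a few lines of NumPy. This is only an illustration: the name `anms`, its parameters, and the reading of "1.1x stronger" as `strength_j > 1.1 * strength_i` are assumptions on my part.

```python
import numpy as np

def anms(coords, strengths, n_keep=500, c_robust=1.1):
    """Return indices of the n_keep points with the largest suppression radius.

    coords: (n, 2) corner positions; strengths: (n,) Harris responses.
    """
    coords = np.asarray(coords, dtype=float)
    strengths = np.asarray(strengths, dtype=float)
    # Pairwise squared distances, shape (n, n). This is O(n^2) memory,
    # so it assumes the corner list has already been thinned somewhat.
    diff = coords[:, None, :] - coords[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    # stronger[i, j] is True when point j is sufficiently stronger than point i.
    stronger = strengths[None, :] > c_robust * strengths[:, None]
    # Distance to the nearest sufficiently stronger neighbor
    # (infinite when no such neighbor exists).
    radii = np.where(stronger, d2, np.inf).min(axis=1)
    order = np.argsort(-radii)  # largest radius first
    return order[:n_keep]
```

Comparing squared distances gives the same ordering as comparing distances, so the square root is skipped.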

For each of these points, we extract an 8x8 feature descriptor, sampling every 5 pixels. We then compare the descriptors of the first image with the descriptors of the second image using the squared distance. Then, for each point in the first image, we apply Lowe's ratio test: we discard every point whose nearest neighbor is not significantly closer than its second nearest neighbor. In our case, we use a cut-off 1-NN/2-NN ratio of 0.6.
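A minimal sketch of the matching step, assuming the descriptors have already been flattened into rows. The function name `match_descriptors` is made up for this example, and applying the 0.6 ratio directly to squared distances is an assumption; the project may apply it differently.

```python
import numpy as np

def match_descriptors(desc1, desc2, ratio=0.6):
    """Match descriptors with Lowe's ratio test.

    desc1: (n1, d), desc2: (n2, d) with n2 >= 2.
    Returns an (m, 2) array of index pairs (i in image 1, j in image 2).
    """
    desc1 = np.asarray(desc1, dtype=float)
    desc2 = np.asarray(desc2, dtype=float)
    # Squared distance between every descriptor pair, shape (n1, n2).
    d2 = np.sum((desc1[:, None, :] - desc2[None, :, :]) ** 2, axis=-1)
    # Indices of the nearest and second-nearest neighbor in image 2.
    nearest = np.argsort(d2, axis=1)[:, :2]
    rows = np.arange(len(desc1))
    d_first = d2[rows, nearest[:, 0]]
    d_second = d2[rows, nearest[:, 1]]
    # Keep a match only if 1-NN is clearly closer than 2-NN.
    keep = d_first < ratio * d_second
    return np.column_stack([rows[keep], nearest[keep, 0]])
```

A point with two almost equally good candidates is ambiguous, so the test drops it rather than guess.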

Finally, for the remaining points, we use 4-point RANSAC to estimate the homography matrix. RANSAC randomly selects 4 candidate pairs (each a feature point and its nearest neighbor) and computes a homography from those 4 pairs. Then, we apply the homography to all candidate feature points of the second image and count how many land within an SSD threshold (here, 10) of their candidate match in the first image. We repeat this 1000 times and choose the homography that produces the most inliers.
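The loop described above might look roughly like the following. All function names here are hypothetical, the fixed `seed` is only there to make the sketch repeatable, and the threshold is compared against the squared distance as in the text.

```python
import numpy as np

def fit_homography(src, dst):
    """Least-squares homography from (n, 2) src points to (n, 2) dst points."""
    n = len(src)
    A = np.zeros((2 * n, 8))
    b = np.zeros(2 * n)
    for k in range(n):
        x1, x2 = src[k]
        y1, y2 = dst[k]
        A[2 * k] = [x1, x2, 1, 0, 0, 0, -x1 * y1, -x2 * y1]
        A[2 * k + 1] = [0, 0, 0, x1, x2, 1, -x1 * y2, -x2 * y2]
        b[2 * k], b[2 * k + 1] = y1, y2
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, pts):
    """Map (n, 2) points through H and divide out the scale w."""
    p = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return p[:, :2] / p[:, 2:3]

def ransac_homography(src, dst, n_iters=1000, threshold=10.0, seed=0):
    """Return (best homography, boolean inlier mask) over n_iters random samples."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rng = np.random.default_rng(seed)
    best_H = np.eye(3)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        # Pick 4 candidate pairs and fit an exact homography to them.
        sample = rng.choice(len(src), size=4, replace=False)
        H = fit_homography(src[sample], dst[sample])
        # Squared distance between each mapped point and its candidate match.
        with np.errstate(divide="ignore", invalid="ignore"):
            err = np.sum((apply_homography(H, src) - dst) ** 2, axis=1)
        inliers = err < threshold  # NaN errors compare False, so they are outliers
        if inliers.sum() > best_inliers.sum():
            best_H, best_inliers = H, inliers
    return best_H, best_inliers
```

A common refinement, not described in the text above, is to refit the homography on all inliers of the winning sample, e.g. `fit_homography(src[mask], dst[mask])`.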

Here are 2 videos of a kitchen auto-stitched together. The videos were stitched frame by frame.

Unfortunately, the FOV is not expanded by much (we get to see the kitchen window) because I didn't have a very wide space to film in. It's also kind of shaky since every frame uses a different homography. In hindsight, I could've used the same homography for every frame since the videos don't move much.

Panoramas

To create panoramas, we find homographies between each pair of images, and one by one project all the images onto the base image.
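One way to get every image into the base image's frame is to chain the pairwise homographies. The sketch below assumes the homographies are estimated between adjacent images and that image 0 is the base; both are my assumptions, and `chain_to_base` is a made-up name.

```python
import numpy as np

def chain_to_base(pairwise):
    """Compose adjacent-pair homographies into maps onto the base image.

    pairwise[k] maps coordinates of image k+1 into image k.
    Returns a list where entry k maps image k into image 0.
    """
    to_base = [np.eye(3)]
    for H in pairwise:
        H_total = to_base[-1] @ np.asarray(H, dtype=float)
        # Rescale so the bottom-right entry stays 1, as in the derivation.
        to_base.append(H_total / H_total[2, 2])
    return to_base
```

For example, if image 1 is shifted by (10, 0) relative to image 0 and image 2 by (5, 2) relative to image 1, the composed map for image 2 is a shift of (15, 2).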

A family photo

This was produced from 5 images of some stuffed animals on my bed. The manual and auto results seem to be of similar quality. The shaky images are probably due to bad feature detection or the fact that I didn't take every photo from quite the same POV, which makes a perfect projection impossible.

Manual

Auto

Living room

It looks pretty good! I'm amazed how randomly choosing 4 points is good enough to make these nice images! It looks just as good as the manually stitched one (or maybe better).

Manual

Auto

A bedroom

This was produced with 3 images of a room. Also looks super nice.

Manual

Auto

Cylindrical Projections

We can also create panoramas by first projecting images to a cylinder, and then stitching them together.

We can project an image to a cylinder given the distance $Z$ from the imaging plane to the center of the cylinder and the focal length $f$. I used an iPhone XS to shoot these images, so I just chose $Z$ and $f$ so my projections look good.

We can transform from image coordinates $(x, y)$ to cylindrical coordinates via the following: $$\begin{align}\theta &= f \cdot \arctan\left(\dfrac{x}{Z}\right) \\ h &= f \cdot \dfrac{y}{\sqrt{x^2 + Z^2}}\\ \end{align} $$ plus or minus the translations necessary for all coordinates to be positive.
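The mapping above translates directly to NumPy. In this sketch I assume $(x, y)$ are measured from the image center, and I add the inverse mapping since warping is usually done by looking up source pixels for each output pixel; the function names are hypothetical.

```python
import numpy as np

def to_cylindrical(x, y, Z, f):
    """Image coordinates (relative to the image center) -> (theta, h)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    theta = f * np.arctan(x / Z)
    h = f * y / np.sqrt(x ** 2 + Z ** 2)
    return theta, h

def from_cylindrical(theta, h, Z, f):
    """Inverse of to_cylindrical, useful for inverse warping."""
    theta = np.asarray(theta, dtype=float)
    h = np.asarray(h, dtype=float)
    x = Z * np.tan(theta / f)
    y = h * np.sqrt(x ** 2 + Z ** 2) / f
    return x, y
```

The translations mentioned above (shifting so all coordinates are positive) would be applied after `to_cylindrical` and undone before `from_cylindrical`.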

Then we can perform panorama stitching on the cylindrical images.

It looks a lot cuter with the rounded edges, and the stuff at the widest point of the FOV is not stretched as much. Still, it has its own artifacts (straight horizontal lines are now curved), but it is probably better for images with wide FOVs.

Coolest thing I learned

Honestly, the coolest thing I learned is that randomly choosing 4 points in RANSAC is enough to make a really good homography. If I had more time, I would like to experiment with more types of projections and also speed up the auto-stitching!