CS294-26

Image Warping and Mosaicing

Kumar Krishna Agrawal
UC Berkeley

Statement

Cameras capture the three-dimensional world onto two-dimensional planes $P$. Assuming the camera does not translate (it only rotates about its center), consider two different views $P_1$ and $P_2$ of a scene. With reasonable overlap between the two views, we can define correspondences between points on the two planes; we represent this mapping by $$x_i \in P_1 \to x'_i \in P_2$$ This is an example of a projective transformation: in homogeneous coordinates, there is a homography matrix $H$ such that $$x'_i = Hx_i \qquad H = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33}\end{bmatrix}$$

As our homogeneous representation is scale invariant, finding $H$ reduces to estimating $8$ parameters; here we fix $h_{33} = 1$. In this work we use Direct Linear Transformation (DLT)-style algorithms. In particular, we set up the following regression problem $$\begin{align} \begin{bmatrix} x_1 & y_1 & 1 & 0 & 0 & 0 & - x'_1 x_1 & -x'_1 y_1 \\ 0 & 0 & 0 & x_1 & y_1 & 1 & - y'_1 x_1 & -y'_1 y_1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ x_n & y_n & 1 & 0 & 0 & 0 & - x'_n x_n & -x'_n y_n\\ 0 & 0 & 0 & x_n & y_n & 1 & - y'_n x_n & -y'_n y_n \\ \end{bmatrix} \begin{bmatrix} h_{11} \\ h_{12} \\ h_{13} \\ h_{21} \\ h_{22} \\ h_{23} \\ h_{31} \\ h_{32} \end{bmatrix} = \begin{bmatrix} x'_1 \\ y'_1 \\ \vdots \\ x'_n \\ y'_n \end{bmatrix} \end{align} $$
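
To see where each pair of rows comes from, here is a short derivation, assuming the normalization $h_{33} = 1$ and writing the $i$-th correspondence as $(x_i, y_i) \to (x'_i, y'_i)$. Dividing out the homogeneous coordinate of $Hx_i$ gives $$x'_i = \frac{h_{11} x_i + h_{12} y_i + h_{13}}{h_{31} x_i + h_{32} y_i + 1} \qquad y'_i = \frac{h_{21} x_i + h_{22} y_i + h_{23}}{h_{31} x_i + h_{32} y_i + 1}$$ Multiplying both sides by the denominator and moving the $h_{31}$ and $h_{32}$ terms to the left yields two equations that are linear in the unknowns: $$h_{11} x_i + h_{12} y_i + h_{13} - x'_i x_i h_{31} - x'_i y_i h_{32} = x'_i$$ $$h_{21} x_i + h_{22} y_i + h_{23} - y'_i x_i h_{31} - y'_i y_i h_{32} = y'_i$$ Each correspondence therefore contributes two rows to the system, with the target coordinates $x'_i$ and $y'_i$ on the right-hand side.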

Solving the above system requires a minimum of $4$ point correspondences to estimate $H$. However, correspondences are often inexact and noisy, so using more than $4$ points and solving the regression in the least-squares sense improves the estimate of the homography. In this project we create mosaics from images captured from a single point but projected onto different planes, by stitching these frames onto a single canvas.
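
The regression above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the project's actual code: the function names `compute_homography` and `apply_homography` are hypothetical, and it assumes $h_{33} = 1$ and correspondences given as `(n, 2)` arrays of `(x, y)` pixel coordinates.

```python
import numpy as np

def compute_homography(src, dst):
    """Least-squares estimate of H (with h33 fixed to 1) such that dst ~ H @ src.

    src, dst: arrays of shape (n, 2) holding n >= 4 corresponding (x, y) points.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]
    if n < 4 or dst.shape != src.shape:
        raise ValueError("need at least 4 matching point pairs")
    x, y = src[:, 0], src[:, 1]
    xp, yp = dst[:, 0], dst[:, 1]
    zeros, ones = np.zeros(n), np.ones(n)
    # One row per x' equation and one per y' equation, interleaved.
    rows_x = np.stack([x, y, ones, zeros, zeros, zeros, -xp * x, -xp * y], axis=1)
    rows_y = np.stack([zeros, zeros, zeros, x, y, ones, -yp * x, -yp * y], axis=1)
    A = np.empty((2 * n, 8))
    A[0::2], A[1::2] = rows_x, rows_y
    b = np.empty(2 * n)
    b[0::2], b[1::2] = xp, yp
    # Least squares handles the overdetermined case n > 4.
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, pts):
    """Map (n, 2) points through H and divide out the homogeneous coordinate."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]
```

With exactly four points in general position the system is square and is solved exactly. With more points, the least-squares solution averages out some of the noise in the clicked correspondences.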

Shooting Images

Several views were captured with an iPhone 11 camera at different times of the day. Some illustrative examples:

Recovering Homography and Rectification

Recovering a homography requires quite precise correspondences between two images. We implemented a small script to collect correspondences between a sequence of images. Using these, we can warp images and rectify shapes in images according to prior knowledge.

Left: the original image, taken from a corner perspective. Right: using the fact that the building is rectangular, we unwarp the image with a projective transform.
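
One way to apply a recovered homography to an image is inverse warping: for every output pixel, look up where it came from in the source. The sketch below is an assumption about how this could be done, not the script used here; `warp_image` is a hypothetical name, `H` is assumed to map source pixel coordinates `(x, y)` to output coordinates, and nearest-neighbour sampling is used for brevity (bilinear interpolation would give smoother results).

```python
import numpy as np

def warp_image(img, H, out_shape):
    """Inverse-warp img by H (source pixel coords -> output pixel coords).

    img: (h, w) or (h, w, c) array; out_shape: (out_h, out_w).
    Output pixels that map outside the source image are left at zero.
    """
    out_h, out_w = out_shape
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    # Homogeneous output coordinates (x, y, 1), one column per output pixel.
    dest = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = np.linalg.inv(H) @ dest
    # Guard against a vanishing homogeneous coordinate before dividing.
    w = src[2]
    ok = np.abs(w) > 1e-12
    w = np.where(ok, w, 1.0)
    sx = np.rint(src[0] / w).astype(int)
    sy = np.rint(src[1] / w).astype(int)
    valid = ok & (sx >= 0) & (sx < img.shape[1]) & (sy >= 0) & (sy < img.shape[0])
    flat = np.zeros((out_h * out_w,) + img.shape[2:], dtype=img.dtype)
    flat[valid] = img[sy[valid], sx[valid]]
    return flat.reshape((out_h, out_w) + img.shape[2:])
```

For rectification, `H` would be estimated from the four clicked corners of the building and the four corners of an axis-aligned rectangle of the desired size.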

Generating a Mosaic

Warping images with a projective transform allows us to do something interesting: stitching images taken from the same point, but with different perspectives, onto a single canvas. The high-level idea is outlined below.

Assuming the camera center is fixed and we only rotate the camera, we can project our views onto a single plane by recovering pairwise homographies. Here, $H_{i\to i-1}$ is the homography matrix for projecting view $P_{i}$ onto the plane of $P_{i-1}$. This operation can be chained to find the projection matrix into a shared coordinate system for all images. (Here we assume an odd number of images for mosaicing, so that there is a well-defined center view.)
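
The chaining step can be sketched as follows. This is an illustrative assumption about the bookkeeping, not the project's code: `homographies_to_center` is a hypothetical helper, and `pairwise[i]` is assumed to hold $H_{i+1 \to i}$ with views indexed from 0.

```python
import numpy as np

def homographies_to_center(pairwise):
    """pairwise[i] maps view i+1 onto the plane of view i.

    Returns one 3x3 matrix per view, mapping it onto the center view's plane.
    """
    n = len(pairwise) + 1
    center = n // 2
    to_center = [None] * n
    to_center[center] = np.eye(3)
    # Views after the center: step towards the center one neighbour at a time.
    for i in range(center + 1, n):
        to_center[i] = to_center[i - 1] @ pairwise[i - 1]
    # Views before the center: the pairwise map points away from the center,
    # so its inverse is used to step towards it.
    for i in range(center - 1, -1, -1):
        to_center[i] = to_center[i + 1] @ np.linalg.inv(pairwise[i])
    return to_center
```

Projecting everything onto the middle view keeps the distortion of the outermost frames roughly symmetric, which is one reason to prefer an odd number of images.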

Blending Images

Since we have multiple frames on a single canvas with significant overlap, blending is required to create a single mosaic image. For this part, we first evaluate the overlap between two frames and use it to blend the images. We apply Gaussian filters to the generated masks to make the overlap seamless.
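
A minimal sketch of mask-based blending follows, assuming both frames have already been warped onto the same canvas and each comes with a boolean coverage mask. The exact weighting used for the mosaics here is not specified, so this normalized-weights scheme, the NumPy-only blur, and the names `gaussian_blur` and `blend` are all assumptions.

```python
import numpy as np

def gaussian_blur(mask, sigma):
    """Separable Gaussian blur of a 2D float array (zero padding at borders)."""
    radius = max(1, int(3 * sigma))
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()

    def blur_1d(v):
        # Pad so the "valid" convolution returns the original length.
        return np.convolve(np.pad(v, radius), kernel, mode="valid")

    out = np.apply_along_axis(blur_1d, 0, mask)
    return np.apply_along_axis(blur_1d, 1, out)

def blend(img_a, mask_a, img_b, mask_b, sigma=5.0):
    """Blend two warped frames that share a canvas.

    img_*: (h, w, c) float arrays; mask_*: (h, w) boolean coverage masks.
    """
    # Soften each mask, but keep the weight at zero where a frame has no data.
    w_a = gaussian_blur(mask_a.astype(float), sigma) * mask_a
    w_b = gaussian_blur(mask_b.astype(float), sigma) * mask_b
    total = w_a + w_b
    total[total == 0] = 1.0  # avoid 0/0 where neither frame covers the canvas
    return (w_a[..., None] * img_a + w_b[..., None] * img_b) / total[..., None]
```

Where only one frame covers a pixel, the output equals that frame. Inside the overlap, the weights fall off smoothly towards each frame's boundary, which hides the seam.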

Putting it together

Using the tools sketched above, we're ready to generate some mosaics!

Dusk view overlooking the SF horizon.

Afternoon hike!

Afternoon view overlooking the SF horizon.

Most-Enjoyable

I really enjoyed getting a better understanding of how the three-dimensional world around us can be captured on different geometries. In particular, I loved how, with the right mathematical tools (projective geometry), one can manipulate perceptual information to generate cool effects (mosaicing here).