Image Warping and Mosaics

In this assignment, I explore the $3\times3$ homography matrix $H$, which maps points in homogeneous coordinates $p=[x,y,1]^T$ to $p'=[x',y',1]^T$ via $p'=Hp$. This is useful for aligning two images of a scene taken from the same location but from different viewing directions, which is exactly what is needed to create mosaics and panoramas. $H$ is a projective transformation between the two coordinate systems (more general than a rotation plus translation, since it also captures perspective distortion between the views). Given 4 or more "control points" $(x_i,y_i) \leftrightarrow (x'_i,y'_i)$ in the two images (points marking the same objects in both scenes), $H$ can be recovered by least squares from the system

$$
\begin{bmatrix}
x_i & y_i & 1 & 0 & 0 & 0 & -x_i x'_i & -y_i x'_i \\
0 & 0 & 0 & x_i & y_i & 1 & -x_i y'_i & -y_i y'_i
\end{bmatrix}
\begin{bmatrix} h_{11} \\ h_{12} \\ h_{13} \\ h_{21} \\ h_{22} \\ h_{23} \\ h_{31} \\ h_{32} \end{bmatrix}
=
\begin{bmatrix} x'_i \\ y'_i \end{bmatrix},
\qquad i = 1, \dots, N,
$$

with $H_{3,3}=1$ fixed to remove the overall scale ambiguity. After recovering $H$ from the control points, we can use it to warp one image into the plane of the other; the two in-plane images can then be blended to create a mosaic.
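As a concrete sketch of this setup in Matlab (the function name computeH and the $N\times 2$ point-matrix layout are illustrative choices, not necessarily the exact code used), the $2N\times 8$ system can be assembled and solved with a single least-squares backslash:

```matlab
function H = computeH(im1_pts, im2_pts)
% Illustrative sketch: estimate a homography from N >= 4 correspondences.
% im1_pts, im2_pts are N-by-2 matrices of (x, y) control points.
% Solves A*h = b for h = [h11 h12 h13 h21 h22 h23 h31 h32]' with H(3,3) = 1.
N = size(im1_pts, 1);
A = zeros(2*N, 8);
b = zeros(2*N, 1);
for i = 1:N
    x  = im1_pts(i, 1);  y  = im1_pts(i, 2);   % point in the source image
    xp = im2_pts(i, 1);  yp = im2_pts(i, 2);   % matching point in the target image
    A(2*i-1, :) = [x y 1 0 0 0 -x*xp -y*xp];
    A(2*i,   :) = [0 0 0 x y 1 -x*yp -y*yp];
    b(2*i-1)    = xp;
    b(2*i)      = yp;
end
h = A \ b;                      % least-squares solution (overdetermined if N > 4)
H = reshape([h; 1], 3, 3)';     % fill H row by row, with H(3,3) = 1
end
```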

Part 1 - Planar Rectification


Here, 4 control points are selected in the source image, and the quadrilateral they enclose is warped by a homography onto an axis-aligned rectangle. The result is that the normal of the planar surface in the image points toward the viewer; that is, the viewer sees the surface face-on, in a fronto-parallel view.
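Using the computeH sketch above, rectification amounts to mapping the four clicked corners to a chosen rectangle (the coordinates and output size below are illustrative, not the values used for these images):

```matlab
% Four clicked corners of the planar surface (illustrative coordinates),
% listed clockwise from the top-left corner.
src_pts  = [132  85;  610 140;  598 720;  120 660];
% Axis-aligned target rectangle of a hand-picked output size.
w = 480; h = 600;
rect_pts = [1 1;  w 1;  w h;  1 h];
% Homography mapping the quadrilateral onto the rectangle; warping the image
% with it (inverse-sampled, as described in the final section) gives the
% fronto-parallel view.
H = computeH(src_pts, rect_pts);
```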

Source Images | Rectified Outputs

Here, we focus on the texture of the "railing."

Focusing on the side of the building.

Focusing on the giant glass window.

Same as above, but with more of the surrounding scene included.

Part 2 - Mosaics

Rather than using a homography to warp to a rectangle, we can use homographies to create mosaics (panoramas). In this case, we select 4 (or more) control points in two images of the same scene. After computing the homography between the point sets, we warp one image into the spatial domain of the other, then blend the two images together to produce the output mosaic. I used linear blending to combine the input images, so that there would be no obvious seams where the images overlap. For example, if I simply superimposed the right-hand image onto the left, the viewer would notice a seam where the images meet, due to slight differences in brightness and color between the two images.
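One common way to realize the linear blend is distance-based feathering; the sketch below assumes that approach (the report describes it only as linear blending), with both warped images already placed on a common canvas and bwdist coming from the Image Processing Toolbox:

```matlab
% warped_left, warped_right: the two images already warped onto a common canvas
% (same size, zero where a source image contributes no pixels). Illustrative sketch.
mask_left  = any(warped_left  > 0, 3);      % rough valid-pixel masks
mask_right = any(warped_right > 0, 3);
% Feathering weights: distance to each image's boundary (0 at the edge, growing
% inward), so the weighted average ramps linearly across the overlap region.
w_left  = double(bwdist(~mask_left));
w_right = double(bwdist(~mask_right));
w_sum   = w_left + w_right;
w_sum(w_sum == 0) = 1;                      % avoid division by zero outside both images
% Implicit expansion spreads the H-by-W weights across the 3 color channels.
mosaic  = (double(warped_left)  .* w_left + ...
           double(warped_right) .* w_right) ./ w_sum;
mosaic  = uint8(mosaic);
```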

Left Source Images | Right Source Images | Output Images

I learned a lot from this project. I found the act of converting a mathematical formula into computer code the most interesting part. In theory, computing the homography seemed relatively simple: set up $p'=Hp$, then solve for the elements of $H$ by least squares. But implementing that in code raised several issues I had to tackle. First, I had to double- and triple-check that all vectors were oriented correctly in Matlab when setting up the least-squares system. Second, I had to figure out how to "apply" $H$ once it was computed. It turned out that I needed to take the pixels of the base image (right side), apply the homography $H$ to their homogeneous coordinates, and then sample bilinearly from the image being transformed (left side). In the math, we want to transform left to right, but to produce a clean result (with no holes or aliasing) we apply the inverse transform and sample. That was a great learning experience!
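A minimal sketch of that inverse-warp-and-sample step (the function name invwarp and the convention that H_to_src maps output-canvas coordinates into the source image are my own illustrative choices; interp2 performs the bilinear sampling, and canvas offsets are omitted for brevity):

```matlab
function warped = invwarp(im_src, H_to_src, out_rows, out_cols)
% Inverse warping: for every pixel of the output canvas, map it into the source
% image with H_to_src (a homography from output coords to source coords) and
% sample bilinearly, so the result has no holes.
[xo, yo] = meshgrid(1:out_cols, 1:out_rows);            % output pixel grid
p  = [xo(:)'; yo(:)'; ones(1, numel(xo))];              % homogeneous output coords
q  = H_to_src * p;                                      % map into the source image
xs = reshape(q(1,:) ./ q(3,:), out_rows, out_cols);     % de-homogenize
ys = reshape(q(2,:) ./ q(3,:), out_rows, out_cols);
warped = zeros(out_rows, out_cols, size(im_src, 3));
for c = 1:size(im_src, 3)                               % bilinear sample per channel
    warped(:,:,c) = interp2(double(im_src(:,:,c)), xs, ys, 'linear', 0);
end
warped = uint8(warped);
end
```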