Zachary Wu

CS 194 Project 4.1: Image Warping and Mosaics

Shooting the Pictures

Using modern technology, shooting images is incredibly easy. Using my phone's built-in camera and a tripod, I captured multiple pairs of images of my kitchen, Safeway, and stairs within MLK. The goal is for the images in each pair to be related by a perspective transformation. Here, this is achieved by using a tripod to keep the same point of view while changing the view direction.

Recover Homographies

Going forward, I will use the image of my kitchen to showcase how the process works.

We will now compute the homography between the images, which describes the perspective transformation between the two images. We will compute a 3x3 matrix H that, when applied to the pixel coordinates of one image, aligns it with the other.

In order to solve for the homographies, we first have to manually identify at least 4 points of correspondence. Below, I have manually selected some correspondences between the two images on sharp corners that both images share.

Using these points, we can set up a system of linear equations and use least squares to find the H matrix that best aligns the matching points. I used this Math Stack Exchange answer and this Towards Data Science article to help learn about this process, and will summarize them below. Equations are from the article.

First, we start with a set of at least 4 pairs of points that correspond to before and after the transformation.

Our homography matrix H, multiplied by the original points in homogeneous coordinates, gives us the final desired points up to a scale factor w. To solve for H, we have 8 unknowns (with h_33 fixed to 1). Each pair of points gives 2 equations, so 4 pairs give us the 8 equations we need. In this case, though, we will use more points and least squares to get a better homography, as manual selection of points can be quite hit or miss.

Shreyans Sethi explained how to derive the equations on Piazza, which I will quote here.
"""
Expand the original equation, we will have
ax + by + c = wx'
dx + ey + f = wy'
gx + hy + i = w

Now, because all the RHS's of the above equations share a w, you can sub in the value of w to get
ax + by + c = (gx + hy + i)x'
dx + ey + f = (gx + hy + i)y'

Expanding this out will get you:

ax + by + c = gxx' + hyx' + ix'
dx + ey + f = gxy' + hyy' + iy'
"""
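Setting i = 1 and moving the g and h terms to the left-hand side, each pair of points contributes two linear equations. This ultimately gives us a least squares problem that we can solve for the 8 unknowns.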
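As a minimal sketch of the least squares setup described above (not necessarily the exact code used for this project), the two equations per correspondence can be stacked into a matrix and solved with NumPy. The function name `compute_homography` and its arguments `src_pts` and `dst_pts` are hypothetical names chosen for illustration.

```python
import numpy as np

def compute_homography(src_pts, dst_pts):
    """Least-squares estimate of H (h_33 fixed to 1) mapping src_pts to dst_pts."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    if src.shape != dst.shape or src.ndim != 2 or src.shape[1] != 2 or len(src) < 4:
        raise ValueError("need at least 4 matching (x, y) pairs")

    rows, rhs = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        # ax + by + c - g*x*x' - h*y*x' = x'   (with i = 1)
        rows.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        rhs.append(xp)
        # dx + ey + f - g*x*y' - h*y*y' = y'   (with i = 1)
        rows.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        rhs.append(yp)

    # Solve for [a, b, c, d, e, f, g, h] in the least-squares sense.
    h, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    # Append i = 1 and reshape into the 3x3 matrix.
    return np.append(h, 1.0).reshape(3, 3)
```

With exactly 4 pairs the system is square and the fit is exact; with more pairs, errors in individual hand-picked points tend to average out.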

In the case of my kitchen picture, I get the following H matrix:
[[ 6.31290874e-01, 6.41588866e-03, 1.30776077e+03],
[-1.26032049e-01, 8.83592102e-01, 1.27723401e+02],
[-9.77570199e-05, 9.27963080e-06, 1.00000000e+00]]
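To make the role of w concrete, here is a small sketch that applies the matrix above to points: each point is lifted to homogeneous coordinates, multiplied by H, and divided by the resulting w. The helper name `apply_homography` is hypothetical.

```python
import numpy as np

# H as reported above for the kitchen pair.
H = np.array([
    [ 6.31290874e-01, 6.41588866e-03, 1.30776077e+03],
    [-1.26032049e-01, 8.83592102e-01, 1.27723401e+02],
    [-9.77570199e-05, 9.27963080e-06, 1.00000000e+00],
])

def apply_homography(H, pts):
    """Map an (N, 2) array of (x, y) points through H."""
    pts = np.asarray(pts, dtype=float).reshape(-1, 2)
    homog = np.hstack([pts, np.ones((len(pts), 1))])  # (x, y) -> (x, y, 1)
    mapped = homog @ H.T                              # rows are (w*x', w*y', w)
    return mapped[:, :2] / mapped[:, 2:3]             # divide out w

# The origin lands at the third column of H, since w = h_33 = 1 there.
origin = apply_homography(H, [[0, 0]])
```

The large x offset in the third column is consistent with the camera having panned sideways between the two shots.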

Warp the images

We can now use our calculated homography matrix to warp our second kitchen image so that its correspondence points line up with those in the first image.

To do this, we take each pixel coordinate and multiply it by the homography matrix (or its inverse, depending on which image we are changing). Then we use cv2.remap to interpolate the colors from the image into their new positions, and this gets us the warped image.
We also begin with some padding such that after the perspective transformation, we still have all the pixels and the images are not cropped.
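The sketch below illustrates the inverse-warping idea under simplifying assumptions. It is not the cv2.remap version described above: to stay NumPy-only it uses nearest-neighbor sampling instead of interpolation, and it sizes the canvas from the warped corners rather than from fixed padding. The name `warp_image` and its return values are hypothetical.

```python
import numpy as np

def warp_image(img, H):
    """Inverse-warp img by H onto a canvas large enough to hold the result.

    Returns (warped, (x_min, y_min)), where (x_min, y_min) is the position
    of the canvas's top-left pixel in the warped coordinate frame.
    """
    h, w = img.shape[:2]
    # Map the four corners forward to find how big the canvas must be.
    corners = np.array([[0, 0, 1], [w - 1, 0, 1],
                        [w - 1, h - 1, 1], [0, h - 1, 1]], dtype=float)
    mapped = corners @ H.T
    mapped = mapped[:, :2] / mapped[:, 2:3]
    x_min, y_min = np.floor(mapped.min(axis=0)).astype(int)
    x_max, y_max = np.ceil(mapped.max(axis=0)).astype(int)

    # Coordinates of every output pixel in the warped frame.
    xs, ys = np.meshgrid(np.arange(x_min, x_max + 1),
                         np.arange(y_min, y_max + 1))
    out_pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])

    # Pull each output pixel back into the source image with the inverse of H.
    src = np.linalg.inv(H) @ out_pts
    sx = np.rint(src[0] / src[2]).astype(int)
    sy = np.rint(src[1] / src[2]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)

    # Pixels that fall outside the source stay 0 (black).
    out = np.zeros((xs.size,) + img.shape[2:], dtype=img.dtype)
    out[valid] = img[sy[valid], sx[valid]]
    return out.reshape(xs.shape + img.shape[2:]), (int(x_min), int(y_min))
```

Looping over output pixels and sampling backwards avoids the holes that appear when source pixels are pushed forward onto a grid they do not land on exactly.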

Image Rectification

One great thing about being able to warp images is that we can rectify them to appear as if we are viewing them from a different point of view. For example, consider our kitchen image. What if we wanted to be able to read the things posted on the fridge head on?

We can simply use correspondences to transform it to be head on. First, I define correspondences for the area of interest and map them to a square in the perspective that we want.

The 4 corners of the fridge

A square head on perspective of viewing the fridge
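A minimal sketch of this setup follows. With exactly four correspondences the 8 equations determine H exactly, so a plain linear solve is enough. The corner coordinates and the side length below are made-up placeholders, not the values I actually clicked, and `homography_from_4_points` is a hypothetical helper name.

```python
import numpy as np

def homography_from_4_points(src, dst):
    """Exact H (h_33 = 1) from exactly 4 point pairs via an 8x8 linear solve."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        b.append(xp)
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.append(yp)
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

# Placeholder pixel coordinates for the fridge corners, in the order
# top-left, top-right, bottom-right, bottom-left (illustrative values only).
fridge_corners = [(410, 220), (760, 180), (770, 900), (400, 820)]

# Target: an axis-aligned square with an arbitrarily chosen side length.
side = 500
square = [(0, 0), (side, 0), (side, side), (0, side)]

H_rect = homography_from_4_points(fridge_corners, square)
```

The two point lists must be in the same order, otherwise the result comes out flipped or twisted.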

Now using the same steps of computing homographies and warping the image, we get the following result.

Now we can read what is on the fridge head on! However, note that the picture is a bit blurry: rectification stretches the original pixels over a larger area, so those parts cannot be sharp. You can't create pixels that weren't captured in the original image.

Here's another example of rectifying an image in order to browse the breakfast aisle at Safeway.

While image rectification is great for changing our perspective, it is not magic, and we cannot change our viewpoint to one in which we see things that the original picture does not capture.

Blending Image Mosaics

We now have two images of our kitchen that are lined up, and we need to blend them together.

The first blending method I will use is to simply add the two images together. This will cause the overlapping portions to be incorrect, since those pixels are counted twice. To correct this, we create a mask for the overlap region and, in that area, take 0.5 of each image.
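A sketch of this averaging blend is below. It assumes both images have already been warped onto canvases of the same size and that empty padding is exactly 0. As a consequence, pixels that are genuinely pure black are treated as empty. The function name `average_blend` is hypothetical.

```python
import numpy as np

def average_blend(img_a, img_b):
    """Add two aligned, same-size canvases and average them where both have content."""
    a = img_a.astype(float)
    b = img_b.astype(float)
    if a.shape != b.shape:
        raise ValueError("canvases must have the same shape")

    # A pixel is "covered" if any of its channels is non-zero.
    mask_a = a.reshape(a.shape[0], a.shape[1], -1).any(axis=2)
    mask_b = b.reshape(b.shape[0], b.shape[1], -1).any(axis=2)
    overlap = mask_a & mask_b

    out = a + b            # correct wherever only one image has content
    out[overlap] *= 0.5    # 0.5 of each image in the overlap region
    return out
```

The result is a float array, so it may need to be clipped and cast back to the original dtype before saving.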

The result is quite good, although there are some slight artifacts in the overlap region, where we see some mismatch and some subtle lines from the blending. The edges not matching up perfectly is most likely a result of lens distortion on my camera, which makes the image plane not perfectly flat.

Now, we will show 2 more examples of creating a mosaic.

For this stairs image, my camera lens was a bit dirty, resulting in a blurry photo, and I also did not use a tripod, as it did not support vertical panning. These factors all reduced the sharpness of the end result, even in the region that is not blended.

What I Learned

For this part, the coolest thing I learned is how to rectify images and get an image from a new perspective. It allows an image to be created that was never taken, letting you see the scene from a different perspective.