Hi! This project explores using homographies to stitch images together.

  • Part 1: Shoot and Digitize Images

  • Part 2: Recover Homographies

  • Part 3: Warp the Images

  • Part 4: Blend the Images into a Mosaic

Part 1: Shoot and Digitize Images

The first task was to shoot images with enough overlap that correspondences could be defined between them. I chose to use a DSLR camera and shoot images with some friends in the class. We took photos around campus -- in a library, by Hearst Mining Circle, and by the Campanile. As it got darker, the images became more difficult to shoot, but we made it work by adjusting the aperture and exposure.

To initially develop my code, I used the following images of Hearst Mining Circle.

Part 2: Recover Homographies

From there, I needed to recover the 3x3 transformation between the two images. This transformation maps coordinates in the first image to the plane of the second. The 3x3 matrix applies a homography, such that p' = Hp (with p and p' in homogeneous coordinates), as shown below. After fixing the bottom-right entry of H to 1, there are 8 unknowns, and each point correspondence supplies two equations, so 4 points are enough to specify the homography exactly. However, because we're specifying the correspondences by hand, a homography fit to only 4 points is prone to error. Instead, we use more correspondences, formulate the problem with least squares, and solve for the H entries that minimize the error.
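
Written out with the bottom-right entry fixed to 1 (labeling the eight unknown entries a through h here, and w for the homogeneous scale), p' = Hp takes the form:

    \begin{bmatrix} w x' \\ w y' \\ w \end{bmatrix}
    =
    \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & 1 \end{bmatrix}
    \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}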

We can write out these equations and rearrange them so that all of our unknowns sit in an 8x1 vector. The equations below illustrate the system for one pair of corresponding points; for more pairs, we simply stack additional copies of the A and b rows vertically.
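
Concretely, expanding p' = Hp and multiplying through by the denominator w = gx + hy + 1 gives two linear equations per correspondence, which can be arranged as Ah = b (using the same labels as above):

    \begin{bmatrix}
    x & y & 1 & 0 & 0 & 0 & -x x' & -y x' \\
    0 & 0 & 0 & x & y & 1 & -x y' & -y y'
    \end{bmatrix}
    \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \\ g \\ h \end{bmatrix}
    =
    \begin{bmatrix} x' \\ y' \end{bmatrix}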

I tested whether the homographies worked by transforming the correspondence points using my calculated homography matrix. We can see that the points overlay each other -- not perfectly, but well enough.
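
As a rough sketch of that step (function names and details here are illustrative, not my exact code), the least-squares solve and the point-transform check look something like this:

    import numpy as np

    def compute_H(pts1, pts2):
        """Least-squares homography mapping pts1 -> pts2.
        pts1, pts2: (N, 2) arrays of corresponding (x, y) points, N >= 4."""
        A, b = [], []
        for (x, y), (xp, yp) in zip(pts1, pts2):
            A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
            A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
            b.extend([xp, yp])
        h, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float), rcond=None)
        return np.append(h, 1.0).reshape(3, 3)   # re-attach the fixed entry h_33 = 1

    def apply_H(H, pts):
        """Apply H to (N, 2) points, dividing out the homogeneous coordinate."""
        pts_h = np.hstack([pts, np.ones((len(pts), 1))])   # lift to homogeneous coords
        mapped = pts_h @ H.T
        return mapped[:, :2] / mapped[:, 2:3]

Checking the fit then amounts to plotting apply_H(compute_H(pts1, pts2), pts1) on top of pts2.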

Part 3: Warp the Images

Next, I wrote a function to warp all the points of one image into the plane of another. The function takes in the two images and the homography between them. It first computes a pad by transforming just the corners of the second image using H's inverse. From there, I know exactly where the second image will land in the plane of the first image, and I can pad the first image accordingly -- although, because of rotation between the images, this pad sometimes results in part of the warped image being clipped.
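
A sketch of that corner-based padding step, under the same convention as above (H maps image-1 coordinates to image-2 coordinates; the helper name and the returned offset are my own choices for illustration):

    import cv2
    import numpy as np

    def pad_for_warp(im1, im2, H):
        """Pad im1 so that im2, mapped into im1's plane by H^{-1}, fits entirely.
        Returns the padded image and the (x, y) shift introduced by the padding."""
        h2, w2 = im2.shape[:2]
        corners = np.array([[0, 0], [w2, 0], [w2, h2], [0, h2]], dtype=np.float32)
        # Map im2's corners into im1's plane with the inverse homography.
        mapped = cv2.perspectiveTransform(corners.reshape(-1, 1, 2),
                                          np.linalg.inv(H)).reshape(-1, 2)
        x_min, y_min = np.floor(mapped.min(axis=0)).astype(int)
        x_max, y_max = np.ceil(mapped.max(axis=0)).astype(int)
        h1, w1 = im1.shape[:2]
        pad = ((max(0, -y_min), max(0, y_max - h1)),   # top, bottom
               (max(0, -x_min), max(0, x_max - w1)),   # left, right
               (0, 0))                                 # don't pad the color channels
        return np.pad(im1, pad), (max(0, -x_min), max(0, -y_min))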

The warp function uses cv2.remap, looking up values for each pixel of the new warped image from the second image. First, I tested this on images containing objects with a known rectangular shape when viewed from the front. This lets me check that my warp function works as intended before trying it on two images in arbitrary planes. The results work fairly well, although some details look a bit odd. For example, in the rectified stack of books we can still see the sides of the stack that were visible in the original photo, even though we wouldn't expect to see them if we were truly viewing from the top. The reason is that truly viewing the books from the top would require moving the viewpoint, which is impossible after the image has been captured.
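
The remap step is an inverse warp: for every output pixel, use H to look up where it comes from in the second image and let cv2.remap interpolate there. A minimal sketch of this idea (the function name, out_shape, and offset argument are illustrative; offset is the (x, y) shift added by the padding above):

    import cv2
    import numpy as np

    def warp_image(im2, H, out_shape, offset):
        """Inverse-warp im2 into im1's padded plane using cv2.remap."""
        out_h, out_w = out_shape
        xs, ys = np.meshgrid(np.arange(out_w), np.arange(out_h))
        # Undo the padding shift so coordinates are in im1's original frame...
        pts = np.stack([xs - offset[0], ys - offset[1]], axis=-1).astype(np.float32)
        # ...then map them into im2's frame with the forward homography.
        src = cv2.perspectiveTransform(pts.reshape(-1, 1, 2), H).reshape(out_h, out_w, 2)
        map_x = src[..., 0].astype(np.float32)
        map_y = src[..., 1].astype(np.float32)
        return cv2.remap(im2, map_x, map_y, interpolation=cv2.INTER_LINEAR,
                         borderMode=cv2.BORDER_CONSTANT, borderValue=0)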

Part 4: Blend the Images into a Mosaic

Finally, I took two images, warping one into the plane of the other. I used an alpha mask to stitch the two images together: it sets the overlap of the two images to 0.5 and the unique parts of their masks to 1. This lets me multiply each image -- the original padded one and the new warped one -- by its alpha mask and sum them, taking an average of values in just the overlapping region.
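
A sketch of that averaging step (illustrative only; it assumes both inputs share the same padded shape and are zero outside their valid regions, which is how the padding and warping above leave them):

    import numpy as np

    def blend(im1_padded, im2_warped):
        """Alpha-mask blend: 0.5 weight where the two images overlap, 1 elsewhere."""
        mask1 = im1_padded.sum(axis=2) > 0          # where image 1 has content
        mask2 = im2_warped.sum(axis=2) > 0          # where the warped image 2 has content
        overlap = mask1 & mask2
        alpha1 = np.where(overlap, 0.5, mask1.astype(float))[..., None]
        alpha2 = np.where(overlap, 0.5, mask2.astype(float))[..., None]
        out = im1_padded.astype(float) * alpha1 + im2_warped.astype(float) * alpha2
        return out.astype(im1_padded.dtype)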

The results are shown below! There is quite a bit of blur within the overlapping regions, which I attribute to error in selecting correspondence points. This gets worse for the darker images, which already have some inherent blur that makes it difficult to localize the points precisely. In the next part, I'll improve on this by automatically detecting correspondences.

The most difficult part of this project was working through how to index and manipulate large images efficiently. This was my first time using DSLR images without downsizing them first, so it took quite a bit of patience to get the code running quickly. The coolest thing I learned from this project so far is definitely how to perform perspective transforms.