Project 5: [Auto]Stitching Photo Mosaics

Leanna Yu (cs194-26-aff)


PART A: Image Warping and Mosaicing

In this project, we first explore image warping through image mosaicing. Given two images, we create an image mosaic by registering, projectively warping, resampling, and compositing them into one. To mosaic the images, we compute homographies, use them to warp the images, and finally composite the two together. This can be done with more than two photos for a better result.

Recover Homographies

To recover the parameters of the transformation between each pair of images, we want to find the 3x3 matrix, the homography, that maps the first image's corresponding points to those in the second image. That is, we want the homography matrix H such that p' = Hp, where p is a corresponding point's (x, y) location in image 1 (in homogeneous coordinates) and p' is the corresponding location in image 2. Though there are 9 entries in the 3x3 homography matrix, a homography is only defined up to scale, so we can fix the bottom-right entry to 1, leaving 8 unknowns. Since each point correspondence gives two equations, we need at least 4 corresponding points between the two images to solve for the unknowns. To be robust to noise in the data, it would be wise to have more than four correspondences, in which case we solve for h, the 8 x 1 vector containing the 8 unknowns of H, using least-squares. In Python, we can do this with numpy.linalg.lstsq.

Homography Equation
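The least-squares setup described above can be sketched as follows. This is a simplified version of the approach, not the exact project code; it fixes the bottom-right entry of H to 1 and solves the resulting overdetermined system:

```python
import numpy as np

def compute_homography(pts1, pts2):
    """Least-squares homography H such that pts2 ~ H @ pts1.

    pts1, pts2: (n, 2) arrays of corresponding (x, y) points, n >= 4.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(pts1, pts2):
        # From x' = (h1 x + h2 y + h3) / (h7 x + h8 y + 1), and
        # the analogous expression for y', multiplied out:
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return np.append(h, 1).reshape(3, 3)
```

With exactly four correspondences the system has a unique solution; with more, lstsq gives the best fit in the least-squares sense.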

Image Rectification

We apply the warp function to the following images, choosing the four corners of some rectangular object as the four points in image 1, and rectifying the image so that the plane is frontal-parallel. In the image of the art gallery, the four points were plotted on the four corners of the lady-with-sunglasses artwork. For the image of the tiles, the four points were plotted on the four corners of a tile. For the image of the Hollywood street sign, the four points were plotted on the four corners of the sign itself. All points were chosen manually, and their coordinates were read off using Photoshop. All images were found while searching Google Images.
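Rectification amounts to computing the homography from the chosen corners to an axis-aligned rectangle and then warping. A minimal inverse-warp sketch is below; it uses nearest-neighbor resampling for brevity (the real warp would interpolate), and assumes `H` maps source coordinates to output coordinates:

```python
import numpy as np

def warp_image(im, H, out_shape):
    """Inverse-warp `im` by homography H into an (h_out, w_out) canvas."""
    h_out, w_out = out_shape
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    # Map every output pixel back into the source image with H^-1.
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = np.linalg.inv(H) @ coords
    src_x = np.round(src[0] / src[2]).astype(int)
    src_y = np.round(src[1] / src[2]).astype(int)
    out = np.zeros((h_out, w_out) + im.shape[2:], dtype=im.dtype)
    # Keep only pixels whose preimage falls inside the source image.
    valid = (src_x >= 0) & (src_x < im.shape[1]) & \
            (src_y >= 0) & (src_y < im.shape[0])
    out[ys.ravel()[valid], xs.ravel()[valid]] = im[src_y[valid], src_x[valid]]
    return out
```

Inverse warping (looping over output pixels rather than input pixels) avoids holes in the result.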

Art Gallery

Art Rectified Art

Floor Tiles

Tile Rectified Tile

Hollywood Street Sign

Sign Rectified Sign

Blending Images into a Mosaic

Blending images into a mosaic follows steps similar to rectifying an image, with the additional step of blending. For mosaicing, we find corresponding points in the two images that are to be stitched together. For the cul-de-sac images, I chose points that formed rectangles, along with some other prominent objects in the images. From there, we warp one image into the perspective of the other, then stack the two, blending so that there are no harsh seams. When picking points, I did not give much thought to how many I picked, but had I picked fewer, the results might not have been as successful.
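One simple way to avoid harsh seams is to weight the two aligned images in their overlap region. The sketch below is an illustrative feathered blend, not the exact method used in the project: it assumes both images have already been warped onto same-size float canvases (with zeros outside their footprints) and uses a linear horizontal alpha ramp:

```python
import numpy as np

def blend_pair(im1, im2):
    """Feathered blend of two pre-aligned grayscale images."""
    mask1 = (im1 > 0).astype(float)
    mask2 = (im2 > 0).astype(float)
    overlap = mask1 * mask2
    # Linear left-to-right ramp used as alpha inside the overlap.
    alpha = np.linspace(1, 0, im1.shape[1])[None, :]
    w1 = np.where(overlap > 0, alpha, mask1)
    w2 = np.where(overlap > 0, 1 - alpha, mask2)
    total = w1 + w2
    total[total == 0] = 1  # avoid division by zero outside both images
    return (im1 * w1 + im2 * w2) / total
```

Pixels covered by only one image are copied through unchanged; only the overlap is cross-faded.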

Stairwell

Left stairs Right stairs
Blended stairs

Cul de Sac

Left street Right street
Blended street

Bed

Left bed Right bed
Blended bed

Desk

Left desk Right desk
Blended desk

What I Learned from Project 5 Part A

After learning what homography matrices were, they struck me as the magic matrix that really makes all of this possible. It's still amazing to me that the iPhone camera can do all of this in real time, given the amount of calculation it takes to stitch together only two photos. With the last result, the bed mosaic, I realized that the amount of sunlight coming through the window caused the iPhone camera to adjust its exposure, so the right photo is darker than the left. The mosaic has a harsh seam for this reason, even though the alignment worked well for the most part.

PART B: Feature Matching for Autostitching

Now that we are able to stitch together images with hand-picked corresponding points, we explore how to have algorithms do the feature detection for us. This allows for a much larger set of corresponding features, giving a better match when stitching the images together, since more corresponding points yield a more accurate result than fewer. To do this, we follow the Multi-Image Matching paper found here, specifically Sections 2-5.

Harris Interest Point Detector

The code for generating Harris corners was provided in the project spec. Below is the result of running the Harris corner detection algorithm on the left and right images for the stairwell mosaic. This algorithm generates a very large number of points, so we have to process its output further.

Stair left harris Stair right harris

Adaptive Non-Maximal Suppression

The first step in cutting down the number of candidate corresponding points between the two images is ANMS. ANMS suppresses a corner if there is another corner within a certain radius whose strength, scaled by c_robust = 0.9 (as in the paper), still exceeds its own — that is, a neighbor at least about 11% stronger. The suppression radius for each point is computed and stored in a dictionary of distances. To choose which 500 points to keep, we sort the dictionary by decreasing radius and admit the first 500 points for the steps that follow.
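The ANMS step above can be sketched as follows, assuming the corners and their Harris strengths arrive as arrays (the real implementation used a dictionary of radii, but the logic is the same):

```python
import numpy as np

def anms(corners, strengths, n_keep=500, c_robust=0.9):
    """Adaptive non-maximal suppression.

    corners: (n, 2) array of (x, y); strengths: (n,) Harris strengths.
    Each corner's suppression radius is the distance to the nearest
    corner that dominates it by the c_robust margin; keep the n_keep
    corners with the largest radii.
    """
    n = len(corners)
    radii = np.full(n, np.inf)
    for i in range(n):
        # Corners strong enough to suppress corner i.
        stronger = strengths[i] < c_robust * strengths
        if stronger.any():
            d = np.linalg.norm(corners[stronger] - corners[i], axis=1)
            radii[i] = d.min()
    keep = np.argsort(-radii)[:n_keep]
    return corners[keep]
```

The globally strongest corner is never suppressed (infinite radius), so it is always among the kept points.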

Stair left ANMS Stair right ANMS

Feature Descriptor

To associate features across the two images, we first create patches that describe features. These patches are 8x8 pixels, constructed from 40x40-pixel windows by sampling every 5th pixel. Before sampling, we put each window through a Gaussian filter to blur it, so that downsampling does not alias. Each patch is then normalized to zero mean (and unit variance, as in the paper) and added to the list of patches to be processed in Feature Matching.
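A sketch of the descriptor extraction, assuming a grayscale float image, an interior keypoint, and scipy's Gaussian filter (the blur sigma here is an illustrative choice):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def extract_descriptor(im, x, y, window=40, patch=8):
    """8x8 descriptor from a 40x40 window centered near (x, y)."""
    half = window // 2
    win = im[y - half:y + half, x - half:x + half]
    blurred = gaussian_filter(win, sigma=2)  # low-pass before downsampling
    step = window // patch                   # sample every 5th pixel
    desc = blurred[::step, ::step]           # 8x8 subsample
    desc = desc - desc.mean()                # bias normalize (zero mean)
    std = desc.std()
    return desc / std if std > 0 else desc   # gain normalize (unit variance)
```

The normalization makes the descriptor invariant to affine changes in intensity, which helps when the two photos were exposed differently.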

Feature Matching

With the patches from the step above, we want to pair up matching patches across the two images. To do so, we use the provided dist2 function as the metric for how similar two patches are: a low dist2 indicates the patches are more similar than a high dist2 does. We then apply Lowe's thresholding on the ratio of the error of the best match to the error of the second-best match to decide whether to keep a matched pair. This threshold controls for outliers: features that match closely to multiple features in the other image, suggesting a repeated texture or an object that appears several times, which might do more harm than good if kept as correspondences.
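The matching step can be sketched as below. A small stand-in for the provided dist2 function is included so the sketch is self-contained, and the 0.6 ratio threshold is an illustrative choice:

```python
import numpy as np

def dist2(a, b):
    """Pairwise squared Euclidean distances between rows of a and b."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)

def match_features(desc1, desc2, ratio=0.6):
    """Lowe-style ratio matching between two sets of flat descriptors.

    Keep a match only when the best distance is well below the
    second-best. Returns (i, j) index pairs into desc1 / desc2.
    """
    d = dist2(desc1, desc2)
    matches = []
    for i in range(len(desc1)):
        order = np.argsort(d[i])
        best, second = d[i, order[0]], d[i, order[1]]
        if best < ratio * second:
            matches.append((i, order[0]))
    return matches
```

A feature with two near-equally-good candidates in the other image fails the ratio test and is discarded, which is exactly the repeated-texture case described above.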

Stair left matched Stair right matched

RANSAC

RANSAC generates the final homography used to warp the images for the mosaic. To do this, we repeatedly compute a homography from four random points among the matched features from the step above. We then compute inliers by measuring the distance between each point in the first image, transformed by the proposed homography, and where its match actually lies in the second image. If this distance is within 4 pixels (the threshold used here), the point is considered an inlier; we track the largest set of inliers across a set number of iterations, in this case 10,000. The inlier set remaining at the end of these iterations determines the final homography.
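The RANSAC loop can be sketched as follows. The least-squares homography helper from Part A is repeated here so the sketch is self-contained; the iteration count and pixel threshold mirror the values above, and the final H is refit on the winning inlier set:

```python
import numpy as np

def compute_homography(pts1, pts2):
    """Least-squares homography mapping pts1 -> pts2 (as in Part A)."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(pts1, pts2):
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return np.append(h, 1).reshape(3, 3)

def ransac_homography(pts1, pts2, n_iters=10000, thresh=4.0, seed=0):
    """RANSAC over matched points; returns (H, inlier mask)."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(pts1), dtype=bool)
    p1_h = np.hstack([pts1, np.ones((len(pts1), 1))])
    for _ in range(n_iters):
        idx = rng.choice(len(pts1), 4, replace=False)
        H = compute_homography(pts1[idx], pts2[idx])
        # Reprojection error of every match under the proposed H.
        proj = p1_h @ H.T
        proj = proj[:, :2] / proj[:, 2:3]
        err = np.linalg.norm(proj - pts2, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on the largest inlier set for the final homography.
    return compute_homography(pts1[best_inliers], pts2[best_inliers]), best_inliers
```

Because outliers rarely agree with any single homography, the largest consensus set almost always corresponds to the true transformation.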

Stair left final points Stair right final points

Results

Below are the results. Mosaics on the left were generated by code in Part A, with hand-selected point correspondences and images as input. Mosaics on the right were generated by the code in Part B, with the images as the only input.

Stairwell

Stair mosaic Stair auto mosaic

Desk

Desk mosaic Desk auto mosaic

What I Learned from Project 5

Although it is much less work on our part to have algorithms find corresponding points for us, the human eye may still be better at matching features, because we bring spatial reasoning and object recognition to what is in the images. That said, the automated mosaic generation does a better job overall, as it can process many points and narrow them down in a logical and consistent manner. I am still amazed that cameras these days can take panoramic photos in real time, creating views that even the human eye cannot see; for example, a 360-degree panoramic image greatly exceeds our field of view at any given moment.