CS 194-26: Fall 2020
[Auto]Stitching Photo Mosaics
Shoot the Photos
In this project, we want to stitch photos together to create a panoramic-style image. I shot two photos of my living room from the same point of view but in different view directions, with overlapping fields of view, so that the transform between them is projective.
We can relate two images taken from the same camera center by casting a ray through each pixel in the first image
and drawing that pixel where the ray intersects the second image's plane. We warp these pixels by applying a homography.
Formulas are shown below. I used 8 correspondence points (a minimum of 4 is required, but each additional point makes
the homography estimate more accurate) and solved for the homography from the correspondences with SVD. I initially used
least squares, but found SVD to be more effective.
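The SVD approach described above can be sketched as follows. This is a minimal illustration, not the exact project code: it builds the standard direct linear transform (DLT) system from the point pairs and takes the right singular vector with the smallest singular value as the homography.

```python
import numpy as np

def compute_homography(src, dst):
    """Estimate the 3x3 homography H mapping src -> dst (N >= 4 point pairs).

    Builds the 2N x 9 DLT system, solves it with SVD, and normalizes so
    that H[2, 2] == 1.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two rows of the linear system.
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector for the smallest singular value.
    _, _, Vt = np.linalg.svd(np.array(A))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]
```

With more than four points the system is overdetermined, and the SVD gives the least-squares-optimal unit solution, which is why extra points improve robustness.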
To verify that my warp function works, let's try rectifying a few images. I rectified a few flat objects that
I photographed from an angle into a fronto-parallel view.
My warp function works pretty well!
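The warp used for rectification can be sketched as an inverse warp: for every pixel in the output canvas, apply the inverse homography to find where it came from in the source image. This is a simplified illustration (nearest-neighbor sampling rather than interpolation), not the exact project code.

```python
import numpy as np

def warp_image(img, H, out_shape):
    """Inverse-warp img into an out_shape = (h, w) canvas using homography H.

    Each output pixel is mapped through H^-1 back into the source image
    and sampled with nearest-neighbor lookup; pixels that land outside
    the source stay black.
    """
    h_out, w_out = out_shape
    Hinv = np.linalg.inv(H)
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = Hinv @ coords
    src = src / src[2]                       # back to inhomogeneous coords
    sx = np.round(src[0]).astype(int)
    sy = np.round(src[1]).astype(int)
    valid = (0 <= sx) & (sx < img.shape[1]) & (0 <= sy) & (sy < img.shape[0])
    out = np.zeros((h_out, w_out) + img.shape[2:], dtype=img.dtype)
    out[ys.ravel()[valid], xs.ravel()[valid]] = img[sy[valid], sx[valid]]
    return out
```

Inverse warping avoids the holes that forward warping leaves, since every output pixel is assigned exactly once.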
Blend the images into a mosaic
Finally, we get to stitch together the mosaic. After warping my images, I created a final image canvas with plenty of
padding so that the mosaic doesn't get cropped. Then I placed both the first image and the warped image onto this
blank canvas. To stitch the two images together, I first tried directly adding them, which did not work out
well because it distorted the colors in the overlapping region. Next, I tried a mask-and-splice method, which worked
reasonably well but produced some strange edge artifacts. Finally, I tried taking the per-pixel maximum of both images,
which produced the best results.
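The maximum blend described above is a one-liner once both images sit in the shared padded canvas. A minimal sketch, assuming each canvas is zero everywhere its image is absent:

```python
import numpy as np

def blend_max(canvas_a, canvas_b):
    """Blend two same-sized mosaic canvases by per-pixel maximum.

    Because each canvas is zero outside its own image, the maximum keeps
    each image where only it is present, and in the overlap it avoids the
    brightness/color blow-up that direct addition causes.
    """
    return np.maximum(canvas_a, canvas_b)
```

Unlike addition, this never exceeds the original intensity range, so overlapping pixels stay plausible even without an explicit mask.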
The entire process is visualized below.
Warped Images (second image warped)
What I learned: Part A
This project was really cool. I found it really interesting that we are able to warp one image into another image's plane, recovering homographies, without knowing any additional information aside from the two images at hand. I was also surprised by how accurate this is. I experience weird warping effects when I try to rectify (and projective warp) some images I've taken on my iPhone, but in this project I was able to almost perfectly rectify flat objects. Since I manually went in and selected the corner points of the object, the warp was near perfect. There's so much from historical images that we're able to recover within seconds, that took some researchers years in the past! Every project in this class makes me appreciate what technology can do for us.
Harris Corner Detection
In the first part of this project, I manually labeled the interest points for warping. In this part of the project, we want
to automate this process. I detected the interest points using the Harris corner detector. Below are the images visualized
with their Harris corner points.
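Harris corner detection can be sketched with scikit-image, which is what course starter code for this assignment typically wraps; this is an illustrative version, not necessarily the exact code used here.

```python
import numpy as np
from skimage.feature import corner_harris, peak_local_max

def harris_corners(gray, min_distance=3):
    """Return the Harris response map and (row, col) corner coordinates.

    gray: 2D float image. corner_harris computes the corner response at
    every pixel; peak_local_max keeps local maxima of that response that
    are at least min_distance apart and reasonably strong.
    """
    response = corner_harris(gray)
    coords = peak_local_max(response, min_distance=min_distance,
                            threshold_rel=0.2)
    return response, coords
```

The `threshold_rel` cutoff is an assumption added here to suppress flat background regions; the raw detector still returns far too many points, which motivates the suppression step below.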
Adaptive Non-Maximal Suppression
As you can see, we ended up with a ton of points, many of which are clustered together. The cost of matching so many points is superlinear, so we want to restrict the total number of interest points. In addition, it's important that the interest points are spatially well distributed over the image. To meet these requirements, I implemented the adaptive non-maximal suppression strategy described in this paper to select a fixed set of 500 interest points. Below are the images visualized with the selected ANMS points.
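The ANMS strategy from the MOPS paper can be sketched as follows: each point's suppression radius is its distance to the nearest point that robustly dominates it in corner strength, and we keep the points with the largest radii. A simple O(n^2) sketch, with the paper's robustness factor of 0.9 assumed:

```python
import numpy as np

def anms(coords, strengths, n_keep=500, c_robust=0.9):
    """Adaptive non-maximal suppression (Brown et al.).

    coords: (N, 2) point locations; strengths: (N,) corner responses.
    A point j dominates point i if strengths[i] < c_robust * strengths[j].
    Keeping the n_keep largest suppression radii yields strong, spatially
    well-distributed interest points.
    """
    coords = np.asarray(coords, dtype=float)
    strengths = np.asarray(strengths, dtype=float)
    n = len(coords)
    radii = np.full(n, np.inf)          # globally strongest point keeps inf
    for i in range(n):
        dominating = c_robust * strengths > strengths[i]
        if np.any(dominating):
            d = np.linalg.norm(coords[dominating] - coords[i], axis=1)
            radii[i] = d.min()
    keep = np.argsort(radii)[::-1][:n_keep]
    return coords[keep]
```

Unlike simply taking the 500 strongest responses, this spreads the survivors across the image, since a strong point in a crowded cluster gets a small radius.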
Feature Descriptor Extraction & Matching
Now, we need to match the correspondence points across the images. To do this, we take a patch around each corner
point in the left image and each corner point in the right image, and compare them. We sampled each patch from a
larger 40x40 window, applied a Gaussian blur, and down-sampled it to an 8x8 patch. These patches were normalized to a mean of 0
and a standard deviation of 1. A few patches are visualized below.
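The descriptor extraction described above can be sketched as follows. This is an illustrative version under assumed parameters (blur sigma, sampling offsets), not the exact project code.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def extract_descriptor(gray, r, c, window=40, patch=8):
    """Extract a normalized patch x patch descriptor centered at (r, c).

    Takes a window x window region, blurs it so the subsequent subsampling
    doesn't alias, samples it down to patch x patch (every 5th pixel for
    40 -> 8), and normalizes to zero mean and unit standard deviation so
    matching is invariant to bias and gain.
    """
    half = window // 2
    win = gray[r - half:r + half, c - half:c + half].astype(float)
    win = gaussian_filter(win, sigma=window / patch / 2)  # assumed sigma
    step = window // patch
    desc = win[step // 2::step, step // 2::step][:patch, :patch]
    desc = (desc - desc.mean()) / (desc.std() + 1e-8)
    return desc.ravel()
```

The bias/gain normalization is what lets SSD comparison tolerate the lighting differences between the two shots.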
To compare them, I computed the SSD between pairs of patches, and took the patch with the lowest distance as the corresponding point. To account for outliers, I compared the distance to the closest feature (d1) with the distance to the second-closest feature (d2). If the ratio d1/d2 is below a certain threshold (I chose 0.475 here), we keep the correspondence; otherwise, we discard it. Finally, we have our matched points, visualized below.
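The matching step with the ratio test can be sketched as follows, with the project's 0.475 threshold as the default:

```python
import numpy as np

def match_features(desc_a, desc_b, ratio=0.475):
    """Match descriptors using SSD with Lowe's ratio test.

    desc_a: (Na, D) and desc_b: (Nb, D) descriptor arrays. For each
    descriptor in A, find its two nearest neighbors in B by SSD and keep
    the match (i, j) only when d1/d2 < ratio, i.e. when the best match is
    clearly better than the runner-up.
    """
    matches = []
    for i, d in enumerate(desc_a):
        ssd = np.sum((desc_b - d) ** 2, axis=1)
        order = np.argsort(ssd)
        d1, d2 = ssd[order[0]], ssd[order[1]]
        if d1 / (d2 + 1e-12) < ratio:
            matches.append((i, int(order[0])))
    return matches
```

The intuition behind the ratio test is that a correct match has a uniquely similar patch, while an ambiguous point (such as repeated texture) has several nearly-as-good matches and a ratio near 1.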
RANSAC + Mosaic-ing
Now that we have our points ready, we need to find the homography to warp them. The RANSAC algorithm attempts to find the best homography
between the two images. More specifically, it repeatedly samples subsets of 4 feature pairs (the minimum number of points
needed to compute a homography), fits a homography to the sample, and counts the inliers for that homography. We keep the largest set of inliers,
and recompute the least-squares homography estimate on all of the inliers.
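The RANSAC loop just described can be sketched as follows. The DLT solver is inlined so the sketch is self-contained, and the iteration count and pixel threshold shown here are illustrative defaults, not the project's actual parameter choices.

```python
import numpy as np

def _fit_h(src, dst):
    """Minimal DLT homography solver via SVD (same approach as Part A)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    return np.linalg.svd(np.array(A, dtype=float))[2][-1].reshape(3, 3)

def ransac_homography(src, dst, n_iters=500, thresh=2.0, seed=0):
    """4-point RANSAC for a homography.

    Repeatedly sample 4 correspondences, fit H, and count points whose
    reprojection error is under thresh pixels; keep the largest inlier
    set and refit H on all of its inliers.
    """
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    rng = np.random.default_rng(seed)
    src_h = np.hstack([src, np.ones((len(src), 1))])  # homogeneous coords
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        idx = rng.choice(len(src), 4, replace=False)
        H = _fit_h(src[idx], dst[idx])
        with np.errstate(divide="ignore", invalid="ignore"):
            proj = src_h @ H.T
            proj = proj[:, :2] / proj[:, 2:3]   # degenerate samples give nan
            err = np.linalg.norm(proj - dst, axis=1)
        inliers = err < thresh                  # nan errors never count
        if inliers.sum() > best.sum():
            best = inliers
    H = _fit_h(src[best], dst[best])
    return H / H[2, 2], best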
The parameters I chose are as follows:
Now that we have our homography and interest points, we can finally warp the images together. I followed the same method as I did in Part A. Results are shown below (left: Automatic Keypoints, right: Manual Keypoints)
Honestly, I'm not too satisfied with my results. I played around with the error threshold and outlier parameters, but ultimately I think I chose very confusing images for my algorithm. I get clean lines for the objects in the overlap, but the abundance of bushy trees, whose leaves generate tons of corner points, seems to be throwing my algorithm off. In addition, since I was outdoors, the lighting was slightly different between the photos, which caused a sharper seam in my image blend.
What I learned: Part B
I really enjoyed this part of the project. I'm consistently impressed by what we can do with just linear algebra. After training a model for hours in the last project, I'm pretty amazed by the results of this project and how it did not require any training - just math!