Project 4A: Image Warping and Mosaicing

Timothy Kha


1.1: Shoot the Pictures

For this project, I took pictures of objects that could be used for rectification. I photographed a textbook at a slanted angle so that it could be rectified into a top-down view, and a Walgreens rewards card that I would also rectify to view it from a better perspective. I also used an image of an art gallery from the internet.

Images for Rectification:

Chinese Textbook
Walgreens Reward Card
Art Gallery


I also took pictures for my mosaics. I took many pictures of outdoor scenes, such as our campus, capturing 2+ images of each scene.

Images for Mosaics:

Forest Left
Forest Right


Campus Left
Campus Middle
Campus Right


Backyard Left
Backyard Middle 1
Backyard Middle 2
Backyard Right

1.2: Recover Homographies

I first selected correspondence points by hand for each image (usually 8-10).

Tree Left w/ Labeled Points
Tree Right w/ Labeled Points


Following the least squares approach as described in lecture, we have the equation p' = Hp.

We can then rewrite the coefficients of H as a vector h, where h = [a, b, c, d, e, f, g, h].T. We only have 8 unknowns since we set the scale factor (the bottom-right entry of H) to 1.

We can now set up a system of equations in the form Ah = b.

We know that:

wx' = ax + by + c
wy' = dx + ey + f
w = gx + hy + 1

Substituting w into the first and second equations:

(gx + hy + 1)x' = ax + by + c
gxx' + hyx' + x' = ax + by + c
x' = ax + by + c - gxx' - hyx'

(gx + hy + 1)y' = dx + ey + f
gxy' + hyy' + y' = dx + ey + f
y' = dx + ey + f - gxy' - hyy'

Thus we can construct the matrix below and solve the overdetermined system using least squares (np.linalg.lstsq) to obtain our H matrix:

Source: https://towardsdatascience.com/estimating-a-homography-matrix-522c70ec4b2c
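A minimal sketch of this setup, assuming the correspondences are stored as N x 2 arrays of (x, y) points (the function name is illustrative):

```python
import numpy as np

def compute_homography(pts_src, pts_dst):
    """Estimate H (bottom-right entry fixed to 1) from N >= 4 correspondences,
    stacking the two equations derived above for each point pair."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(pts_src, pts_dst):
        # x' = ax + by + c - g*x*x' - h*y*x'
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp]); b.append(xp)
        # y' = dx + ey + f - g*x*y' - h*y*y'
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp]); b.append(yp)
    h_vec, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
    return np.append(h_vec, 1.0).reshape(3, 3)  # append the fixed scale factor
```

With more than 4 points the system is overdetermined and lstsq returns the least-squares fit, which is what makes the manual clicking of 8-10 points per image worthwhile.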


1.3: Warp the Images

Now that we have a homography matrix, we can warp the images.

For my approach, I chose to warp the keypoints of the left image into the plane of the keypoints of the right image.

First, I created a bounding box by finding the coordinates of the 4 corners of the left image after forward warping (H * corners). I then used meshgrid to make a canvas that would fit the warped image and, following the approach from Project 2, inverse warped the left image, filling in the output with interpolated values via cv.remap.

The right image is placed by computing the offset between corresponding keypoints and shifting it to the correct location on the canvas.
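The meshgrid + inverse-warp step can be sketched as follows (nearest-neighbor sampling here for brevity; the actual project used cv.remap for interpolation, and the function name is illustrative):

```python
import numpy as np

def inverse_warp(img, H, out_h, out_w):
    """Inverse-warp img onto an (out_h, out_w) canvas using homography H.

    H maps source coordinates to output coordinates, so each canvas pixel is
    pulled from H^-1 applied to its own coordinates.
    """
    Hinv = np.linalg.inv(H)
    ys, xs = np.meshgrid(np.arange(out_h), np.arange(out_w), indexing="ij")
    grid = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T
    src = Hinv @ grid
    src = src / src[2]                                   # divide out w
    sx = np.round(src[0]).astype(int).reshape(out_h, out_w)
    sy = np.round(src[1]).astype(int).reshape(out_h, out_w)
    # only sample pixels that land inside the source image
    valid = (sx >= 0) & (sx < img.shape[1]) & (sy >= 0) & (sy < img.shape[0])
    out = np.zeros((out_h, out_w) + img.shape[2:], img.dtype)
    out[valid] = img[sy[valid], sx[valid]]
    return out
```

In the real pipeline the canvas size and an offset come from the bounding box of the forward-warped corners, so the warped image fits entirely on the canvas.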
Left Tree Warped
Right Tree Warped/Shifted

1.4: Image Rectification

With the homography computation from the previous section, all we need to do is warp our image to a new plane of our choosing. After choosing 4 points in the source image (typically the 4 corners of the object I'm trying to rectify, like the textbook), I chose 4 points representing the shape I want the source image warped to (in my case, a rectangle).

Here are my results:

Textbook Source Img
Textbook Rectified
Walgreens Card Source Img
Walgreens Card Rectified
Painting Source Img
Painting Rectified

1.5: Blend the images into a mosaic

2-Image Mosaic

First I tried making masks to blend the overlap region 50/50. I started by creating masks for my two warped images (left and right).

Left Tree Warped
Right Tree Warped
Tree Left Img Mask
Tree Right Img Mask


Next I split my masks into 3 regions: mask of pixels only from the left image, mask of pixels only from the right image, mask of pixels that intersect both images.

Tree Left Mask
Tree Intersection Mask
Tree Right Mask


Here is what the three regions look like combined, with the intersection averaged:

Tree Averaged Intersection


This doesn't look good, so instead I take the maximum of each pixel in the intersecting region. In the regions that don't overlap, I simply take the pixels from the original images.

Result:
Tree Maximum of Intersection


That looks better! There's less blurring and fewer noticeable edges with this approach, so I used maximum-pixel blending for the rest of the images, since it performs better than average blending.
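The three-mask blending described above might look like this (a sketch assuming both warped images already sit on the same zero-padded canvas):

```python
import numpy as np

def blend(left, right, mode="max"):
    """Blend two aligned warped images on the same zero-padded canvas.

    Builds the three masks described above (left-only, right-only, overlap)
    and fills the overlap by averaging or by taking the per-pixel maximum.
    """
    lm, rm = left > 0, right > 0        # coverage masks for each image
    both = lm & rm                      # intersection region
    # non-overlapping regions come straight from the original images
    out = np.where(lm & ~rm, left, 0) + np.where(rm & ~lm, right, 0)
    if mode == "avg":
        mid = ((left.astype(float) + right) / 2).astype(left.dtype)
    else:
        mid = np.maximum(left, right)
    return out + np.where(both, mid, 0)
```

Because the canvases are zero outside each image's footprint, the max mode effectively reduces to np.maximum over the whole canvas.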

Tree Cropped

N-Image Mosaic

Rather than having this only work for 2 images, I needed to extend this to allow for an arbitrary number of images.

I achieved this using the one-at-a-time approach described in the spec: I leave the right image unwarped and warp the left image into its projection, repeating this from left to right until all images are warped and blended together.
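The iteration itself reduces to a simple left-to-right fold, where `merge_pair` is a hypothetical stand-in for the 2-image warp-and-blend pipeline from sections 1.3 and 1.5:

```python
def stitch_all(images, merge_pair):
    """One-at-a-time stitching: repeatedly merge the running mosaic with the
    next image to its right, warping the mosaic into that image's plane."""
    mosaic = images[0]
    for nxt in images[1:]:
        mosaic = merge_pair(mosaic, nxt)  # warp mosaic into nxt's plane, blend
    return mosaic
```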

Left side of campus
Middle of campus
Right side of campus


First I merged the left picture with the middle:

Left joined with middle


And then finally the left-middle merged picture with the right:

Left and middle joined with right


Cropped result of the 3 images:



For my final example, I merged 4 images that I took of my backyard.

Left
Middle 1
Middle 2
Right


Here is the merging process:

Left merged with Middle 1
Left, Middle 1 merged with Middle 2
Combined merge


Cropped result of the 4 images:

4 Backyard's Cropped

1.6: What I've learned

The coolest thing I learned is how panorama creation on our phones works under the hood. Working through this project, it was amazing to see all the parts that have to be accounted for (shifting after warping, seamlessly blending the two images at the intersection, aligning the images correctly). After making these mosaics myself, I also learned that it's important to hold the camera steady and consistent when taking pictures for the best quality.

Part B: Feature Matching for Autostitching

Step 1: Harris Interest Point Detector

Using the Harris interest point detector starter code, we found points that could potentially be used for correspondences. I set the minimum spacing between points to 25 pixels.
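For reference, a numpy-only sketch of the Harris response the starter code computes (the starter code additionally enforces the minimum spacing when picking peaks; `box_sum` and the window radius here are illustrative):

```python
import numpy as np

def box_sum(a, r):
    """Sum of a over (2r+1)x(2r+1) windows (zero-padded), via an integral image."""
    ii = np.pad(np.pad(a, r).cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    H, W = a.shape
    k = 2 * r + 1
    return ii[k:k + H, k:k + W] - ii[:H, k:k + W] - ii[k:k + H, :W] + ii[:H, :W]

def harris_response(im, k=0.04, r=1):
    """Harris corner response R = det(M) - k * trace(M)^2 at every pixel."""
    Iy, Ix = np.gradient(im.astype(float))
    # entries of the second-moment matrix M, summed over a small window
    Sxx, Syy, Sxy = box_sum(Ix * Ix, r), box_sum(Iy * Iy, r), box_sum(Ix * Iy, r)
    return Sxx * Syy - Sxy ** 2 - k * (Sxx + Syy) ** 2
```

Corners give a large positive response, edges a negative one, and flat regions roughly zero, which is what makes thresholding plus spaced peak-picking effective.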

Here are the results:

Left View of Street
Right View of Street
Left View of Street with Harris Corners
Right View of Street with Harris Corners

Step 2: Adaptive Non-Maximal Suppression

The next step was to use a technique called ANMS to filter our points further. ANMS reduces the points to a specified number (500 in my case) while spacing them out across the image. As described in the paper, we want to find the minimum suppression radius of each point, which is given by:

r_i = min_j |x_i - x_j|, over all j such that f(x_i) < c_robust * f(x_j)

where c_robust = 0.9 in this case. After computing the minimum suppression radius for each point, we sort and select the 500 points with the largest radii.

Here are the results:

Left Street Corners with ANMS
Right Street Corners with ANMS

Step 3: Feature Descriptor extraction and Matching

After we have our 500 points, we need to filter further by extracting feature descriptors and matching them. For each point in our set, we take a 40x40 window around the point, blur it, and downsample it to an 8x8 patch. I normalized each descriptor vector to mean 0 and standard deviation 1.
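A sketch of the descriptor step (mean-pooling each 5x5 block here stands in for the blur + downsample; the exact blur used in the project may differ):

```python
import numpy as np

def extract_descriptor(im, y, x, win=40, out=8):
    """8x8 feature descriptor from a 40x40 window centered near (y, x)."""
    h = win // 2
    patch = im[y - h:y + h, x - h:x + h].astype(float)        # 40x40 window
    s = win // out                                            # block size = 5
    pooled = patch.reshape(out, s, out, s).mean(axis=(1, 3))  # 8x8 patch
    vec = pooled.ravel()
    return (vec - vec.mean()) / vec.std()                     # mean 0, std 1
```

The bias/gain normalization at the end makes the descriptors robust to brightness and contrast differences between the two views.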

Now that we have descriptors for each point, we proceed with matching. For each point in image A, I computed the squared error between its descriptor and the descriptors of every point in image B. The point in image B with the lowest squared error is the 1-NN (1 nearest neighbor); I also found the 2-NN (the point in image B with the second-lowest squared error). Using these two errors, I calculated the ratio error(1-NN)/error(2-NN).

I only kept pairs of points whose error ratio was below some threshold (0.675 in my case). This feature matching reduces the number of outliers, which may also reduce the iterations needed for RANSAC.
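The ratio-test matching can be sketched as follows (brute-force over all descriptor pairs; names are illustrative):

```python
import numpy as np

def match_features(desc_a, desc_b, ratio_thresh=0.675):
    """For each descriptor in A, find its 1-NN and 2-NN in B by squared error
    and keep the pair only if err(1-NN) / err(2-NN) < ratio_thresh."""
    matches = []
    for i, d in enumerate(desc_a):
        errs = ((desc_b - d) ** 2).sum(axis=1)  # squared error to every B descriptor
        nn1, nn2 = np.argsort(errs)[:2]
        if errs[nn1] / errs[nn2] < ratio_thresh:
            matches.append((i, int(nn1)))
    return matches
```

A low ratio means the best match is much better than the runner-up, which is exactly the kind of unambiguous correspondence we want to feed RANSAC.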
Left Street feature-matched
Right Street feature-matched

Step 4: RANSAC

Finally, to further reduce any outliers and ensure that we have a robust set of points, we use the RANSAC algorithm as described in the paper and lecture:

1. Select four feature pairs (at random)
2. Compute homography H (exact)
3. Compute inliers where dist(pi’, H pi) < ε
4. Keep largest set of inliers

I set ε equal to 2 pixels in this case. After 1000 iterations, we keep the homography that produced the largest set of inliers.

Here is what my correspondence points look like after RANSAC:
Left Street RANSAC
Right Street RANSAC


Here are some other examples of mosaics I made with the points selected after RANSAC:

Left Backyard RANSAC
Right Backyard RANSAC


Left Painting RANSAC
Right Painting RANSAC


I wanted to compare a mosaic made from manually selected points with one created using our automated point selection, so I used the tree images from Part A:

Left Tree RANSAC
Right Tree RANSAC

Mosaics

In addition to automating the process of point selection, I also implemented Laplacian pyramid blending for better stitching of my images. This was achieved by creating a mask that slowly fades across the intersection of the 2 images. I then used Laplacian pyramid blending, as in Project 2, to produce my mosaics.
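The fading mask can be sketched in 1D (a linear alpha ramp across the overlap; the real mask is 2D and is fed into the Laplacian pyramid blend as in Project 2):

```python
import numpy as np

def fade_mask(width, overlap_start, overlap_end):
    """Horizontal mask: 1 on the left side, linearly fading to 0 across the
    overlap region, 0 on the right side."""
    m = np.ones(width)
    m[overlap_end:] = 0
    m[overlap_start:overlap_end] = np.linspace(1, 0, overlap_end - overlap_start)
    return m
```

Blending the pyramid levels with this soft mask hides the seam that the hard max-blending left visible.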

Here is an example mask for one of my images that I used for Laplacian pyramid blending:

Mask with fading intersection for Laplacian pyramid


And here are what my results look like!

Street Mosaic:




Backyard Mosaic:




Painting Mosaic:




Tree Mosaic:




Comparing the automated mosaic with the manual one:

Tree mosaic automatic
Tree mosaic manual


The results are very comparable! In fact, with Laplacian blending, the mosaic created with automatic point selection looks quite a bit cleaner at the intersection, and I think the results look great.

What have I learned?

I think the coolest things I learned from this project are the techniques and algorithms we used to completely automate the selection of correspondence points between images. Watching the points get filtered from one step to the next into useful correspondences (from ANMS to feature matching to RANSAC) was super interesting. The fact that we can start with simple features like corners and end up matching actual objects was really cool, and I was impressed by how well the automated mosaics compared to the original hand-picked ones.