CS 194-26: Intro to Computer Vision and Computational Photography, Fall 2021

Project 4A: Image Warping and Mosaicing

By Xinyun Cao


Overview

In the first part of this project, we used homographies to combine images into panoramas.

A-1 Shoot the Pictures

First, I shot the pictures following the guidelines. I shot three sets: a faraway scene (the view from the top of my roof), a medium-distance scene (my backyard), and a set of relatively close shots in dim lighting (my bedroom at night). The original images are shown below.

Note: I forgot to turn on the AE/AF lock when taking the faraway scene, so the exposure was slightly different across those pictures.

Result Pictures

s1_1
scene 1, pic 1
s1_2
scene 1, pic 2
s1_3
scene 1, pic 3
s2_1
scene 2, pic 1
s2_2
scene 2, pic 2
s2_3
scene 2, pic 3
s3_1
scene 3, pic 1
s3_2
scene 3, pic 2
s3_3
scene 3, pic 3

A-2 Recover Homographies

In this part, we manually selected corresponding keypoints between pictures, and then calculated homographies to project one picture onto the other using the formulas taught in class.

Algorithm Explanation

First, we used plt.ginput to select keypoints, similarly to Projects 2 and 3. I used 8 keypoints for each pair of pictures, and this count is stored in a variable that can be changed later.
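
For illustration, a minimal keypoint-picking snippet (the filename and variable names here are hypothetical, not the project code):

import matplotlib.pyplot as plt

NUM_POINTS = 8                              # number of correspondences per image pair
img1 = plt.imread("s1_1.jpg")               # hypothetical filename
plt.imshow(img1)
pts1 = plt.ginput(NUM_POINTS, timeout=0)    # click the 8 keypoints in order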

To warp an image from, say, the projection defined by keypoint array arr2 to the new projection defined by keypoint array arr1, we first calculate the homography from the destination shape arr1 to the starting shape arr2 (note: this direction is because we are using inverse warping).

Specifically, to calculate the homography, we take the formula we have from class:

hom_1
Source: Lecture slides

We can then break the matrix down, and stack the information from all n = 8 points together to form the new matrix equation:

hom_2
Reference: https://towardsdatascience.com/estimating-a-homography-matrix-522c70ec4b2c

The above formula can be rewritten as A*h = b, where h contains the first eight entries of the 3x3 homography H in row-major order, with the ninth entry H_33 fixed to 1. So we can construct A and b from the keypoints we selected in the previous step and use np.linalg.lstsq to solve for h, then reshape it into H.
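
As a sketch of this least-squares setup (the function name and array layout here are illustrative, not necessarily how the project code is organized):

import numpy as np

def compute_homography(src_pts, dst_pts):
    # Estimate the 3x3 homography H that maps src_pts to dst_pts,
    # with H[2, 2] fixed to 1 (so only 8 unknowns).
    # src_pts, dst_pts: (n, 2) arrays of (x, y) correspondences, n >= 4.
    A, b = [], []
    for (x, y), (xp, yp) in zip(src_pts, dst_pts):
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return np.append(h, 1).reshape(3, 3)

For the inverse-warping direction described above, this would be called with the destination keypoints arr1 as src_pts and the source keypoints arr2 as dst_pts.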

A-3 Warp the Images

In this part, we warp each image to its position in a larger canvas that will form our final mosaic.

Algorithm Explanation

To warp an image with inverse warping, we apply the homography computed in the previous part to the pixel coordinates of the destination image, and then sample the pixel value at the resulting location in the original source image. I used the cv2.remap function for this sampling step. For its syntax and usage I referenced its documentation: https://docs.opencv.org/3.0-beta/modules/imgproc/doc/geometric_transformations.html?highlight=remap#cv2.remap, as well as this thread: https://stackoverflow.com/questions/46520123/how-do-i-use-opencvs-remap-function. I did write all the code myself.

After warping the images to their new positions, we can see below that the positions are correct, but we would like to shift the images to the right a bit: the images become larger after the projective transform, so we also want a larger output image. As shown below, most of the leftmost image is actually cut off, and we don't want that.

warp1
scene1, pic1, after warping but no shifting
warp2
scene1, pic2, after warping but no shifting
warp3
scene1, pic3, after warping but no shifting

The solution is to add dstSize and shift parameters to the warpImage function. We make the result image dstSize large so that all of the stretched images fit, and we shift ALL of the images by the shift amount to place them inside the new window. To shift an image, we simply subtract the shift amount from all of the index_x and index_y elements.
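
A minimal sketch of such a warpImage function, assuming H maps destination coordinates to source coordinates as described above (parameter names and interpolation settings are my own choices):

import cv2
import numpy as np

def warpImage(img, H, dstSize, shift=(0, 0)):
    # Inverse-warp img onto a canvas of size dstSize = (height, width).
    # H maps destination coordinates to source coordinates; shift = (sx, sy)
    # translates the whole result right/down so nothing is cut off.
    h_out, w_out = dstSize
    sx, sy = shift
    # Grid of destination pixel coordinates, shifted back before applying H.
    xs, ys = np.meshgrid(np.arange(w_out), np.arange(h_out))
    dest = np.stack([xs - sx, ys - sy, np.ones_like(xs)]).reshape(3, -1)
    src = H @ dest
    src = src[:2] / src[2]                        # homogeneous -> pixel coordinates
    map_x = src[0].reshape(h_out, w_out).astype(np.float32)
    map_y = src[1].reshape(h_out, w_out).astype(np.float32)
    # cv2.remap samples the source image at the mapped (x, y) locations.
    return cv2.remap(img, map_x, map_y, interpolation=cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_CONSTANT)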

The results look like the following:

newWarp1
scene 1, pic 1, warped and shifted
warp2_1
scene 2, pic 1, warped and shifted
warp3_1
scene 3, pic 1, warped and shifted
newWarp2
scene 1, pic 2, warped and shifted
warp2_2
scene 2, pic 2, warped and shifted
warp3_2
scene 3, pic 2, warped and shifted
newWarp3
scene 1, pic 3, warped and shifted
warp2_3
scene 2, pic 3, warped and shifted
warp3_3
scene 3, pic 3, warped and shifted

A-4 Image Rectification

In this part, we try to "rectify" an image. Basically, we write a program such that, after choosing a plane in any image, we can use a homography and warping to create a new image with that plane frontal and centered, giving the effect of looking at that surface directly from the front.
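
A minimal sketch of this, under the assumption that the plane's four corners are hand-picked and mapped to an axis-aligned rectangle (the filename, corner values, and rectangle size are all illustrative; compute_homography and warpImage are the routines sketched above):

import numpy as np
import matplotlib.pyplot as plt

img = plt.imread("rec0.jpg")                                  # hypothetical filename
# Four hand-picked corners of the plane in the source image (e.g. via plt.ginput),
# ordered top-left, top-right, bottom-right, bottom-left.
corners = np.array([[120, 80], [520, 95], [540, 400], [100, 390]], dtype=float)
w, h = 400, 300                                               # size of the frontal rectangle
rect = np.array([[0, 0], [w, 0], [w, h], [0, h]], dtype=float)
# Homography from the destination rectangle to the source plane (inverse warping).
H = compute_homography(rect, corners)
rectified = warpImage(img, H, dstSize=(h, w))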

Original images, chosen planes, and rectification results:

rec0
swag from my company. I chose the book as the surface.
rec_out0
rectified using the book
rec1
a photo I took at MIT. I chose a side of a structure as the surface.
rec_out1
rectified using the structure
rec2
an indoor garden I visited in Boston. I chose the central tile as the surface.
rec_out2
rectified using the central tile
rec3
a picture I took in a Boston museum. I chose the sheet of music as the surface.
rec_out3
rectified using the sheet of music

A-5 Blend the images into a mosaic

In this part, I blended the warped images from the previous parts together to form panoramas.

Algorithm Explain

As pointed out in lecture, simply stacking the images on top of each other leads to an obvious edge in the panorama, which is not ideal. I used weighted averaging with alpha masks to achieve a smoother transition.

First, we make three kinds of alpha masks in the original image space. I used the np.linspace and np.tile functions to achieve a linear falloff from the middle of the mask to its side.
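
A minimal sketch of how such masks can be built with np.linspace and np.tile (the function name and the choice of fading over exactly half the width are illustrative):

import numpy as np

def make_masks(h, w):
    # Build the three alpha masks described below, with values in [0, 1].
    # Left mask: opaque on the left half, linear fade to 0 over the right half.
    # Right mask: mirror image of the left mask. Middle mask: fades on both sides.
    half = w // 2
    fade_out = np.linspace(1, 0, w - half)            # 1 -> 0 over the right half
    fade_in = np.linspace(0, 1, half)                 # 0 -> 1 over the left half
    left_row = np.concatenate([np.ones(half), fade_out])
    right_row = np.concatenate([fade_in, np.ones(w - half)])
    mid_row = np.concatenate([fade_in, fade_out])
    # np.tile repeats each 1-D profile down the rows to make an (h, w) mask.
    mask_left = np.tile(left_row, (h, 1))
    mask_right = np.tile(right_row, (h, 1))
    mask_mid = np.tile(mid_row, (h, 1))
    return mask_left, mask_mid, mask_right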

In this part of the project, all of the mosaics are stitched together horizontally and span less than 180 degrees, so we only need three alpha masks: one for the leftmost image, which only needs to fade out on the right side; one for the rightmost image, which only fades on the left; and one for the middle image, which fades on both sides. The masks look like:

mask_left
mask_left
mask_mid
mask_mid
mask_right
mask_right

I then warped the masks in the same way we warped the images in the previous part; the results are shown below. Note: we could also apply the masks first and then warp the masked images, but since I already had the warped images at this point, I went with warping the masks.

warp_mask_left
warp_mask_left
warp_mask_mid
warp_mask_mid
warp_mask_right
warp_mask_right

Last but definitely not least, we element-wise multiply each warped image by its corresponding warped alpha mask and add them all together to form the final image! A small sketch of this step and the results are shown below.
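
A minimal sketch of the blending step, assuming warped_imgs and warped_masks are matching lists of warped color images and warped alpha masks (the names are illustrative):

import numpy as np

# warped_imgs:  list of (H, W, 3) images, already warped and shifted
# warped_masks: list of (H, W) alpha masks, warped the same way
blended = np.zeros(warped_imgs[0].shape, dtype=float)
for img, mask in zip(warped_imgs, warped_masks):
    blended += img * mask[..., None]    # broadcast the mask over the color channels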

alpha_result_1
alpha_result_1
alpha_result_2
alpha_result_2
alpha_result_3
alpha_result_3

Note: In the first result, there are some visible inconsistencies. Since they do not appear in the second and third results, I think it is because I did not lock the exposure when shooting the first scene, causing the images to have inconsistent brightness. It could also be due to the blending method.

PartA: What I learned

In this part of the project, I learned how to calculate a homography and how to use it to warp images into a different projection. This is very exciting. I managed to produce decent-looking panoramas with a relatively naive approach: manual point selection, no calibration, and simple alpha blending. I hope to improve this in Part B by making the process automatic and robust against errors. I am also VERY interested in taking 360-degree panoramas, mapping them onto a cylinder or even a sphere, and exporting them to view in Virtual Reality. I think it would be very cool, and I hope I can finish that for Bells and Whistles.

PartB: FEATURE MATCHING for AUTOSTITCHING

In this second part of the project, we automated the homography calculation process using feature detection and the RANSAC algorithm.

PartB-1: Detecting corner features in an image

I used the provided skeleton code to calculate the Harris corners. I set min_distance to 5 to reduce the number of detected corners.
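
A rough equivalent of this step using skimage (this is not the provided skeleton code, just an illustration of the same idea; the filename is hypothetical):

import matplotlib.pyplot as plt
from skimage.color import rgb2gray
from skimage.feature import corner_harris, peak_local_max

gray = rgb2gray(plt.imread("s1_1.jpg"))            # hypothetical filename
harris_response = corner_harris(gray, sigma=1)     # Harris response map
# Keep local maxima of the response that are at least 5 pixels apart.
coords = peak_local_max(harris_response, min_distance=5)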

Result:

harris_corner
Harris corner detection overlaid on the image

After that, I used Adaptive Non-Maximal Suppression (ANMS) to keep the 150 points with the largest suppression radius, i.e. the points that are local maxima over the largest possible radius. I used the provided dist2 function to compute the distance between every pair of detected coordinates, and assigned each point a radius equal to the distance to its nearest neighbor that is stronger than it. We then simply keep the 150 points with the largest suppression radii. To make the algorithm more robust, I also used c_robust = 0.9, so that a neighbor only suppresses a point if it is significantly stronger.
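
A minimal sketch of this ANMS step (the explicit pairwise-distance computation stands in for the provided dist2 helper; names are illustrative):

import numpy as np

def anms(coords, strengths, n_keep=150, c_robust=0.9):
    # Adaptive Non-Maximal Suppression: for each corner, the suppression
    # radius is the distance to the nearest corner that is significantly
    # stronger (by the factor c_robust); keep the n_keep largest radii.
    # coords: (n, 2) corner coordinates, strengths: (n,) Harris responses.
    n = len(coords)
    # Pairwise squared distances (stands in for the provided dist2 helper).
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    radii = np.full(n, np.inf)
    for i in range(n):
        stronger = c_robust * strengths > strengths[i]
        if stronger.any():
            radii[i] = np.sqrt(d2[i, stronger].min())
    keep = np.argsort(-radii)[:n_keep]
    return coords[keep]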

Result of Adaptive Non-Maximal Suppression:

anms_points
corners after ANMS overlaid on the image

PartB-2: Extracting a Feature Descriptor for each feature point

Then, we extract a feature descriptor for each feature point that results from the ANMS algorithm. To do that, I cropped out the 40 by 40 area around every feature point, blurred each 40 by 40 patch with a Gaussian filter, and then subsampled the patch to get an 8 by 8 feature descriptor.
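
A minimal sketch of this descriptor extraction (the blur sigma and the helper name are my own choices, not necessarily the project's):

import numpy as np
from scipy.ndimage import gaussian_filter

def extract_descriptor(gray, y, x, sigma=2):
    # Crop a 40x40 window centered on (y, x), blur it, and subsample
    # every 5th pixel to obtain an 8x8 descriptor.
    patch = gray[y - 20:y + 20, x - 20:x + 20]
    blurred = gaussian_filter(patch, sigma=sigma)
    return blurred[::5, ::5]        # 40 / 5 = 8 samples along each axis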

Example of feature descriptor:

fourtyByFourty
A blurred 40 by 40 patch around a feature point
eightByEight
An 8 by 8 feature descriptor after subsampling the 40 by 40 patch

PartB-3: Matching these feature descriptors between two images

In section 3 of Part B, we want to match the feature descriptors from the previous sections between the two images. To do that, I first calculated the SSD (sum of squared differences) between every pair of feature descriptors from the two images. For each descriptor in image 1, I found the best and second-best matching descriptors in image 2. If err(best) / err(second) is larger than the threshold, the two candidates are too similar to distinguish reliably, so we reject the match, following Lowe's ratio test. I then did the same thing from image 2's side to get a list of image 2's best matches.

After that, following the symmetry principle, I kept only the pairs of feature descriptors that are each other's top choice. This keeps the number of incorrect matches to a minimum.

After studying the paper, I chose 0.6 for the threshold, as incorrect matches start to grow rapidly beyond this value.
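
A minimal sketch of this matching step, assuming desc1 and desc2 are arrays of flattened descriptors (the names and array layout are illustrative):

import numpy as np

def match_descriptors(desc1, desc2, ratio_thresh=0.6):
    # desc1: (n1, d) and desc2: (n2, d) flattened descriptors.
    # SSD between every pair, Lowe-style ratio test in both directions,
    # then keep only mutual best matches.
    ssd = ((desc1[:, None, :] - desc2[None, :, :]) ** 2).sum(axis=-1)

    def best_and_ok(d):
        order = np.argsort(d, axis=1)
        best, second = order[:, 0], order[:, 1]
        rows = np.arange(d.shape[0])
        ratio = d[rows, best] / d[rows, second]
        return best, ratio < ratio_thresh          # reject ambiguous matches

    best12, ok12 = best_and_ok(ssd)
    best21, ok21 = best_and_ok(ssd.T)
    return [(i, int(best12[i])) for i in range(len(desc1))
            if ok12[i] and ok21[best12[i]] and best21[best12[i]] == i]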

Result of Feature Matching:

The red dots are the feature points after running the ANMS algorithm, and the blue dots are the feature points kept after feature matching.

feature_matching1
img 1 in the feature descriptor matching
feature_matching2
img 2 in the feature descriptor matching

PartB-4: Use a robust method (RANSAC) to compute a homography

In this section of Part B, I used the RANSAC algorithm to choose the best homography computed from the points selected by the feature matching algorithm.

In each RANSAC round, I chose 4 point pairs at random and computed a homography from them. Then, for each matched point in the starting image, we compute the error (SSD) between its position projected through this candidate homography and the location of its corresponding point in the destination image. If this distance is smaller than a certain threshold, we classify the point as an "inlier" for this particular set of 4 points.

I ran the RANSAC round 1000 times, chose the largest inlier set across all rounds, and computed the final homography from that inlier set.
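
A minimal sketch of this RANSAC loop (the threshold value is illustrative; compute_homography is the least-squares routine sketched in Part A):

import numpy as np

def ransac_homography(pts1, pts2, n_iters=1000, thresh=2.0):
    # pts1, pts2: (n, 2) matched point coordinates in the two images.
    # Repeatedly fit a homography to 4 random correspondences, count inliers
    # by reprojection error, and refit on the largest inlier set found.
    ones = np.ones((len(pts1), 1))
    pts1_h = np.hstack([pts1, ones])                  # homogeneous coordinates
    best_inliers = np.array([], dtype=int)
    for _ in range(n_iters):
        idx = np.random.choice(len(pts1), 4, replace=False)
        H = compute_homography(pts1[idx], pts2[idx])
        proj = (H @ pts1_h.T).T
        proj = proj[:, :2] / proj[:, 2:3]
        err = ((proj - pts2) ** 2).sum(axis=1)        # SSD reprojection error
        inliers = np.where(err < thresh)[0]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return compute_homography(pts1[best_inliers], pts2[best_inliers])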

Result of RANSAC:

The red dots are the feature points after running the ANMS algorithm, the blue dots are the feature points kept after feature matching, and the yellow dots are the points that remain after running RANSAC on the blue dots.

ransac1
img 1 after running RANSAC
ransac2
img 2 after running RANSAC

PartB-5: Produce new mosaics

Using the automatic homography calculation from Part B and the mosaicing method from Part A, I produced new mosaics. They are shown below, alongside the old mosaics produced with manually selected points.

Result:

alpha_result_1
scene 1, manual
auto_scene1
scene 1, auto
alpha_result_2
scene 2, manual
auto_scene2
scene 2, auto
alpha_result_3
scene 3, manual
auto_scene3
scene 3, auto

PartB-6: What I learned in Part B

First of all, don't procrastinate TvT

I think automatic feature matching is really cool! In Part A, my least favorite part was marking out all the feature points. It was very tedious, and one single mistake could mess up the whole result. I really love how this new auto-stitching program can be applied to all the different images, and that I never need to manually mark points again.

This also makes me imagine other possibilities. In Project 3, I really hated marking out human facial features by hand. I wonder if we can use similar feature matching algorithms to extract and match human facial features. I think that would be a very efficient and interesting project.