I shot all of these images with an iPhone 8 from the same viewpoint, with exposure and focus locked. To keep the images consistent with each other, I used a tripod so that the camera only moved about a single axis in 3D space. After shooting, to speed up the later processing, I resized the photos to 1/4 of their original size.
Here are all the photos I shot, shown in pairs.
In order to get the proper transformation between two images, we need to recover the homography matrix that relates them. To compute this matrix, we define a set of corresponding points in each image and solve based on the following formula.
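For reference, the standard planar homography relation maps a point (x, y) in one image to (x', y') in the other, up to an unknown scale w; since the bottom-right entry is fixed to 1, the matrix has 8 degrees of freedom:

```latex
\begin{bmatrix} w x' \\ w y' \\ w \end{bmatrix}
=
\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
```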
To be more specific, I construct the following matrix and solve a least-squares problem to recover the 8 unknowns. Note that to be able to solve this system, we need to define at least four point correspondences. For the Image Rectification task, I compute the matrix using four points; for the Mosaic task, I compute it using seven points.
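As a minimal sketch of that least-squares setup (computeH is my naming here, and NumPy's lstsq stands in for whatever solver the original code uses), each correspondence contributes two rows to an A h = b system:

```python
import numpy as np

def computeH(src, dst):
    """Estimate a 3x3 homography from n >= 4 point correspondences.

    src, dst: (n, 2) arrays of (x, y) points, where dst ~ H @ src.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        # Two rows per correspondence, from x' = (ax+by+c)/(gx+hy+1), etc.
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        b.append(xp)
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.append(yp)
    h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return np.append(h, 1).reshape(3, 3)  # fix the last entry to 1
```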
After defining the points and computing the homography, I use inverse warping to compute the warped images. Note that if we simply set the warped image to be the same size as the original, a large amount of the image can fall outside the frame and be lost. To avoid that, I compute the range of all possible warped coordinates and shift them so that the smallest coordinate lands at 0.
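A sketch of that inverse-warping step, using nearest-neighbor sampling for brevity (warp_image and the returned offset are illustrative, not necessarily the original interface):

```python
import numpy as np

def warp_image(im, H):
    """Inverse-warp im by homography H, on a canvas sized to fit the result."""
    h, w = im.shape[:2]
    # Forward-map the four corners to find the output bounding box.
    corners = np.array([[0, 0, 1], [w, 0, 1], [0, h, 1], [w, h, 1]]).T
    warped = H @ corners
    warped = warped[:2] / warped[2]
    xmin, ymin = np.floor(warped.min(axis=1)).astype(int)
    xmax, ymax = np.ceil(warped.max(axis=1)).astype(int)

    # Shift so the smallest coordinate lands at 0.
    out_w, out_h = xmax - xmin, ymax - ymin
    xs, ys = np.meshgrid(np.arange(out_w) + xmin, np.arange(out_h) + ymin)
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])

    # Inverse-map every output pixel back into the source image.
    src = np.linalg.inv(H) @ pts
    src = (src[:2] / src[2]).round().astype(int)
    valid = (0 <= src[0]) & (src[0] < w) & (0 <= src[1]) & (src[1] < h)

    out = np.zeros((out_h, out_w) + im.shape[2:], dtype=im.dtype)
    out.reshape(-1, *im.shape[2:])[valid] = im[src[1, valid], src[0, valid]]
    return out, (xmin, ymin)  # offset for placing the image on a canvas later
```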
Below are showcases of Image Rectification, to verify that the warping and homography functions work. Since there is only one image here, I select corresponding points on the same image twice, with the second set estimating where those points would lie in a fronto-parallel view.
After getting one warped image and one original image, we can blend them together. To do so, we need a larger canvas that fits both images at their correct relative locations. To get that, I select one point from each side and compute its corresponding location given the shapes of both images. After that, we can generate two larger images that are ready for blending. For the actual blending algorithm, I reuse code from part 2 that utilizes Laplacian stacks and Gaussian stacks.
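A simplified sketch of the stack-based blending, assuming float images already placed on the shared canvas and a mask of the same shape that is 1.0 where the first image should dominate (the level count and sigmas here are illustrative, not necessarily part 2's exact settings):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_stack(im, levels=4, sigma=2.0):
    # Blur spatial axes only; the channel axis (if any) gets sigma 0.
    sig = lambda s: (s, s) + (0,) * (im.ndim - 2)
    return [im] + [gaussian_filter(im, sig(sigma * 2 ** i)) for i in range(levels)]

def laplacian_stack(im, levels=4, sigma=2.0):
    g = gaussian_stack(im, levels, sigma)
    # Band-pass layers plus the low-pass residual; they sum back to im.
    return [g[i] - g[i + 1] for i in range(levels)] + [g[-1]]

def multiband_blend(im1, im2, mask, levels=4, sigma=2.0):
    """Blend each frequency band with a progressively blurred mask."""
    l1 = laplacian_stack(im1, levels, sigma)
    l2 = laplacian_stack(im2, levels, sigma)
    gm = gaussian_stack(mask, levels, sigma)  # same length as l1 and l2
    return sum(m * a + (1 - m) * b for a, b, m in zip(l1, l2, gm))
```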
Below are showcases where I use only two images to compute the mosaic.
In a more general case, we might want to use more images to build a larger mosaic. My implementation computes the blending pair by pair, gradually growing the final mosaic, as sketched below.
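As a loop, assuming a stitch_pair helper that warps, aligns, and blends two images as described above (the name is mine, for illustration):

```python
def build_mosaic(images):
    """Fold a list of overlapping images into one mosaic, pair by pair."""
    mosaic = images[0]
    for im in images[1:]:
        mosaic = stitch_pair(mosaic, im)  # the canvas grows each iteration
    return mosaic
```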
What I learned the most is that some ideas might sound easy and intuitive, but the details can be very hard to get right. To make things work the way I want, I really need to pay close attention to every detail of the implementation and never underestimate the difficulty of any engineering task.
To enable automatic stitching, the first thing that needs to work is automatically selecting good features from the pair of images. Good features should be prominent, easily recognized, and distinguishable from each other; corners satisfy all of these, so we select corners. To implement that, I used the Harris Interest Point Detector provided by the course staff. One thing to pay attention to is that, in order to use the Harris detector, I converted the original RGB images to grayscale.
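A sketch of an equivalent detection pipeline using scikit-image (not necessarily what the staff code does internally; parameters are illustrative), with the grayscale conversion up front:

```python
from skimage.color import rgb2gray
from skimage.feature import corner_harris, peak_local_max

def get_harris_corners(im_rgb, min_distance=10):
    """Return the Harris response map and corner coordinates for an RGB image."""
    gray = rgb2gray(im_rgb)                  # the detector expects one channel
    h = corner_harris(gray)                  # per-pixel corner strength
    coords = peak_local_max(h, min_distance=min_distance)  # (row, col) peaks
    return h, coords
```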
Below is a set of examples with Harris corners overlaid on the images.
One problem with the automatically selected corners is that there are far too many of them to feed directly into the homography computation. Thus, I implement Adaptive Non-Maximal Suppression (ANMS), which is captured by the following formula:
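The suppression radius defined in the MOPS paper is

```latex
r_i = \min_j \, \lVert \mathbf{x}_i - \mathbf{x}_j \rVert
\quad \text{subject to} \quad
f(\mathbf{x}_i) < c_{\text{robust}} \, f(\mathbf{x}_j)
```

where the x's are corner locations, f is the Harris strength, and c_robust = 0.9 in the paper; we then keep the corners with the largest radii r_i.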
This ensures that the corners we keep are the ones with the strongest scores among their nearby corners, and also makes the kept corners more evenly distributed. Using the parameters from the paper "Multi-Image Matching using Multi-Scale Oriented Patches" by Brown et al., I select 500 corners from the output of the Harris corner detector, as sketched below.
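A sketch of ANMS under those definitions (the function name and the O(n²) pairwise-distance approach are my own choices, fine for a few thousand corners):

```python
import numpy as np

def anms(coords, strengths, n_keep=500, c_robust=0.9):
    """Adaptive non-maximal suppression: keep n_keep well-spread corners.

    coords: (n, 2) corner locations; strengths: (n,) Harris responses.
    """
    dists = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    # Corner j "dominates" corner i if f(x_i) < c_robust * f(x_j).
    dominated = strengths[:, None] < c_robust * strengths[None, :]
    dists[~dominated] = np.inf        # only dominating corners limit r_i
    radii = dists.min(axis=1)         # suppression radius per corner
    return coords[np.argsort(radii)[::-1][:n_keep]]
```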
Even with a smaller set of feature points, another problem remains: how do we match them between the two images? The answer is that we can use features extracted from patches centered at the corner points, and match points with similar features. To achieve that, the first step is to extract the feature patches. Two important details in my implementation are: (1) take a 40 by 40 patch and downsample it into an 8 by 8 patch; (2) normalize the 8 by 8 patch. A sketch of this extraction follows.
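This is a minimal version assuming grayscale input and corners far enough from the image border (boundary handling omitted):

```python
import numpy as np
from skimage.transform import resize

def extract_descriptors(gray, coords):
    """Extract a normalized 8x8 descriptor from a 40x40 window per corner."""
    descs = []
    for r, c in coords:
        patch = gray[r - 20:r + 20, c - 20:c + 20]          # 40x40 window
        patch = resize(patch, (8, 8), anti_aliasing=True)   # downsample
        patch = (patch - patch.mean()) / patch.std()        # bias/gain normalize
        descs.append(patch.ravel())
    return np.array(descs)                                  # (n, 64)
```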
Below are some visualizations of the 8 by 8 feature patches.
With the feature patches extracted, we then match patches to determine corresponding pairs of points between the two images. The way to do that is to calculate the distance between each possible pair of patches and then run a nearest-neighbor analysis. The metric for deciding whether to keep a match is the distance to the first nearest neighbor divided by the distance to the second nearest neighbor: if the first is far better than the second, we keep the match.
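A sketch of that nearest-neighbor ratio test, on descriptor rows as produced by the extraction step above (the helper name is mine):

```python
import numpy as np

def match_descriptors(d1, d2, threshold=0.2):
    """Match rows of d1 to rows of d2 with the 1-NN/2-NN ratio test."""
    # Pairwise distances between all descriptors, shape (n1, n2).
    dists = np.linalg.norm(d1[:, None] - d2[None, :], axis=-1)
    matches = []
    for i, row in enumerate(dists):
        nn1, nn2 = np.argsort(row)[:2]   # indices of the two nearest neighbors
        if row[nn1] / row[nn2] < threshold:
            matches.append((i, nn1))     # confident, unambiguous match
    return matches
```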
Based on the figure above, I choose a threshold value of 0.2. Below, I show the feature points overlaid on the original images after the matching process.
Even after we have far fewer points than before, the one-to-one correspondence between the two sets still contains outliers. To eliminate these outliers, I implement Random Sample Consensus (RANSAC), following the pseudo-code sketched below:
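This is a minimal rendering of the standard 4-point RANSAC loop, reusing computeH from earlier (the iteration count and pixel tolerance are illustrative):

```python
import numpy as np

def ransac_homography(pts1, pts2, n_iters=1000, eps=2.0):
    """Find the homography with the largest inlier set among matched points.

    pts1, pts2: (n, 2) arrays of matched (x, y) points.
    """
    best_inliers = np.zeros(len(pts1), dtype=bool)
    for _ in range(n_iters):
        idx = np.random.choice(len(pts1), 4, replace=False)  # minimal sample
        H = computeH(pts1[idx], pts2[idx])
        # Project all of pts1 and measure reprojection error against pts2.
        p = H @ np.column_stack([pts1, np.ones(len(pts1))]).T
        proj = (p[:2] / p[2]).T
        inliers = np.linalg.norm(proj - pts2, axis=1) < eps
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on the full consensus set of the best sample.
    return computeH(pts1[best_inliers], pts2[best_inliers]), best_inliers
```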
This keeps exactly the points that are consistent with each other and share the same homography matrix. Below I show results with the post-RANSAC points overlaid on the original images.
After all of the points for computing the homography are selected automatically, we can then begin the image stitching process. Below, I'll compare the mosaics built from manually selected points and from automatically selected points.
As you can see, for comparison sets #2 and #3, the automatic mosaic produces better results, especially near the boundary between the two images. Generally, the automatic process uses pixel-level features that are more precise than manually selected points, and it finds more points for computing the homography, which leads to more robust transitions between images. However, the automatic mosaic is not always better than manual selection: in comparison set #1, the manual mosaic is better.
The coolest thing I learned is that I am not that far from research-based projects. As an undergraduate, I hadn't had many chances to implement code based on a research paper. This second part gave me a sense that research is not as scary as I once imagined, and it prompts me to follow that path and look deeper into the field.