please view on half screen for best visuals
short summary of what this was all about :)
In this project, we utilized a homography (perspective) transformation to rectify images
and produce warped versions for later image stitching. Points were manually selected, and
sources for images that are not my own are linked below. In the second half, we utilized automatic
feature detection and matching, with the same stitching/warping algorithm as in the first half.
Museum
Time Magazine: Museum
A common task is to align images to a standard frame. In this part of the project, we selected images portraying objects with rectangular frames and transformed them into front-view rectangles through a process called rectification. We then extended the relationship between perspective-transformed images into the realm of image stitching to create perspective-corrected panoramas.
Here are some outcomes of rectification!
Here we see a kiwibot. We chose the four corners of the kiwibot label and made them correspond to the four corners of the image. Now we can see a rectified version, though grainy.
We then looked at art pieces in a museum, and rectification was able to give a distorted painting the right proportions.
I used to like perspective art a lot, guess it's been put to good use.
This result was pretty interesting because it was able to change the view of a building. Some parts of the image are missing due to output image size constraints. The next part (stitching and auto-stitching) introduces methods to account for empty locations and entire-image transformations.
More self art promo :').
Here we extend the applications of rectification and use image stitching to warp between images. Using a tripod and rotating the camera, we took 3 photos from the same place. Between 10 and 16 correspondence points proved satisfactory for producing results. First we warped between the left and middle images, then between the result of the left-middle warp and the right image. After adjusting dimensions, we warped each pair of images together, then alpha-blended them using a Gaussian mask.
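The Gaussian mask + alpha blend step might look like the following minimal sketch (the mask construction and the sigma value are illustrative assumptions, not the exact values used in the project):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def alpha_blend(im1, im2, mask, sigma=5):
    """Blend im1 over im2, feathering the seam with a Gaussian.
    mask is a binary array marking where im1 should dominate."""
    # Blur the hard mask into a soft alpha map so the seam fades gradually
    alpha = gaussian_filter(mask.astype(float), sigma)
    if im1.ndim == 3:  # broadcast the alpha map over color channels
        alpha = alpha[..., None]
    return alpha * im1 + (1 - alpha) * im2
```

A larger sigma widens the transition band, trading visible seams for more ghosting where the images are slightly misaligned.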
Below is the result of warping the middle image into the perspective of the left image. Here, we see that most things line up pretty well. The bounding box prediction is quite useful in determining how big the output image will be. The seam between the two is almost unnoticeable; you can begin to tell where it is by looking at the grass. Some doubling effects occur due to minuscule misalignment.
Now the final reveal.
Another one!
As you can see, there are some ghosting effects, especially in the middle of the image. Some places are also shifted higher than others. This may be due to a need for more correspondence points, or unintended camera shifts while taking the images.
And finally, because I couldn't get better pictures myself: images courtesy of Shivam Parikh, from course-provided pictures.
Here, manually selected points performed pretty well. The doubling effects near the left-middle side are not immediately visible, but on close inspection there are doubles of the rods and buildings. Not great, but from a distance it seems natural.
It took me quite a while to debug my homography calculation, but through it, I recognized the importance of having w and having consistent ordering of correspondence points. Overall, this was pretty fun and I was glad I could go outside to try to take good photos (though not successful).
After the gruelling first half of hand-labeling points, we can now sit back and let our computer
do the work. In this section, we designed an automatic feature detector and stitched together
the images as we did in Part 1. We split the work into 5 main portions:
2.1. Corner detection with Harris Interest Point detector
2.2. Identifying the most dominant, evenly spread points using Adaptive Non-Maximal Suppression (ANMS)
2.3. Creating feature descriptors and matching them
2.4. Using Random Sample Consensus (RANSAC) to reduce outliers
2.5. Mosaicing and stitching the images together
We see the progression of these steps with our first set of photos from the manual section.
To detect corners, we find locations in the image where significant changes in image content are
present in all directions. To measure this change, we use the outer product of gradients and
define the corner strength, or corner response, as the determinant of the resulting matrix divided by
its trace. Interest points are points where the corner strength is a local maximum in a 3x3 pixel region.
Here we see the large number of interest points that the pre-written Harris corner function provides.
Setting the min_distance to 50, we make the viewing field a lot less cluttered, though still very packed.
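A rough sketch of the corner response described above (the Gaussian window sigma is an assumed parameter, and this stands in for the pre-written Harris function actually used in the project):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def harris_response(im, sigma=1.0):
    """Corner strength f = det(M) / trace(M) of the structure tensor M."""
    Iy, Ix = np.gradient(im.astype(float))   # image gradients (rows, cols)
    # Gaussian-weighted outer products of the gradients form M
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det / (trace + 1e-10)  # epsilon guards flat regions

def interest_points(R):
    # Keep points where the response is a local maximum in a 3x3 region
    return np.argwhere((R == maximum_filter(R, size=3)) & (R > 1e-6))
```

The det/trace ratio (the harmonic mean of the eigenvalues) is large only when the gradient varies strongly in both directions, i.e. at corners rather than edges.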
From the two pictures above, it is clear that we have to be more selective with the points in order to create
meaningful results. To do so, we find the top 500 most dominant points. We define dominance with the help
of a point's suppression radius: the minimum distance from the point to another point with significantly greater
corner strength. We follow the equation below from the paper
Multi-Image Matching using Multi-Scale Oriented Patches by Brown, Szeliski, and Winder:

r_i = min_j |x_i - x_j|, subject to f(x_i) < c_robust * f(x_j), for x_j in I

Here r_i represents the minimum suppression radius, the function f represents the corner strength
as defined in 2.1, and c_robust is a constant, 0.9, that ensures that the stronger point is stronger
by a sufficient margin. I represents the set of all points that we selected in 2.1.
We implement this with the following general approach: for each interest point p, calculate the minimum distance to a significantly stronger point, and save this minimum distance along with p. Then, after mapping each minimum distance/radius to its respective point, we sort in descending order and keep the top N points. For this project, I set N=500. The results are shown below. Much more pleasing to the eye than the jumble we saw before.
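A minimal vectorized sketch of this computation (the function name and array layout are my own choices; it assumes coords is an (N, 2) array of point locations and strengths the matching (N,) corner strengths):

```python
import numpy as np

def anms(coords, strengths, n_keep=500, c_robust=0.9):
    """Adaptive Non-Maximal Suppression: keep the n_keep points
    with the largest suppression radii."""
    # Pairwise squared distances between all interest points
    d2 = np.sum((coords[:, None, :] - coords[None, :, :]) ** 2, axis=-1)
    # Point j can suppress point i only if f(x_i) < c_robust * f(x_j)
    suppresses = strengths[:, None] < c_robust * strengths[None, :]
    d2 = np.where(suppresses, d2, np.inf)
    radii = np.sqrt(d2.min(axis=1))   # r_i, the minimum suppression radius
    order = np.argsort(radii)[::-1]   # sort radii in descending order
    return coords[order[:n_keep]]
```

The strongest point is never suppressed, so its radius is infinite and it always survives; the N x N distance matrix keeps the whole computation loop-free.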
Even though we have made significant improvements, it's clear that not all of the points will have
a corresponding point in the other image. Here we utilize neighboring information from pixels near each point
of interest to give us context for matches.
We first centered a 40x40 patch around every point, then bias/gain normalized each patch by subtracting
its mean and dividing by its standard deviation. To reduce disturbances from higher frequencies, we then
downsampled the patches to 8x8.
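This step might be sketched as follows (names are assumptions; plain strided subsampling is shown, whereas a Gaussian blur before subsampling would suppress high frequencies more faithfully):

```python
import numpy as np

def extract_descriptors(im, points, patch=40, out=8):
    """8x8 bias/gain-normalized descriptors from 40x40 patches."""
    half, step = patch // 2, patch // out
    descs = []
    for r, c in points:
        p = im[r - half:r + half, c - half:c + half]
        if p.shape != (patch, patch):
            continue  # skip points too close to the image border
        p = p[::step, ::step]                    # subsample 40x40 -> 8x8
        p = (p - p.mean()) / (p.std() + 1e-10)   # bias/gain normalize
        descs.append(p.ravel())
    return np.array(descs)
```

Normalizing for bias and gain makes the descriptors invariant to affine brightness changes between the photos.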
We utilized the Sum of Squared Differences (SSD) to calculate the magnitude of the difference between a
patch from one image and a possible match from another image. Implementing Lowe thresholding, we
test whether the ratio of the best (minimum) error to the second-best error is below a predefined cutoff.
I specifically found thresholds of around 0.6-1 satisfactory.
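The SSD + ratio-test matching might look like this sketch (the function name and broadcasting layout are assumptions; d1 and d2 are the per-image descriptor arrays):

```python
import numpy as np

def match_features(d1, d2, ratio=0.6):
    """SSD matching with Lowe's ratio test between descriptor sets."""
    # Pairwise SSD between every descriptor in d1 and every one in d2
    ssd = np.sum((d1[:, None, :] - d2[None, :, :]) ** 2, axis=-1)
    matches = []
    for i in range(len(d1)):
        order = np.argsort(ssd[i])
        best, second = ssd[i, order[0]], ssd[i, order[1]]
        # Accept only if the best match is much better than the runner-up
        if best < ratio * second:
            matches.append((i, order[0]))
    return matches
```

The ratio test exploits the fact that a correct match should be clearly better than its nearest rival, while a descriptor with no true correspondence tends to have several equally mediocre candidates.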
Matching results are shown below. Just by looking at the images, we can see that the majority of these correspondences are
indeed correct. A couple of outliers exist. For instance, in the left image, there are points on the road and on the tree
that have no correspondence in the right image.
As we have noticed, some points are matched incorrectly. Since we use least squares to compute our
homography, as in Part 1, the inclusion of outliers can cause large changes in the transformation.
To make sure that we only pick informative points, we implement RANSAC, which takes random samples
of points, tests the resulting estimate against expected values, and keeps the points that are estimated correctly
within a threshold, effectively maximizing the number of inliers. This process is repeated until the probability of
getting correct points falls below a threshold, or for as many iterations as the user deems sufficient. Guidelines on picking
the number of iterations can be found HERE.
This can be summed up with the following procedure. For each iteration: sample 4 random point pairs (4 from each image's point set), calculate the homography from them, test the homography's estimates against all the points, keep the points whose error falls below a threshold, and update the global inlier set only when this iteration's inlier set has more elements. After the loop, calculate the homography from the largest inlier set. This significantly reduces the number of points. I set the iterations to 500 and initially used an error bound of 15, but found 0.5 was sufficient for most images.
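A sketch of this loop under the same least-squares homography parameterization as Part 1 (function names, the fixed random seed, and the h33 = 1 convention are illustrative choices, not the project's exact code):

```python
import numpy as np

def compute_homography(src, dst):
    # Least-squares solve for H (h33 fixed to 1) mapping src -> dst
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)[0]
    return np.append(h, 1).reshape(3, 3)

def ransac(src, dst, iters=500, eps=15):
    """Keep the homography with the largest inlier set."""
    best_inliers = np.array([], dtype=int)
    rng = np.random.default_rng(0)
    for _ in range(iters):
        idx = rng.choice(len(src), 4, replace=False)   # minimal sample
        H = compute_homography(src[idx], dst[idx])
        # Project all src points through H, measure reprojection error
        pts = np.column_stack([src, np.ones(len(src))]) @ H.T
        proj = pts[:, :2] / pts[:, 2:3]
        err = np.linalg.norm(proj - dst, axis=1)
        inliers = np.flatnonzero(err < eps)
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit the homography on the largest inlier set
    return compute_homography(src[best_inliers], dst[best_inliers]), best_inliers
```

Because each sample uses only 4 correspondences, even a modest inlier fraction makes an all-inlier draw very likely within a few hundred iterations.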
Now that we have reduced ~2000 points to fewer than 50, we are ready to create the stitched versions of these images. The procedure is the same as the panorama procedure from Part 1. Here are side-by-side comparisons of the automatically stitched and manually stitched images. The LEFT image is MANUAL and the RIGHT is AUTO.
Though both manual and auto performed well, we can see some significant discrepancies between the two when we look closer. The left half of the image is similar for both, with doubling effects visible on the left grass patch. However, manual appears more pleasing on the right half. This is because the blurring happens along the sky portion, where colors and branches are lighter and less distinguishable.
Auto performed better on this round. We see that the car is much clearer than in the manual version. Still, both struggled with aligning the trees and the house at the very center. This could be due to camera shifts.
Once again, auto takes the cake. Though not very visible, manual faces heavy shifting issues on the right side of the image. Auto, on the other hand, is practically pristine, with minimal defects near the roof.
And yes, we have finally made it. This project was by far my favorite. It taught me that simple approaches may sometimes be the most effective. Though ANMS and RANSAC rest on simple logic, they undoubtedly work very well and required minimal fancy imports or head-breaking algorithms to implement. I also changed the way I implement things: at the beginning of the semester, Prof. Efros told us to avoid for loops, and this project ingrained that mentality into the way I now approach problems, as I saw just how much quicker vectorized functions can be.