Image Warping and Mosaicing
Roma Desai | CS 194 Project 5
PART A
OVERVIEW
For this project, I shot several individual photographs and warped them together using homographies to create an image panorama. This technique combines separate photos into a single image with a much larger field of view.
PART 1: SHOOT THE PICTURES
The first step was to shoot some pictures. To ensure the transformation between photographs was purely projective, I shot from a single center of projection, rotating the camera to capture different angles. I also made sure all photographs were taken with the same aperture and exposure settings. Finally, I had the images overlap by about 50% so I could later identify common key points between them. Here are a few I took around my house:
[Photos taken around my house]
PART 2: RECOVER HOMOGRAPHIES + WARP IMAGES + IMAGE RECTIFICATION
Next, I wrote a function to calculate the homography from the first image to the second. I selected corresponding points and solved for H in the equation p′ = Hp, where p and p′ are matching points in homogeneous coordinates. Since the scale of H is arbitrary, it has only eight unknowns, so four point pairs suffice in principle; to get a better result, I used more points and solved for the entries of H as an overdetermined least squares problem.
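Here is a minimal sketch of that least-squares setup (the function name and the (N, 2) point-array convention are my own assumptions, not the project's starter code). Fixing the bottom-right entry of H to 1 leaves eight unknowns, and each correspondence contributes two equations:

```python
import numpy as np

def compute_homography(pts1, pts2):
    """Least-squares homography H with p' = Hp in homogeneous
    coordinates, fixing H[2,2] = 1.

    pts1, pts2: (N, 2) arrays of corresponding (x, y) points, N >= 4.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(pts1, pts2):
        # x' = (h1*x + h2*y + h3) / (h7*x + h8*y + 1), denominator cleared
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    h, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)
```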
Next, I wrote a warp function that takes the first image and applies the homography to align it with the second image's perspective. To test the two functions, I took a couple of side-view photographs and rectified them to a top-down view. Here are some of the results.
[Figure: original photographs (left) and their rectified, top-down versions (right)]
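For reference, the warp is easiest to implement as an inverse warp: iterate over the output canvas, map each pixel back through H⁻¹, and sample the source image, which avoids the holes that forward mapping leaves. A sketch for a single-channel image (the output-size convention and bilinear sampling are my choices, not necessarily what the project specifies):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_image(img, H, out_shape):
    """Inverse-warp a single-channel image by homography H onto an out_shape canvas."""
    Hinv = np.linalg.inv(H)
    ys, xs = np.indices(out_shape)                      # every output pixel
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    sx, sy, w = Hinv @ coords                           # map back into the source
    sx, sy = sx / w, sy / w                             # homogeneous -> Cartesian
    # map_coordinates samples in (row, col) order; bilinear, zeros outside
    out = map_coordinates(img, [sy, sx], order=1, cval=0.0)
    return out.reshape(out_shape)
```

Color images can be handled by warping each channel separately.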
PART 3: BLEND THE IMAGES
Finally, I combined the images by warping the first image into the geometry of the second and then merging the two with a Laplacian pyramid. I chose Laplacian pyramid blending because it produced far fewer edge artifacts than simply adding the images together. Since I was using my phone camera and the lighting did not stay constant the entire time, I believe some of the remaining edge artifacts are a result of that. I also found it difficult to specify corresponding points on irregularly shaped objects such as trees and flowers; my first image, with its squares and straight lines, turned out better. Shown below are the warped images as well as the final combined image.
[Figure: warped images (left) and final blended panoramas (right)]
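As a sketch of the blending step, a Laplacian stack (the same idea as a pyramid, without the downsampling bookkeeping) can be written as follows; the level count and blur sigma here are illustrative, not the values I tuned:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def laplacian_blend(im1, im2, mask, levels=4, sigma=2.0):
    """Blend two single-channel float images with a soft mask, band by band."""
    out = np.zeros_like(im1)
    g1, g2, gm = im1, im2, mask.astype(float)
    for _ in range(levels):
        b1, b2 = gaussian_filter(g1, sigma), gaussian_filter(g2, sigma)
        # each frequency band is mixed under a progressively softer mask
        out += gm * (g1 - b1) + (1 - gm) * (g2 - b2)
        g1, g2, gm = b1, b2, gaussian_filter(gm, sigma)
    out += gm * g1 + (1 - gm) * g2      # low-pass residual, widest seam
    return out
```

High frequencies blend over a narrow seam and low frequencies over a wide one, which is exactly what hides the hard edge.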
REFLECTIONS:
Overall, this was a super cool project that led me to appreciate everyday tools we take for granted, such as the panorama creator in iPhones! I think the coolest part of the project was how you can completely change the viewpoint of an image with just a single transformation. The fact that you can go from a side view of something to a top-down view with no external information is mind-blowing! I really enjoyed this project and can't wait to explore these concepts further.
PART B
OVERVIEW
In Part A, I created panoramas by manually selecting feature points and warping the images together. For this part, I will implement the paper "Multi-Image Matching using Multi-Scale Oriented Patches" by Brown et al. to automatically match features and then warp the images together into a panorama.
Step 1: Harris Corner Detection
To begin, I used the provided starter code to detect corners in each image. Corners serve as good, repeatable indicators of features and allow for much better matching. The Harris corner detector computes a corner response at every pixel of the image and keeps the regions with high responses. To avoid an unmanageable number of detected corners, we take only local maxima; I used a minimum spacing of about 40 pixels. Here are the detected corners on two images.
[Figure: detected Harris corners overlaid on two images]
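The detection itself came from the starter code, but its core is close to this scikit-image sketch (the helper name and the RGB-input assumption are mine):

```python
from skimage.color import rgb2gray
from skimage.feature import corner_harris, peak_local_max

def get_harris_corners(img, min_distance=40):
    """Harris response map plus (row, col) local maxima spaced >= min_distance apart."""
    h = corner_harris(rgb2gray(img))   # corner response at every pixel
    coords = peak_local_max(h, min_distance=min_distance)
    return h, coords
```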
Step 2: Adaptive Non-Maximal Suppression
While the spacing constraint reduces the number of points, we still want to cut them down to a fixed count, and while doing so we want an even spread of points across the image. For each point, we compute a suppression radius: the distance to the nearest point whose corner response is "much higher" than its own. I used a robustness constant of 0.9 to define "much higher": a neighbor suppresses a point only if the point's response is below 0.9 times the neighbor's. Once every point has a radius, we keep the 500 points with the largest radii. In this way, we suppress the weaker points while still maintaining a good spread. Here are some results after applying adaptive non-maximal suppression to the detected corners.
[Figure: corners before (left) and after (right) adaptive non-maximal suppression]
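A sketch of that suppression step, following Brown et al. (this is the straightforward O(N²) version, not an optimized one):

```python
import numpy as np

def anms(coords, h, n_keep=500, c_robust=0.9):
    """Keep the n_keep corners with the largest suppression radii.

    A corner's radius is the distance to the nearest corner that dominates
    it, i.e. the nearest j with response_i < c_robust * response_j.
    """
    strengths = h[coords[:, 0], coords[:, 1]]
    radii = np.full(len(coords), np.inf)   # undominated corners always survive
    for i in range(len(coords)):
        dominating = strengths[i] < c_robust * strengths
        if dominating.any():
            # squared distance is monotonic in distance, so fine for ranking
            radii[i] = np.min(np.sum((coords[dominating] - coords[i]) ** 2, axis=1))
    keep = np.argsort(radii)[::-1][:n_keep]
    return coords[keep]
```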
Step 3: Extracting Feature Descriptors & Implementing Feature Matching
Once we have our points in the two images, we have to match them together using feature descriptors. I described each point by the 40 × 40 patch around it. To reduce error due to noise, brightness differences, and other factors, I Gaussian-blurred each patch and downsampled it to an 8 × 8 descriptor. Finally, I subtracted the mean and divided by the standard deviation of each patch, making the descriptors invariant to bias and gain.
With feature descriptors in hand, matching points is much easier. I calculated the SSD error between each patch in the to-be-warped image and each patch in the stationary image to find which points correspond. To make the matching more robust, I also calculated the 2-NN error in addition to the 1-NN error and took the ratio of the two. Similar in spirit to adaptive non-maximal suppression, we only want to count a pair of points as a match if that match is significantly better than any other candidate, so I counted a pair as corresponding only if 1-NN/2-NN < 0.3.
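Both steps might look like the sketch below (the function names, border check, and blur sigma are my own choices):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def describe(img, coords, patch=40, out=8):
    """Blurred 40x40 patches, downsampled to 8x8 and bias/gain-normalized."""
    half, step = patch // 2, patch // out
    blurred = gaussian_filter(img, sigma=2.0)
    descs, kept = [], []
    for r, c in coords:
        # skip corners whose patch would run off the image border
        if half <= r < img.shape[0] - half and half <= c < img.shape[1] - half:
            p = blurred[r - half:r + half:step, c - half:c + half:step]
            descs.append(((p - p.mean()) / (p.std() + 1e-8)).ravel())
            kept.append((r, c))
    return np.array(descs), np.array(kept)

def match(d1, d2, ratio=0.3):
    """Ratio test: accept i <-> j only if 1-NN SSD / 2-NN SSD < ratio."""
    pairs = []
    for i, d in enumerate(d1):
        ssd = np.sum((d2 - d) ** 2, axis=1)
        nn1, nn2 = np.partition(ssd, 1)[:2]    # the two smallest errors
        if nn1 / nn2 < ratio:
            pairs.append((i, int(np.argmin(ssd))))
    return pairs
```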
Here are the corresponding points for my
first image:
[Figure: matched feature points in the left and right images]
Step 4: RANSAC
Even with all this, the feature matching is not robust to outliers. This is a big issue because a least-squares homography fit can be thrown far off by even a single bad correspondence. I used random sample consensus (RANSAC) to find the best set of points. First, I randomly selected 4 pairs from my array of matched points and computed the homography they define exactly. I used it to warp the left image's points into the right image's frame, then calculated the Euclidean distance between each transformed point and its matched point in the right image. If the distance was under 2 pixels, I counted the pair as an inlier. I repeated this process for about 1000 iterations and at the end selected the homography that gave me the largest set of inlier points. Here is an example of some final calculated points:
[Figure: RANSAC inlier points in the left and right images]
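A sketch of that loop, reusing the compute_homography helper sketched back in Part A (the iteration count and pixel threshold match the values described above):

```python
import numpy as np

def ransac_homography(pts1, pts2, n_iters=1000, thresh=2.0):
    """Return the homography fit to the largest inlier set, plus its inlier mask."""
    best = np.zeros(len(pts1), dtype=bool)
    P1 = np.hstack([pts1, np.ones((len(pts1), 1))])   # homogeneous left points
    for _ in range(n_iters):
        idx = np.random.choice(len(pts1), 4, replace=False)
        H = compute_homography(pts1[idx], pts2[idx])  # exact fit to the 4-point sample
        proj = (H @ P1.T).T
        proj = proj[:, :2] / proj[:, 2:3]             # back to Cartesian
        inliers = np.linalg.norm(proj - pts2, axis=1) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    # final least-squares refit on all inliers of the winning model
    return compute_homography(pts1[best], pts2[best]), best
```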
RESULTS
Once I had the points, I used my work from Part A to stitch the images together: I took the point pairs output by RANSAC, recomputed the homography by least squares, warped the left image into the right image's frame, and finally used a Laplacian pyramid to blend the images. Here are the results from the manually selected feature points compared with the results from the automatically computed feature points:
[Figure: Part A panoramas (manual points) alongside Part B panoramas (automatic points)]
REFLECTIONS:
While the second part of the project was slightly more challenging than the first, it was much cooler and more rewarding to see all my work come together. I think it's amazing how something as simple as corners can be used to identify real objects such as tables, couches, carpet designs, and more, and how some clever math can generate such impressive results. When I first heard about this project, I assumed we would need some sort of ML model to find the features, but this method turned out to be even cooler. I was especially surprised that the program-generated feature matches produced panoramas of the same quality as, if not better than, the manually selected ones. Overall, I learned a lot and really enjoyed this project.