This two-part project is focused on transforming and combining multiple photographs to form a single, cohesive image.
In this part of the project, I performed image mosaicing: taking two or more photographs and combining them into a single mosaic by registering, projectively warping, resampling, and compositing them. Below, I walk through the steps I took to produce my final result for this part.
Below are the photograph sets I took. I aimed for 40-70% overlap between consecutive photos, using a tripod to keep the camera's center of projection fixed while rotating it to change the viewing direction between shots.
Set 1: Scene
Scene 0 | Scene 1 | Scene 2 |
---|---|---|
Set 2: Tree
Tree 0 | Tree 1 | Tree 2 |
---|---|---|
Set 3: Park
Park 0 | Park 1 | Park 2 | Park 3 |
---|---|---|---|
I warped my images into alignment with a projective transformation, or homography, expressed as the matrix multiplication p' = Hp. To find the parameters of the transformation, I took corresponding points p from the original image and p' from the target image and solved for H with least squares on Ah = b, where h is a length-8 vector of the unknown entries of H (the bottom-right entry is fixed to 1). A homography has eight degrees of freedom, and each point correspondence contributes two equations, so at least four reference points are needed from each image. As an example, below are two of my photographs from above, each with eight labeled correspondence points.
scene 0 | scene 1 |
---|---|
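Concretely, each correspondence (x, y) → (x', y') contributes two rows to the system Ah = b. Here is a minimal NumPy sketch of that least-squares setup (the function name and array conventions are my own illustration, not the project code):

```python
import numpy as np

def compute_homography(pts, pts_prime):
    """Solve p' = Hp for H (bottom-right entry fixed to 1) via least squares.

    pts, pts_prime: (N, 2) arrays of corresponding (x, y) points, N >= 4.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(pts, pts_prime):
        # Each correspondence contributes two rows to Ah = b.
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    h, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)
```

With exactly four correspondences the system determines H; with more, least squares averages out labeling error.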
Once I have the homography matrix H, I can project any image onto another by applying the transformation p' = Hp to the original image. To do this, I computed an inverse warp with bilinear interpolation: for each pixel in the output, I applied H⁻¹ to find the corresponding source location and interpolated among its four neighboring pixels. Projective warping is a very powerful tool; for example, I can use it to rectify images.
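The inverse-warping step can be sketched as follows, assuming NumPy; `inverse_warp` is an illustrative name, and out-of-bounds samples are simply set to zero:

```python
import numpy as np

def inverse_warp(img, H, out_shape):
    """Warp img by homography H using inverse mapping + bilinear interpolation.

    For every output pixel p', sample the source at p = H^{-1} p'.
    img: (h, w) float array; H maps source (x, y) to output (x, y).
    """
    H_inv = np.linalg.inv(H)
    out_h, out_w = out_shape
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = H_inv @ coords
    sx, sy = src[0] / src[2], src[1] / src[2]
    # Bilinear interpolation; samples outside the source become 0.
    valid = (sx >= 0) & (sx <= img.shape[1] - 1) & (sy >= 0) & (sy <= img.shape[0] - 1)
    x0 = np.clip(np.floor(sx).astype(int), 0, img.shape[1] - 2)
    y0 = np.clip(np.floor(sy).astype(int), 0, img.shape[0] - 2)
    dx, dy = sx - x0, sy - y0
    vals = (img[y0, x0] * (1 - dx) * (1 - dy) + img[y0, x0 + 1] * dx * (1 - dy)
            + img[y0 + 1, x0] * (1 - dx) * dy + img[y0 + 1, x0 + 1] * dx * dy)
    return np.where(valid, vals, 0.0).reshape(out_shape)
```

An identity homography reproduces the input, and a pure translation shifts it, which makes the function easy to sanity-check.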
To make sure that my homography transformation implementation is correct, I first took some pictures of planar surfaces and warped them to make those surface planes front-parallel. Below are the original images, side-by-side with their warped counterparts. I have also plotted the reference points I used for rectification.
Before Rectification | After Rectification | |
---|---|---|
Window | ||
Bridge Sign | ||
Map of London |
Now, I will take my three overlapping photograph sets and blend each into a single, continuous mosaic. First, I will project the first and third images one by one to match the features of the middle image. Then, I will combine them using weighted averaging.
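The weighted-averaging step can be sketched like this, assuming NumPy; `weighted_blend` is an illustrative name, and the weight maps would typically feather from 1 near each image's center to 0 at its edges:

```python
import numpy as np

def weighted_blend(images, masks):
    """Blend warped images with per-pixel weighted averaging.

    Each mask holds a weight map (e.g. distance-to-edge feathering);
    overlapping pixels are averaged in proportion to their weights.
    """
    acc = np.zeros_like(images[0], dtype=float)
    wsum = np.zeros_like(images[0], dtype=float)
    for img, w in zip(images, masks):
        acc += img * w
        wsum += w
    # Avoid dividing by zero where no image covers a pixel.
    return np.divide(acc, wsum, out=np.zeros_like(acc), where=wsum > 0)
```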
Here are the results of the warping:
Mosaic 1: Scene Mosaic
Scene 0 | Scene 2 | |
---|---|---|
Before Warp | ||
Scene 1 Labels | ||
Warped to Match Scene 1 |
Here are the pieces before combining:
Scene 0 | Scene 1 | Scene 2 |
---|---|---|
And here is the final, combined result:
Mosaic 2: Tree Mosaic
Tree 0 | Tree 2 | |
---|---|---|
Before Warp | ||
Tree 1 Labels | ||
Warped to Match Tree 1 |
Here are the pieces before combining:
Tree 0 | Tree 1 | Tree 2 |
---|---|---|
And here is the final, combined result:
Mosaic 3: Park Mosaic
Park 0 | Park 2 | Park 3 | |
---|---|---|---|
Before Warp | |||
Park 1 Labels | |||
Warped to Match Park 1 |
Here are the pieces before combining:
Park 0 | Park 1 | Park 2 | Park 3 |
---|---|---|---|
And here is the final, combined result:
This mosaic is not quite as well-aligned as the other ones. I suspect the cause is a combination of the wider total viewing angle (combining four photos instead of three) and slight variation in camera position from bumping the tripod.
One of the coolest things I learned from this project is how powerful projective warping is. Before this project (and the lecture covering the requisite material), I never suspected that a simple homography would be enough to completely transform the viewpoint angle of an image.
In this second part of the project, I created a system for detecting matching features and automatically stitching images into a mosaic. The approach I used is based on the Multi-Scale Oriented Patches (MOPS) paper by Brown, Szeliski, and Winder: https://inst.eecs.berkeley.edu/~cs194-26/fa20/hw/proj5/Papers/MOPS.pdf
First, I used a Harris Interest Point Detector to detect corners in my source images. Below are my images overlaid with automatically detected interest points.
Scene 0 | Scene 1 | Scene 2 | |
---|---|---|---|
Original | |||
Harris Points |
Tree 0 | Tree 1 | Tree 2 | |
---|---|---|---|
Original | |||
Harris Points |
Park 0 | Park 1 | Park 2 | Park 3 | |
---|---|---|---|---|
Original | ||||
Harris Points |
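The detector above can be sketched with NumPy and SciPy as follows; the corner constant k = 0.05 and the window sigma are typical textbook choices rather than necessarily the ones I used:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def harris_response(img, sigma=1.0):
    """Harris corner strength R = det(M) - k * trace(M)^2."""
    Iy, Ix = np.gradient(img.astype(float))
    # Entries of the second-moment matrix M, averaged over a Gaussian window.
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    return Sxx * Syy - Sxy ** 2 - 0.05 * (Sxx + Syy) ** 2

def harris_points(img, sigma=1.0, thresh_rel=0.01):
    """Return (row, col) coordinates of local maxima of the Harris response."""
    R = harris_response(img, sigma)
    peaks = (R == maximum_filter(R, size=3)) & (R > thresh_rel * R.max())
    return np.argwhere(peaks)
```

On a synthetic image of a bright square, the only strong responses appear at the square's four corners, while edges produce negative R and are rejected.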
Adaptive Non-Maximal Suppression (ANMS):
As you can see, the above Harris points are very dense. To reduce the feature set size to a manageable number while maintaining a good distribution across the area of the image, I suppressed all Harris corners that do not represent a maximum corner strength within a given radius. I fine-tuned this radius to yield the desired number of points (500). The results are visible below:
Scene 0 | Scene 1 | Scene 2 | |
---|---|---|---|
Original Harris Points | |||
After ANMS |
Tree 0 | Tree 1 | Tree 2 | |
---|---|---|---|
Original Harris Points | |||
After ANMS |
Park 0 | Park 1 | Park 2 | Park 3 | |
---|---|---|---|---|
Original Harris Points | ||||
After ANMS |
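The suppression step can be sketched as follows, assuming NumPy. This is the standard ANMS formulation from the MOPS paper, where each point's suppression radius is its distance to the nearest sufficiently stronger point and the points with the largest radii are kept; the robustness constant 0.9 is the paper's value, not necessarily mine:

```python
import numpy as np

def anms(points, strengths, n_keep=500, c_robust=0.9):
    """Adaptive non-maximal suppression.

    For each point, the suppression radius is the distance to the nearest
    point whose (robustified) strength dominates it. Keeping the n_keep
    points with the largest radii gives a spatially even spread.
    """
    pts = np.asarray(points, float)
    s = np.asarray(strengths, float)
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    dominated = s[None, :] * c_robust > s[:, None]   # point j suppresses point i
    d2 = np.where(dominated, d2, np.inf)
    radii = np.sqrt(d2.min(axis=1))   # strongest point gets radius = inf
    order = np.argsort(-radii)
    return order[:min(n_keep, len(order))]
```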
Next, I converted each corner feature to an 8x8 feature descriptor patch. To do this, I sampled a 40x40 pixel patch around each interest point, downsampled it to 8x8, and normalized it to achieve a mean of 0 and a standard deviation of 1. To avoid aliasing, I performed a Gaussian blur on each patch before downsampling. Below are a few examples of feature descriptors generated from my photos:
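The descriptor pipeline above can be sketched like this (NumPy/SciPy; the blur sigma and the exact sampling grid are my own choices for illustration):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def extract_descriptor(img, y, x, patch=40, out=8):
    """8x8 MOPS-style descriptor: blur a 40x40 window, sample every 5th
    pixel, then bias/gain-normalize to zero mean and unit variance."""
    half = patch // 2
    window = img[y - half:y + half, x - half:x + half].astype(float)
    # Low-pass filter before downsampling to avoid aliasing.
    window = gaussian_filter(window, sigma=patch / out / 2)
    step = patch // out
    desc = window[step // 2::step, step // 2::step][:out, :out]
    return (desc - desc.mean()) / (desc.std() + 1e-8)
```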
After calculating the features, I found unique matching pairs of feature descriptors between adjacent images using nearest-neighbor (NN) matching. To do this, I calculated the SSD distance between every pair of feature descriptors across the two images. Then, I used the "Russian Granny" trick (Lowe's ratio test) of rejecting any pair that does not meet the threshold (1-NN distance) / (2-NN distance) < 0.3. Here are the resulting pairs of images marked with corresponding points:
First Image in Pair | Second Image in Pair |
---|---|
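The ratio-test matching can be sketched as follows (assuming NumPy; `match_features` is an illustrative name, and the brute-force SSD matrix stands in for whatever NN search is used):

```python
import numpy as np

def match_features(desc_a, desc_b, ratio=0.3):
    """Match descriptors with Lowe's ratio test: accept a's best match in b
    only if 1-NN SSD < ratio * 2-NN SSD."""
    # Pairwise SSD between all descriptor pairs (flattened to vectors).
    a = desc_a.reshape(len(desc_a), -1)
    b = desc_b.reshape(len(desc_b), -1)
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    matches = []
    for i, row in enumerate(d2):
        nn = np.argsort(row)[:2]
        if row[nn[0]] < ratio * row[nn[1]]:
            matches.append((i, int(nn[0])))
    return matches
```

The low 0.3 threshold keeps only matches whose best candidate is far better than the runner-up, which is what makes the subsequent RANSAC step tractable.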
I used RANSAC (RANdom SAmple Consensus) to remove any outliers remaining after the feature-space rejection. To do this, I repeatedly selected random groups of four point pairs and calculated the resulting homography. Then, I kept the inlier set of the homography that had the most agreement among all the point pairs and generated a final homography from those inliers using least squares. Here are the final correspondence point pairs:
First Image in Pair | Second Image in Pair |
---|---|
With all the homographies in place, I now had all the pieces I needed to produce photograph mosaics. As before, I warped every photograph to match the geometry of the second photo in its respective set. Below, I show the autostitched images side-by-side with the hand-stitched panoramas from Part A.
Scene:
Manually Stitched | |
---|---|
Automatically Stitched |
Tree:
Manually Stitched | |
---|---|
Automatically Stitched |
Park:
Manually Stitched | |
---|---|
Automatically Stitched |
As an experiment, I decided to project the scene mosaic onto a cylindrical surface, inverse-sampling from my original, unwarped images into the mosaic. After some testing, I settled on a cylinder radius of 1600 pixels and a focal length of 1410 pixels.
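The per-pixel inverse mapping for the cylindrical projection can be sketched as follows (NumPy; `cyl_to_source` is an illustrative name, with all quantities in pixel units and (cx, cy) the image center):

```python
import numpy as np

def cyl_to_source(xc, yc, f, r, cx, cy):
    """Inverse cylindrical mapping: given a pixel on a cylinder of radius r,
    return the flat-image pixel to sample, for focal length f."""
    theta = (xc - cx) / r            # angle around the cylinder axis
    h = (yc - cy) / r                # height on the cylinder
    x = f * np.tan(theta) + cx       # re-project onto the flat image plane
    y = f * h / np.cos(theta) + cy
    return x, y
```

The image center maps to itself, and the mapping is symmetric about it, stretching increasingly toward the left and right edges.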
Cylindrical Warps of Each Source Image:
Scene 0 | Scene 1 | Scene 2 | |
---|---|---|---|
Original | |||
Cylindrical |
Cylindrical Mosaic:
Flat Mosaic | |
---|---|
Cylindrical Mosaic |
As a non-standard Bell/Whistle, I decided to use homography to add some decoration to a room that looked rather spartan. First, I took a photo of a room with blank walls. Then, I projected a painting to match the geometry of one of those walls. By combining the two images, I managed to "hang the painting" onto the wall. The painting I used is Sergei Ivanovich Lukin's It Has Come to Pass.
Painting | |
---|---|
Wall |
I labeled the points on the wall where I wanted to hang the painting and projected the painting's picture to that shape, as shown below:
Painting | |
---|---|
Wall |
Final Result:
The Coolest Thing I Learned from this Project
As mentioned before, I was very impressed by how versatile homographic transformations are. Other cool things I learned in the course of completing this project include Harris corners and the RANSAC procedure. I think both of these are incredibly clever ways of quantifying seemingly subjective concepts like interesting feature points and matching sets of correspondence points. It is very impressive how "intelligent" image processing procedures can act even without incorporating more recent AI techniques like neural networks.