Image warping and mosaicing

Roma Desai | CS 194 Project 5

 

PART A

 

OVERVIEW

 

For this project, I shot a couple of individual photographs and warped them together using homographies to create an image panorama. This technique lets us take separate photos but create a combined image that shows a much larger field of view.

 

 

PART 1: SHOOT THE PICTURES

 

The first step was to shoot some pictures. To ensure the transformation between photographs was a true perspective (projective) transformation, I shot from a single point but rotated the camera to capture different angles. I also kept the aperture and exposure settings fixed across shots. Finally, I made sure neighboring images overlapped by about 50% so I could later identify common key points between them. Here are a few I took around my house:

 

[Image: a living room with furniture and a large window]
[Image: a living room with furniture and a rug]
[Image: a close-up of a decorated tree in a room]
[Image: a living room with furniture and a large window]
[Image: a house with trees in the background]
[Image: a house with trees in the background]

 

 

PART 2: RECOVER HOMOGRAPHIES + WARP IMAGES + IMAGE RECTIFICATION

 

Next, I wrote a function to calculate the homography from the first image to the second. I selected corresponding points and solved for H in the equation p' = Hp. Since H has eight degrees of freedom, four correspondences are enough to determine it exactly, but for a more stable result I used additional points and solved for the entries of H as an overdetermined least-squares problem.
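
Below is a minimal sketch of that least-squares setup, assuming numpy arrays of corresponding points; the function name compute_homography and its conventions are mine for illustration, not the actual project code.

```python
import numpy as np

def compute_homography(pts1, pts2):
    """Estimate H such that pts2 ~ H @ pts1 (homogeneous coordinates).

    pts1, pts2: (N, 2) arrays of corresponding (x, y) points, N >= 4.
    Fixes h33 = 1, leaving eight unknowns solved by least squares.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(pts1, pts2):
        # From p' = Hp, after dividing by the third coordinate and
        # multiplying through by the denominator (h31*x + h32*y + 1):
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        b.append(xp)
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.append(yp)
    h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return np.append(h, 1).reshape(3, 3)
```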

Next, I wrote a warp function that applies the homography to the first image to align it with the second image's perspective (a sketch follows). To test both functions, I took a couple of side-view photographs and rectified them to a top-down view.
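
This is one way the inverse warp could be written with scipy, assuming the homography maps source coordinates to output-canvas coordinates; warp_image and its argument conventions are illustrative.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_image(img, H, out_shape):
    """Inverse-warp img onto a canvas of out_shape using homography H.

    H maps img (x, y) coordinates to canvas coordinates, so each canvas
    pixel is sampled from the source at H^-1 of its own location.
    """
    H_inv = np.linalg.inv(H)
    ys, xs = np.indices(out_shape[:2])
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = H_inv @ coords
    src /= src[2]                          # back to inhomogeneous (x, y)
    rows = src[1].reshape(out_shape[:2])   # sample locations: y, then x
    cols = src[0].reshape(out_shape[:2])
    out = np.zeros(out_shape)
    for c in range(img.shape[2]):          # bilinear-sample each channel
        out[..., c] = map_coordinates(img[..., c], [rows, cols],
                                      order=1, cval=0.0)
    return out
```

Here are some of the results.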

 

Original | Rectified
[Image: a close-up of a door] | [Image: a close-up of a tiled floor]
[Image: a wooden board on a table] | [Image: a close-up of a box]
[Image: a suitcase] | [Image: a calendar]

 

 

PART 3: BLEND THE IMAGES

 

Finally, I combined the images by warping the first image into the geometry of the second and then using a Laplacian pyramid to merge them. I chose Laplacian pyramids because they produced far fewer edge artifacts than simply adding the images together. Since I was using my phone camera and the lighting did not stay constant the entire time, I believe some of the remaining edge artifacts are a result of that. I also found it difficult to specify corresponding points in images with irregularly shaped objects such as trees and flowers; because of the squares and straight lines in my first image, I believe it turned out better. A sketch of the blending step follows, then the warped images and the final combined images.
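
For reference, here is roughly what a Laplacian-pyramid blend looks like with OpenCV; the name laplacian_blend, the level count, and the assumption that the mask is a float image with the same shape as the inputs are all my own choices.

```python
import numpy as np
import cv2

def laplacian_blend(im1, im2, mask, levels=5):
    """Blend im1 and im2 (floats in [0, 1]) with a soft mask.

    mask has the same shape as the images: 1.0 where im1 should show.
    """
    gp1, gp2, gpm = [im1], [im2], [mask]
    for _ in range(levels):                 # Gaussian pyramids of everything
        gp1.append(cv2.pyrDown(gp1[-1]))
        gp2.append(cv2.pyrDown(gp2[-1]))
        gpm.append(cv2.pyrDown(gpm[-1]))
    blended = gp1[-1] * gpm[-1] + gp2[-1] * (1 - gpm[-1])  # coarsest level
    for i in range(levels - 1, -1, -1):     # add back Laplacian detail bands
        size = (gp1[i].shape[1], gp1[i].shape[0])
        lap1 = gp1[i] - cv2.pyrUp(gp1[i + 1], dstsize=size)
        lap2 = gp2[i] - cv2.pyrUp(gp2[i + 1], dstsize=size)
        blended = (cv2.pyrUp(blended, dstsize=size)
                   + lap1 * gpm[i] + lap2 * (1 - gpm[i]))
    return np.clip(blended, 0, 1)
```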

 

Warped | Blended
[Image: a living room with furniture and a large window] | [Image: a living room with furniture and a large window]
[Image: a room with furniture and a vase of flowers] | [Image: a living room with furniture and a vase of flowers next to a window]
[Image: a house with trees in the background] | [Image: a house with trees in the background]

 

 

REFLECTIONS:

Overall, this was a super cool project that led me to appreciate everyday tools we take for granted, such as the panorama feature on iPhones! I think the coolest part of this project is that a single transformation can completely change the viewpoint from which an image appears to have been taken. The fact that you can go from a side view to a top-down view with no external information is mind-blowing! I really enjoyed this project and can't wait to explore these concepts further.

 

 

PART B

 

 

Overview

In Part A, I created panoramas by manually selecting feature points and warping the images together. For this part, I implemented the approach of the paper "Multi-Image Matching using Multi-Scale Oriented Patches" by Brown et al. to automatically match features and then warp the images together into a panorama.

 

 

Step 1: Harris Corner Detection

 

To begin, I used the provided starter code to detect the corners in each image. Corners serve as good, repeatable indicators of image features and allow for much better matching. The Harris corner detection algorithm calculates a corner response at every pixel of the image and selects the regions with high responses. To keep the number of detected corners manageable, we take only local maxima; I used a minimum spacing of about 40 pixels. A sketch of this step and the detected corners on two images follow.
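
Starter code aside, the same detection can be sketched with skimage; get_corners is an illustrative name, and the real starter code's parameters may differ.

```python
from skimage.feature import corner_harris, peak_local_max

def get_corners(gray, min_distance=40):
    """Harris response map plus local maxima at least min_distance apart."""
    h = corner_harris(gray, sigma=1)        # corner response at every pixel
    coords = peak_local_max(h, min_distance=min_distance)  # (N, 2) row/col
    return h, coords
```

Here are the detected corners on two images.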

 

[Images: detected Harris corners overlaid on two of the source photographs]

 

 

Step 2: Adaptive Non-Maximal Suppression

 

While enforcing a minimum spacing reduces the number of points, we still want to cut them down to a fixed count every time, and while doing so we want an even spread of points across the image. To do this, we compute a suppression radius for every point: the distance to the nearest point whose corner response is significantly stronger. I used a robustness constant of 0.9 to decide what "significantly stronger" means, so a neighbor suppresses a point only when the point's response is below 0.9 times the neighbor's. Once we have the radius for each point, we keep the 500 points with the largest radii. In this way, we suppress the weaker points while still maintaining a good spread (see the sketch below).
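
A direct O(N²) version of adaptive non-maximal suppression might look like this; anms is an illustrative name, and the inputs are assumed to come from the corner detector above.

```python
import numpy as np

def anms(coords, h, n_keep=500, c_robust=0.9):
    """Keep the n_keep corners with the largest suppression radii.

    coords: (N, 2) corner (row, col) positions; h: Harris response map.
    A point's radius is its distance to the nearest point whose response
    dominates it, i.e. where strengths[i] < c_robust * strengths[j].
    """
    strengths = h[coords[:, 0], coords[:, 1]]
    radii = np.full(len(coords), np.inf)
    for i in range(len(coords)):
        stronger = strengths[i] < c_robust * strengths
        if stronger.any():
            dists = np.linalg.norm(coords[stronger] - coords[i], axis=1)
            radii[i] = dists.min()
    keep = np.argsort(radii)[::-1][:n_keep]  # largest radii first
    return coords[keep]
```

Here are some results after applying non-maximal suppression to the detected corners.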

 

Before | After
[Images: detected corners before and after adaptive non-maximal suppression]

 

 

Step 3: Extracting Feature Descriptors & Implementing Feature Matching

 

Once we have our points in two images, we have to match them together using feature descriptors. I described each point by the 40 x 40 patch around it. To reduce error due to noise, brightness changes, and other factors, I Gaussian-blurred the patch and downsampled it to an 8 x 8 feature descriptor. Finally, I normalized each descriptor by subtracting its mean and dividing by its standard deviation. A sketch of this step is below.
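
The descriptor extraction might be written as follows; extract_descriptors, the blur sigma, and the border handling are my own choices for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def extract_descriptors(gray, coords, patch=40, out=8):
    """Bias/gain-normalized out x out descriptors from patch x patch windows."""
    blurred = gaussian_filter(gray.astype(np.float64), sigma=2)
    half, step = patch // 2, patch // out
    descs, kept = [], []
    for r, c in coords:
        if (r < half or c < half or
                r + half > gray.shape[0] or c + half > gray.shape[1]):
            continue                        # not enough room for a full patch
        w = blurred[r - half:r + half:step, c - half:c + half:step]  # 8 x 8
        w = (w - w.mean()) / (w.std() + 1e-8)  # subtract mean, divide by std
        descs.append(w.ravel())
        kept.append((r, c))
    return np.array(descs), np.array(kept)
```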

 

With feature descriptors, matching points is much easier. I calculated the SSD error between each patch in the to-be-warped image and each patch in the stationary image to find which points correspond to each other. To make the matching more robust, I calculated the 2-NN error in addition to the 1-NN error and took the ratio of the two. Similar to adaptive non-maximal suppression, we only want to count a pair of points as a match if it is significantly better than the next-best alternative; to do this, I only counted a pair as corresponding if the ratio 1-NN/2-NN < 0.3 (a sketch follows).
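
The ratio-test matching fits in a few lines; match_features is an illustrative name, assuming descriptor matrices from the previous sketch.

```python
import numpy as np

def match_features(d1, d2, ratio=0.3):
    """Match descriptor rows of d1 to d2 by SSD with a 1-NN/2-NN ratio test."""
    # Pairwise SSD between every descriptor in d1 and every descriptor in d2.
    dists = ((d1[:, None, :] - d2[None, :, :]) ** 2).sum(axis=2)
    matches = []
    for i, row in enumerate(dists):
        nn1, nn2 = np.argsort(row)[:2]      # best and second-best candidates
        if row[nn1] / row[nn2] < ratio:     # keep only unambiguous matches
            matches.append((i, nn1))
    return matches
```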

 

Here are the corresponding points for my first image:

 

Left Image | Right Image
[Images: matched feature points shown on the left and right images]

 

 

Step 4: RANSAC

 

Even with all this, the feature matching is not robust to outliers. This is a big issue because the least-squares homography computed later can have huge errors if even one correspondence is off. I used the random sample consensus (RANSAC) algorithm to find the best set of points. First, I randomly selected 4 correspondences and computed the homography they define, then used it to warp the left-image points into the right image. Next, I calculated the Euclidean distance between each transformed point and its actual match in the right image; if the distance was < 2 pixels, I considered the point an inlier. I repeated this process for about 1000 iterations and at the end selected the homography that gave the largest set of inliers (sketched below).
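
Assuming the compute_homography sketch from Part A, the RANSAC loop might look like this; ransac_homography and its defaults are illustrative.

```python
import numpy as np

def ransac_homography(pts1, pts2, n_iters=1000, eps=2.0):
    """4-point RANSAC; returns the indices of the largest inlier set found.

    pts1, pts2: (N, 2) arrays of matched (x, y) points, left then right.
    """
    best_inliers = np.array([], dtype=int)
    pts1_h = np.hstack([pts1, np.ones((len(pts1), 1))])  # homogeneous points
    for _ in range(n_iters):
        sample = np.random.choice(len(pts1), 4, replace=False)
        H = compute_homography(pts1[sample], pts2[sample])
        proj = (H @ pts1_h.T).T
        proj = proj[:, :2] / proj[:, 2:3]   # dehomogenize the warped points
        dist = np.linalg.norm(proj - pts2, axis=1)
        inliers = np.nonzero(dist < eps)[0]  # within 2 pixels of the match
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers
```

Here is an example of some final calculated points: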

Left Image | Right Image
[Images: final RANSAC inlier points shown on the left and right images]

 

 

RESULTS

 

Once I had the points, I used my work from Part A to stitch the images together: I took the inliers output by RANSAC, recomputed the homography on all of them, warped the left image into the right image's frame, and finally used a Laplacian pyramid to blend the images (rough glue code is below).
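
Putting the sketches above together, the end-to-end stitch would look roughly like this; all variable names here (left_img, left_pts, canvas_shape, and so on) are hypothetical.

```python
# Hypothetical glue code tying together the sketches from earlier sections.
# left_pts / right_pts are matched (x, y) point arrays from match_features.
inliers = ransac_homography(left_pts, right_pts)
H = compute_homography(left_pts[inliers], right_pts[inliers])  # refit on inliers
warped_left = warp_image(left_img, H, canvas_shape)
panorama = laplacian_blend(warped_left, right_on_canvas, seam_mask)
```

Here are the results of the manually selected feature points compared to the results of the automatically computed feature points: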

 

Part A Panorama | Part B Panorama
[Images: three pairs of panoramas, manually stitched (Part A) next to automatically stitched (Part B): the living room, the living room with flowers, and the house with trees]

 

 

REFLECTIONS:

 

While the second part of the project was slightly more challenging than the first, it was much cooler and much more rewarding to see all my work come together. I think it's amazing how something as simple as corners can be used to identify real objects such as tables, couches, and carpet designs. It was really cool to see some clever math generate such good results. When I first heard about this project, I assumed we would need some sort of ML model to find the features, but this method turned out to be even cooler. I was especially surprised that the program-generated matches produced a panorama of the same quality as, if not better than, the manually selected features. Overall, I learned a lot and really enjoyed this project.