Image Warping and Mosaicing

Shoot the Pictures

When shooting pictures, I looked for scenes with lots of detail. This is important because capturing images up close and then stitching them preserves more detail than stepping far back to capture the entire scene in one image. I shot the images on an iPhone, using burst mode for the panoramas; when taking images of a plane from multiple points of projection, I used the Obscura app, which allowed me to fix the camera settings while taking pictures.

I had trouble finding good lighting and moving the camera slowly enough to keep the images sharp without drifting in unwanted directions.

Recover Homographies

See below how the system of equations was set up:

img

As the project description mentioned, it was important to select several points of correspondence, so I made sure to choose at least 6 in each scene. Early on, I got strange results that couldn't be explained by any bug in my code. I soon realized that I simply hadn't been precise enough when choosing points; once I was more careful, the results were much better.
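Roughly speaking, the least-squares setup looks something like the sketch below. This assumes h_33 is fixed to 1 (consistent with the "avoid solving for h_33" note later on), and the function and variable names are just illustrative, not my exact code.

```python
import numpy as np

def compute_homography(src_pts, dst_pts):
    """Estimate H (with h_33 fixed to 1) such that dst ~ H @ [src, 1].

    src_pts, dst_pts: (n, 2) arrays of corresponding (x, y) points, n >= 4.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(src_pts, dst_pts):
        # From xp = (h11*x + h12*y + h13) / (h31*x + h32*y + 1):
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        # From yp = (h21*x + h22*y + h23) / (h31*x + h32*y + 1):
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    # Overdetermined when more than 4 correspondences are given; least squares handles it.
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.append(h, 1.0).reshape(3, 3)
```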

Warp the Images

For this part of the project, I used an inverse warp along with cv2.remap for interpolation; my implementation did not require any for loops. The biggest challenge in this portion was tracking which output pixels actually received values. To do so, I warped an all-white image as well and used the result to create a mask, which I then added as another channel to the warped image.
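A rough sketch of the warping step is below; the names are illustrative and the details of my actual code differ slightly.

```python
import numpy as np
import cv2

def inverse_warp(img, H, out_shape):
    """Inverse-warp img into a canvas of shape out_shape = (h_out, w_out).

    H maps input coordinates to output coordinates, so sampling uses H^-1.
    """
    img = img.astype(np.float32)                                       # keep values in a type remap handles
    h_out, w_out = out_shape
    xs, ys = np.meshgrid(np.arange(w_out), np.arange(h_out))           # output pixel grid
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])      # homogeneous coordinates
    src = np.linalg.inv(H) @ coords                                    # map back into the input image
    src /= src[2]
    map_x = src[0].reshape(h_out, w_out).astype(np.float32)
    map_y = src[1].reshape(h_out, w_out).astype(np.float32)
    warped = cv2.remap(img, map_x, map_y, cv2.INTER_LINEAR)
    # Warp an all-white image the same way; the result marks which output pixels are valid.
    mask = cv2.remap(np.ones(img.shape[:2], np.float32), map_x, map_y, cv2.INTER_LINEAR)
    return np.dstack([warped, mask])
```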

Image Rectification

Fortunately, I didn't have too much trouble with this part. My warp function worked well, so the only challenge was selecting points precisely. After some trial and error, I was able to get satisfactory results, which are shown below:

img img

Second image (a picture I took at the Vatican):

img img

Blend the Images into the Mosaic

I chose to leave one image unwarped, and I warped all of the other images into its projection.

When importing images, I used dstack to add something similar to an alpha channel to each image, with 0 representing a fully transparent pixel and 1 representing a non-transparent pixel. When running cv2.remap for warping, I used an output image filled with zeros (i.e., fully transparent). Since cv2.remap doesn't edit output pixels that fall outside of the input image, all pixels that weren't taken from the input image keep an alpha of 0, and the rest have a value of 1.

After warping each of the images, I took their sum and divided it by the sum of the alpha channels. This ensures that pixels are only averaged with pixels from another image if that other image actually has coverage at that location. I think the blending itself worked pretty well, as it isn't obvious from the results that the images were blended. That said, there are seams where the images meet that do not look very good. I believe this problem was mostly caused by imprecise point selection and inadvertent movements of the camera while taking pictures; the effects could probably be mitigated by adding feathering in the overlapping regions. Below I display each original image, followed by the result. For the first scene, I also show each of the images warped.
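The averaging step is roughly the following (illustrative names, assuming each warped image carries the alpha channel described above):

```python
import numpy as np

def blend_mosaic(warped_imgs):
    """Average the warped RGBA images, weighting each pixel by its alpha coverage."""
    total = np.zeros(warped_imgs[0].shape[:2] + (3,), dtype=float)
    alpha_sum = np.zeros(warped_imgs[0].shape[:2], dtype=float)
    for im in warped_imgs:
        rgb, alpha = im[..., :3].astype(float), im[..., 3].astype(float)
        total += rgb * alpha[..., None]                       # only covered pixels contribute
        alpha_sum += alpha
    return total / np.maximum(alpha_sum, 1e-8)[..., None]     # avoid divide-by-zero outside coverage
```

Swapping the binary alpha for smoothly decaying weights near each image's border would be one way to add the feathering mentioned above.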

img img img img img img img

Set 2:

img img img img img

Set 3:

img img img img

Set 4:

img img img img

What I learned

I was most impressed by the process for recovering homographies; it was exciting to use more intricate algebra to avoid using w_hat and to avoid solving for h_33. I also use many applications, like Google Street View, that rely on image stitching, so it was satisfying to implement it myself and to learn that my math and programming skills are sufficient to build similar software.

Bells and Whistles

First, I projected a painting that hangs in Doe Library on campus onto a screen in an amphitheater in Italy; I took the second picture on a trip there a few years ago. To do this, I changed the background of the painting image to black. Then, when inserting pixels from the painting image, I only used pixels that were not black, thereby removing the background. This worked because none of the pixels in the painting were truly black; if there had been black pixels, I could have used an alpha channel instead, as I did in the previous parts. Below are the original images followed by the result.

img img img
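The compositing step was roughly the following. Here I use cv2.warpPerspective for brevity, though my own warp function works just as well; the names are illustrative.

```python
import numpy as np
import cv2

def insert_painting(scene, painting, H):
    """Warp the painting into the scene with H, then copy only its non-black pixels."""
    h, w = scene.shape[:2]
    warped = cv2.warpPerspective(painting, H, (w, h))   # painting in scene coordinates
    mask = np.any(warped > 0, axis=2)                   # background was set to pure black
    out = scene.copy()
    out[mask] = warped[mask]
    return out
```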

Next, I implemented the 3D rotational model. I read from the JPEG metadata that the focal length was 4.2mm, then used the Procrustes algorithm to get R and applied the necessary transformations. As you can see from the results below, it was unsuccessful.

img img img img
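For completeness, here is a rough sketch of the intended rotation estimate (not my exact, evidently buggy, code). It assumes the matched points have been shifted so the image center is the origin and that the focal length has been converted from 4.2 mm to pixels.

```python
import numpy as np

def estimate_rotation(pts1, pts2, f):
    """Orthogonal Procrustes estimate of the rotation between two views.

    pts1, pts2: (n, 2) matched pixel coordinates measured from the image center;
    f: focal length in pixels.
    """
    rays1 = np.column_stack([pts1, np.full(len(pts1), f)]).astype(float)
    rays2 = np.column_stack([pts2, np.full(len(pts2), f)]).astype(float)
    rays1 /= np.linalg.norm(rays1, axis=1, keepdims=True)   # unit viewing rays
    rays2 /= np.linalg.norm(rays2, axis=1, keepdims=True)
    # Rotation minimizing || rays2 - rays1 @ R.T ||, via SVD of the cross-covariance.
    U, _, Vt = np.linalg.svd(rays2.T @ rays1)
    R = U @ Vt
    if np.linalg.det(R) < 0:      # enforce det(R) = +1 (a proper rotation)
        U[:, -1] *= -1
        R = U @ Vt
    return R
```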

For the next portion of the Bells & Whistles, I iteratively created nested images. I used the picture of an amphitheater that I took and projected the image onto the screen in the image. I then iteratively projected the result back onto the original image until it looked like an infinitely nested image. To do this, I selected the corners of the screen and the corners of the image as correspondence points, then used the same method that I used for the first part of this section.

img
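The nesting loop was roughly the following (an illustrative sketch using cv2 helpers rather than my own warp):

```python
import numpy as np
import cv2

def nest_image(photo, screen_corners, levels=8):
    """Repeatedly project the current result onto the screen region of the photo."""
    h, w = photo.shape[:2]
    img_corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    H = cv2.getPerspectiveTransform(img_corners, np.float32(screen_corners))
    result = photo.copy()
    for _ in range(levels):
        warped = cv2.warpPerspective(result, H, (w, h))                       # shrink onto the screen
        mask = cv2.warpPerspective(np.ones((h, w), np.uint8), H, (w, h)) > 0  # screen region
        result = photo.copy()
        result[mask] = warped[mask]        # paste the previous result onto the screen
    return result
```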

Project 4B: Feature Matching for Auto Stitching

Interest Point Selection

For this section, I converted each image to grayscale and then used the code in the provided harris.py, with a slight modification: for the sake of runtime, I only kept the 15,000 points with the highest corner strength.
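The modification amounts to something like this (coords and strengths stand in for whatever harris.py returns; the exact interface differs):

```python
import numpy as np

def keep_strongest(coords, strengths, k=15000):
    """Keep the k interest points with the highest Harris response.

    coords: (n, 2) point locations, strengths: (n,) corner strengths.
    """
    order = np.argsort(strengths)[::-1][:k]   # indices of the k strongest corners
    return coords[order], strengths[order]
```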

Adaptive Non-Maximal Suppression

I used the efficient method described in the MOPS paper. I computed the minimum suppression radius for each point using the equation in the paper and kept the points with the largest radii. I used a threshold of 0.9 and kept at most 750 interest points. Below, you can see two examples of the points before and after ANMS. In order to keep the code efficient, I had to use a lot of numpy features that I wasn't previously familiar with, which led to some bugs that were hard to identify. Otherwise, this portion went pretty smoothly.
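A straightforward O(n^2) version of ANMS looks roughly like this; my actual code is organized differently for efficiency, and the names are illustrative.

```python
import numpy as np

def anms(coords, strengths, n_keep=750, c_robust=0.9):
    """Adaptive non-maximal suppression: keep points with the largest suppression radii.

    coords: (n, 2) corner locations, strengths: (n,) Harris responses.
    """
    diffs = coords[:, None, :].astype(float) - coords[None, :, :].astype(float)
    dists = np.linalg.norm(diffs, axis=-1)                      # pairwise distances
    # Point j can suppress point i only when f_i < c_robust * f_j.
    can_suppress = strengths[:, None] < c_robust * strengths[None, :]
    dists = np.where(can_suppress, dists, np.inf)
    radii = dists.min(axis=1)                                   # minimum suppression radius r_i
    keep = np.argsort(radii)[::-1][:n_keep]                     # largest radii first
    return coords[keep]
```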

Before and after 1:

img img

Before and after 2:

img img

Feature Descriptor Extraction

For this section, I took a 40x40 pixel window around each point in each image, then downsampled it to an 8x8 descriptor. Finally, I applied bias/gain normalization. This part was pretty straightforward, so I didn't run into any issues when implementing it.
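Roughly, the extraction looks like this. Using cv2.resize to downsample is a simplification of the blur-and-sample procedure in the paper, and it assumes each point lies at least 20 pixels from the border.

```python
import numpy as np
import cv2

def extract_descriptor(gray, x, y):
    """8x8 descriptor from a 40x40 window centered at (x, y) in a grayscale image."""
    window = gray[y - 20:y + 20, x - 20:x + 20].astype(np.float32)    # 40x40 patch
    patch = cv2.resize(window, (8, 8), interpolation=cv2.INTER_AREA)  # downsample (blurs implicitly)
    patch = (patch - patch.mean()) / (patch.std() + 1e-8)             # bias/gain normalization
    return patch.ravel()
```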

Feature Matching

For this project, I fixed one image as the “target image”. In this section, I found correspondence points between each image and the target image, keeping the best match for each point only if the error ratio was below a threshold. I used SSD as my error function and 0.15 as my threshold.
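Ignoring my runtime optimizations, the matching step is roughly:

```python
import numpy as np

def match_features(desc_a, desc_b, ratio_thresh=0.15):
    """Match descriptors with a ratio test on SSD errors.

    desc_a: (n_a, d), desc_b: (n_b, d). Returns a list of (i, j) index pairs.
    """
    # Pairwise SSD between every descriptor in A and every descriptor in B.
    ssd = np.sum((desc_a[:, None, :] - desc_b[None, :, :]) ** 2, axis=2)
    matches = []
    for i in range(len(desc_a)):
        order = np.argsort(ssd[i])
        best, second = ssd[i, order[0]], ssd[i, order[1]]
        if best / (second + 1e-12) < ratio_thresh:   # keep only confident matches
            matches.append((i, order[0]))
    return matches
```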

I struggled a lot with this part, mostly because of bugs in my code. I put a lot of effort into reducing runtime, so my code got pretty complicated and hard to follow. At one point, I got worried because I realized that my algorithm had no guarantee that matchings are exclusive, so two points in an image could be matched to the same point in the target image, but this wasn't actually an issue, for reasons described in the following section.

RANSAC

I implemented the loop as described, with 1000 iterations. At each iteration, I computed a homography from four randomly chosen point pairs and then, using a threshold of 1, found the inliers for that homography. Throughout the iterations, I kept track of the largest set of inliers, and after 1000 iterations I recomputed the homography from those inliers.
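In outline (reusing the compute_homography sketch from Part A; names are illustrative):

```python
import numpy as np

def ransac_homography(pts1, pts2, n_iters=1000, thresh=1.0):
    """RANSAC loop: fit H to 4 random pairs, count inliers, refit on the best set."""
    pts1, pts2 = np.asarray(pts1, float), np.asarray(pts2, float)
    n = len(pts1)
    best_inliers = np.array([], dtype=int)
    for _ in range(n_iters):
        sample = np.random.choice(n, 4, replace=False)
        H = compute_homography(pts1[sample], pts2[sample])
        # Project pts1 through H and measure the reprojection error against pts2.
        proj = (H @ np.column_stack([pts1, np.ones(n)]).T).T
        proj = proj[:, :2] / proj[:, 2:3]
        errors = np.linalg.norm(proj - pts2, axis=1)
        inliers = np.flatnonzero(errors < thresh)
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Final homography computed from the largest inlier set.
    return compute_homography(pts1[best_inliers], pts2[best_inliers]), best_inliers
```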

As mentioned in the previous section, I was surprised to find that it didn't matter whether matchings were one-to-one. I believe this is because, if two of the points randomly chosen in a RANSAC iteration map to the same point, we can't compute a good homography, so that homography won't gather enough inliers to be selected. And when recomputing the final homography from the inliers, duplicates shouldn't matter as long as there are enough points that do have one-to-one mappings.

Results

For this section, I reused most of my images from the previous part, so feel free to see the full image sets above. For each image set, I will first show the result using manual point selection, followed by the result from this part.

Doe: Notice that the image with the manually chosen points isn't aligned as well, which creates more noticeable seams and causes the image to look blurry in overlapping portions.

img img img img img

Library Shelves: This is a bit blurry, but that is because the original images are blurry, not because of the algorithm. I added a fourth set instead, but I still wanted to show this result. Another interesting thing about this pair is that the manual panorama has better stitching on the left side of the image, while the auto panorama has better stitching on the right side.

img img

Chemistry Building: This set worked well for both parts of the project, but you can see that there is much less blurring when points were selected automatically, and it is very hard to find the seams. This is my favorite result.

img img

Kitchen: While the difference isn't as big for this image, you can see that the second image, which used autostitching, is much less blurry and has more subtle seams than the manual panorama.

img img

What have you Learned?

I struggled with bugs on this project for a very long time, so I was trying everything I could think of to debug. In doing so, I created histograms of the error ratios used in feature matching, which turned out to be a good demonstration of what the paper describes. You can see from the histogram below that for most candidate matches the best error is close to the second-best error, so it tends to be pretty clear when we should accept a pair. It was also interesting to compare the results from the two parts. In part A, my eye was immediately drawn to the seams at the edges of the images, but in part B, they were much less noticeable. While we have good intuition about which points to select, we lack the patience to choose enough points and the precision to choose them well, so in this case autostitching seems superior.

img

Bells and Whistles

I implemented panorama recognition. To allow random orderings of the images that form a panorama, I run the stitching algorithm on the unordered images up through the end of RANSAC, which gives me the homographies I need. I then apply the homographies to all of the image centers, subtract each true center from its transformed center, and compute the mean across images. The axis with the largest average change determines the direction of the panorama (horizontal or vertical), and I set the middle image to be the one whose transformed center is in the middle with respect to the other image centers. Note that my implementation relies on fixing one target image and warping all other images to that target, so this method selects the target image when the images are in a random order. In the results below, you can see that there is no difference between the panoramas created from the randomly ordered images and the panoramas created from the ordered images.

img img
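In outline, the target-selection step looks roughly like this (illustrative names; the homographies come from the RANSAC step above):

```python
import numpy as np

def choose_target_image(centers, homographies):
    """Pick the middle image of an unordered panorama.

    centers: (n, 2) image centers; homographies: 3x3 matrices mapping each image
    into a common provisional frame.
    """
    centers = np.asarray(centers, float)
    transformed = []
    for (cx, cy), H in zip(centers, homographies):
        p = H @ np.array([cx, cy, 1.0])
        transformed.append(p[:2] / p[2])
    transformed = np.array(transformed)
    # The axis with the largest average displacement gives the panorama direction.
    displacement = np.abs(transformed - centers).mean(axis=0)
    axis = int(np.argmax(displacement))
    # The middle image is the one whose transformed center is the median along that axis.
    order = np.argsort(transformed[:, axis])
    return order[len(order) // 2]
```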