To collect images to warp and mosaic, I shot photos of various Berkeley libraries, holding the point of view fixed while rotating the camera angle between shots.
For corresponding sets of points (p, p') in our images, we want to compute the homography matrix H that maps p -> p'. Since a homography is defined only up to scale, we can fix the H[3, 3] entry to 1, leaving H with 8 degrees of freedom. We can set up a system of equations to solve for H as follows.
Equation images are from this article.
Again, we already know H[3, 3] is 1.
The equations above are written for a set of 4 points, but with more points we can simply let the system be overdetermined and solve with least squares to recover H.
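The setup above can be sketched in code. This is my own minimal version (the function name and signature are mine, not from the write-up): each correspondence contributes two rows to an A h = b system in the eight unknown entries of H, and `np.linalg.lstsq` handles both the exact 4-point case and the overdetermined one.

```python
import numpy as np

def compute_homography(pts, pts_prime):
    """Solve for H (with H[3, 3] fixed to 1) mapping pts -> pts_prime.

    pts, pts_prime: (n, 2) arrays of corresponding (x, y) points, n >= 4.
    With more than 4 points the system is overdetermined and is solved
    in the least-squares sense.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(pts, pts_prime):
        # x' = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), cleared of
        # the denominator; same for y'.
        A.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y])
        A.append([0, 0, 0, x, y, 1, -yp * x, -yp * y])
        b.extend([xp, yp])
    h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return np.append(h, 1).reshape(3, 3)
```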
Below, I warped the images I shot to their alternate angles. This also involves applying a translation to the warped image so that the entire warped result fits in the output frame.
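A sketch of this warp-plus-translation step, with my own naming (the original code isn't shown): forward-map the corners to get the warped bounding box, build a translation so the box starts at (0, 0), then inverse-warp every output pixel. I use nearest-neighbor sampling for brevity where a real implementation would interpolate.

```python
import numpy as np

def warp_with_offset(im, H):
    """Warp `im` by homography H; returns the warped image and the
    (min_x, min_y) offset of its bounding box. Hypothetical helper."""
    h, w = im.shape[:2]
    # Forward-map the corners to find the warped bounding box.
    corners = np.array([[0, 0, 1], [w, 0, 1], [0, h, 1], [w, h, 1]], float).T
    wc = H @ corners
    wc = wc[:2] / wc[2]
    min_x, min_y = np.floor(wc.min(axis=1)).astype(int)
    max_x, max_y = np.ceil(wc.max(axis=1)).astype(int)
    # Translate so the top-left of the bounding box lands at (0, 0).
    T = np.array([[1, 0, -min_x], [0, 1, -min_y], [0, 0, 1]], float)
    out_h, out_w = max_y - min_y, max_x - min_x
    # Inverse warp: map every output pixel back into the source image.
    Hinv = np.linalg.inv(T @ H)
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = Hinv @ coords
    sx, sy = src[0] / src[2], src[1] / src[2]
    valid = (sx >= 0) & (sx <= w - 1) & (sy >= 0) & (sy <= h - 1)
    out = np.zeros((out_h, out_w) + im.shape[2:], im.dtype)
    # Nearest-neighbor sampling for brevity (use bilinear in practice).
    out[ys.ravel()[valid], xs.ravel()[valid]] = \
        im[np.rint(sy[valid]).astype(int), np.rint(sx[valid]).astype(int)]
    return out, (min_x, min_y)
```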
For image rectification, I first chose images where I knew some plane was meant to be frontal.
Then I defined corresponding points by building an axis-aligned rectangle from the min_x, max_x, min_y, and max_y of the 4 points I selected in the image. Warping the image to this rectangle makes the plane frontal.
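The rectangle construction can be sketched as follows (the helper name is mine; it assumes the 4 points were clicked in top-left, top-right, bottom-right, bottom-left order so the correspondence stays consistent):

```python
import numpy as np

def rectification_target(pts):
    """Build the axis-aligned target rectangle from the extremes of the
    4 clicked points, ordered TL, TR, BR, BL to match the clicks."""
    pts = np.asarray(pts, float)
    min_x, min_y = pts.min(axis=0)
    max_x, max_y = pts.max(axis=0)
    return np.array([[min_x, min_y], [max_x, min_y],
                     [max_x, max_y], [min_x, max_y]])
```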
To put the images together into a mosaic, given a series of images, I defined correspondences between images that were "next" to each other in the mosaic, and used these correspondences to warp the images onto one "final canvas".
The images are blended with a mask: I find the overlapping region between the current "overall" canvas of warped images and the image being warped in, and for every pixel in that region I compute the distance to the nearest edge of the canvas (dc) and the distance to the nearest edge of the image (di), then weight the canvas by dc / (dc + di) and the image by di / (dc + di).
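The blending weights can be sketched with `scipy.ndimage.distance_transform_edt`, which computes exactly this distance-to-nearest-zero (i.e., nearest mask edge). The function and argument names here are mine:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def blend_overlap(canvas, image, canvas_mask, image_mask):
    """Feather-blend `image` into `canvas`. dc and di are each pixel's
    distance to the nearest edge (zero pixel) of the respective mask;
    in the overlap the canvas gets weight dc/(dc+di) and the image
    di/(dc+di)."""
    dc = distance_transform_edt(canvas_mask)
    di = distance_transform_edt(image_mask)
    overlap = canvas_mask & image_mask
    # Start with each source where only it covers the pixel.
    out = (np.where(canvas_mask, canvas, 0.0)
           + np.where(image_mask & ~canvas_mask, image, 0.0))
    w = np.zeros_like(dc)
    np.divide(dc, dc + di, out=w, where=overlap)
    return np.where(overlap, w * canvas + (1.0 - w) * image, out)
```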
The mosaicing works for an arbitrary number of images. Below are some examples composed of 3, 3, and 4 images respectively.
It was very interesting that a simple 3x3 matrix can capture the entire transformation between images—although I suppose that is why we were advised to stay at the same point of view and only change the direction of the camera. Initially, when I was shooting the images, I didn't realize this and took images from different positions, which caused some very bad warps.
This part of the project involves using the methods described in the paper "Multi-Image Matching using Multi-Scale Oriented Patches" by Brown et al. to automatically stitch photos together into a mosaic.
First, we use the Harris corner detection algorithm to compute Harris corners for an image.
Then, using the computed Harris corners, we apply the adaptive non-maximal suppression (ANMS) algorithm to prune these corners and ensure they are evenly distributed. For every point, we compute a suppression radius: the distance to the closest point whose corner strength, scaled by a robustness parameter, still exceeds the strength of the current point. We then keep the n points with the largest radii.
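A sketch of ANMS following that description (my own function; `c_robust = 0.9` is the value suggested in the Brown et al. paper). This O(k²) version is fine for a few thousand corners:

```python
import numpy as np

def anms(coords, strengths, n=500, c_robust=0.9):
    """Adaptive non-maximal suppression: keep the n corners with the
    largest suppression radii, giving a spatially even distribution."""
    coords = np.asarray(coords, float)
    strengths = np.asarray(strengths, float)
    # Pairwise squared distances between all corners.
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    # stronger[i, j] is True when corner j suppresses corner i.
    stronger = strengths[None, :] * c_robust > strengths[:, None]
    d2 = np.where(stronger, d2, np.inf)
    # Radius = distance to the nearest suppressing corner
    # (inf for the globally strongest corner).
    radii = np.sqrt(d2.min(axis=1))
    keep = np.argsort(-radii)[:n]
    return coords[keep]
```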
On the ANMS points, we extract features by sampling the square patch around each point. We then compute "nearest neighbor" pair matches and use Lowe's ratio test to reject outlier matches. Finally, we use the RANSAC algorithm to generate a robust homography from the pairs. With my initial RANSAC setting of 1000 iterations, I noticed that when I ran the program repeatedly, the auto-mosaicing results were highly variable—so I tried increasing the number of iterations and lowering the epsilon (inlier threshold). Here are some results.
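RANSAC for the homography can be sketched as below (parameter names and defaults are mine; per the above, the actual runs started at 1000 iterations). Each round fits H to a random 4-match sample, counts matches that reproject within `eps` pixels, and the final H is refit to the largest inlier set:

```python
import numpy as np

def ransac_homography(pts, pts_prime, n_iters=1000, eps=1.0, rng=None):
    """Estimate a homography robustly from noisy point matches."""
    rng = np.random.default_rng(rng)
    pts = np.asarray(pts, float)
    pts_prime = np.asarray(pts_prime, float)

    def fit(p, pp):
        # Least-squares homography with H[3, 3] = 1 (two rows per match).
        A, b = [], []
        for (x, y), (xp, yp) in zip(p, pp):
            A.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y])
            A.append([0, 0, 0, x, y, 1, -yp * x, -yp * y])
            b.extend([xp, yp])
        h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
        return np.append(h, 1).reshape(3, 3)

    P = np.hstack([pts, np.ones((len(pts), 1))]).T  # homogeneous points
    best_inliers = np.zeros(len(pts), bool)
    for _ in range(n_iters):
        idx = rng.choice(len(pts), 4, replace=False)
        H = fit(pts[idx], pts_prime[idx])
        proj = H @ P
        proj = (proj[:2] / proj[2]).T
        inliers = np.linalg.norm(proj - pts_prime, axis=1) < eps
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on all inliers of the best sample.
    return fit(pts[best_inliers], pts_prime[best_inliers]), best_inliers
```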
For the display in Doe Library, which had 4 images, the auto-mosaicing had more trouble.
There are some issues with blending in the mosaics, so the masking method could have been smarter. Blending aside, I also think the manually positioned mosaics look better. The procedure for ANMS, feature matching, and RANSAC could likely be improved as well: when I look at the plotted distributions of ANMS points, they seem a little too uniformly distributed and do not always stick to the corners.
Doing part B, I realized how complicated using unordered images is! I initially didn't realize the panorama images were expected to be ordered, which would have let us use pairwise correspondences between adjacent images; instead, I was attempting to implement it as if a feature could be matched with any other feature from the whole set of images... I was definitely sweating trying to represent that with broadcast operations in numpy.