Part 1

Shoot the Pictures

I shot 3 sets of 2 left-to-right pictures: by the beach, at Caffe Strada, and at my desk. I rotated the camera significantly for the beach pictures and zoomed slightly for the desk pictures. My final projection code uses one image as the reference and projects the other onto its space, so that each set's projected result would fit in the frame later in the project, I had to add a significant amount of right and/or bottom padding to the second image in each set.
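For reference, the padding itself is a one-line numpy call; the amounts below are placeholders that depend on how far the warped image extends:

import numpy as np

# Pad the second image with extra rows on the bottom and columns on the
# right (placeholder amounts) so the warped result has room to land.
img2_padded = np.pad(img2, ((0, 600), (0, 800), (0, 0)), mode='constant')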

Recover Homographies

I used ginput and a separate Python script to display both images and select distinguishing points. I selected 8 points from each image to ensure an overdetermined system when solving for the homography matrix. Then, I wrote a function to take in (p, p') and solve for the homography matrix mapping p to p'. Given the matrices of (x, y) coordinates, I set up a matrix of coefficients corresponding to the homography entries operating on each coordinate, and used SVD to solve for the unknown entries, taking the right singular vector associated with the smallest singular value as the least-squares solution. Since this came out to a 9-element vector, the last step was to reshape it into a (3, 3) homography matrix.
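Here is a minimal sketch of that setup, assuming (N, 2) arrays of corresponding points (function and variable names are mine, not from the actual code):

import numpy as np

def compute_homography(pts_src, pts_dst):
    # Build two rows of the DLT coefficient matrix per correspondence.
    A = []
    for (x, y), (xp, yp) in zip(pts_src, pts_dst):
        A.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y, -xp])
        A.append([0, 0, 0, x, y, 1, -yp * x, -yp * y, -yp])
    # The least-squares solution is the right singular vector associated
    # with the smallest singular value.
    _, _, Vt = np.linalg.svd(np.array(A))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]   # fix the scale ambiguity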

Warp the Images

Next, I calculated the homography matrix warping the right image to the left image's point of view. I created a grid of all pixel coordinates in the right image, applied the homography to get the corresponding coordinates in the target space, and mapped each pixel value over. Although the second image was padded to be much bigger, you can see that similar landmarks end up at similar coordinates in the 2 images.
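One common way to implement this warp is inverse mapping, sampling the source image at H⁻¹ of each output pixel so the result has no holes; a sketch under that assumption, reusing compute_homography's output from above:

import numpy as np
from scipy.ndimage import map_coordinates

def warp_image(img, H, out_shape):
    # For every pixel of the output grid, find where it comes from in img.
    h_out, w_out = out_shape
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = np.linalg.inv(H) @ coords
    src = src[:2] / src[2]            # back from homogeneous coordinates
    out = np.zeros((h_out, w_out, img.shape[2]))
    for c in range(img.shape[2]):     # bilinear sample each color channel
        out[..., c] = map_coordinates(
            img[..., c], [src[1], src[0]], order=1, cval=0
        ).reshape(h_out, w_out)
    return out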

Image Rectification

I verified my warping function by selecting the 4 corners of a rectangular region on the right wall of this street image, then making a copy of those points in which the top-right and bottom-left corners were snapped into the rectangle defined by the diagonal between the top-left and bottom-right corners, recreating what the selection would look like if the right wall were parallel to the camera. Then, I calculated the homography matrix from the imaginary parallel coordinates to the actual points I selected, and used this to warp the original image to that imaginary point of view. In the result, the right wall successfully shows up parallel to the camera.
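In terms of the sketches above (where warp_image inverts H internally), the rectification reduces to one homography fit; the clicked coordinates here are hypothetical:

import numpy as np

# Hypothetical (x, y) corners clicked on the slanted wall, ordered
# top-left, top-right, bottom-right, bottom-left:
pts_actual = np.array([[120, 95], [400, 60], [410, 330], [130, 310]])
# Snap TR and BL so all 4 corners form the rectangle defined by the
# TL-BR diagonal, i.e. where they would sit if the wall faced the camera:
pts_rect = np.array([[120, 95], [410, 95], [410, 330], [120, 330]])

H = compute_homography(pts_actual, pts_rect)   # actual view -> rectified view
rectified = warp_image(img, H, img.shape[:2])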

I got a similar result for a second image, rectifying my computer screen to be parallel to the camera view.

Blend the images into a mosaic

I first tried a simple alpha blending method, but there was significant ghosting along the seams, so I reused the Laplacian pyramid code from Project 2 and was able to make mosaics out of the sets of 2 images, shown below.
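A compact sketch of the kind of Laplacian stack blend involved, written for a single channel (apply it per color channel; the seam mask is 1 where the first image should show):

import numpy as np
from scipy.ndimage import gaussian_filter

def laplacian_blend(im1, im2, mask, levels=4, sigma=2):
    # Blend each frequency band separately, using a progressively
    # blurrier mask for the lower-frequency bands.
    g1, g2, gm = im1.astype(float), im2.astype(float), mask.astype(float)
    blended = np.zeros_like(g1)
    for _ in range(levels):
        b1, b2 = gaussian_filter(g1, sigma), gaussian_filter(g2, sigma)
        blended += gm * (g1 - b1) + (1 - gm) * (g2 - b2)   # detail band
        g1, g2, gm = b1, b2, gaussian_filter(gm, sigma)
    return blended + gm * g1 + (1 - gm) * g2   # add the residual low-pass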

Tell us what you've learned

The most interesting thing I learned was that, as in the case of the rectified street picture and even my warped mosaic images, applying a transform to change the point of view of an image can yield a completely natural-looking image from the shifted POV. I had assumed that parts at the extremes of the image would appear squished or distended.

Part 2

First, here are the detected Harris corners for one of my images, and the same corners after Adaptive Non-Maximal Suppression (ANMS). You can see that after ANMS, the corners are spread more thinly while maintaining a more uniform spatial distribution than the original points; a sketch of the suppression step follows the figures.

Original Harris corners:
After ANMS:
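This sketch assumes an (N, 2) array of corner coordinates and a parallel array of corner strengths; it is O(N²) in the number of corners, which is fine for a few thousand points:

import numpy as np
from scipy.spatial.distance import cdist

def anms(coords, strengths, n_keep=500, c_robust=0.9):
    # For each corner, the suppression radius is the distance to the
    # nearest corner that is sufficiently stronger; keep the corners
    # with the largest radii for an even spatial spread.
    dists = cdist(coords, coords)
    stronger = strengths[:, None] < c_robust * strengths[None, :]
    dists[~stronger] = np.inf
    radii = dists.min(axis=1)
    return coords[np.argsort(-radii)[:n_keep]]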

Next, here are my color-coded sets of matching features based on feature descriptor extraction and matching. It can be visually verified that the points correspond to the same image features across both images.
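A sketch of MOPS-style descriptor extraction under common assumptions (axis-aligned 8x8 descriptors sampled from 40x40 windows; integer corner coordinates at least 20 pixels from the border):

import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.spatial.distance import cdist

def extract_descriptors(img, coords, win=40, step=5):
    # Blur first so the coarse 8x8 sampling doesn't alias, then
    # bias/gain-normalize each descriptor.
    blurred = gaussian_filter(img.astype(float), 2)
    r = win // 2
    descs = []
    for x, y in coords:
        patch = blurred[y - r:y + r:step, x - r:x + r:step]
        descs.append((patch - patch.mean()) / (patch.std() + 1e-8))
    return np.array([d.ravel() for d in descs])

# Pairwise squared distances between the two images' (hypothetical)
# descriptor sets, consumed by the matching step:
dists = cdist(descs_a, descs_b, 'sqeuclidean')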




Finally, I implemented the RANSAC algorithm for robust homography computation. Here is a comparison of mosaics stitched from manually selected correspondences and mosaics made with RANSAC.
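A sketch of 4-point RANSAC, reusing compute_homography from Part 1 (the iteration count and inlier threshold are placeholder values):

import numpy as np

def ransac_homography(pts1, pts2, n_iter=1000, eps=2.0):
    # Repeatedly fit an exact homography to a random 4-point sample and
    # keep the hypothesis with the most inliers.
    best = np.zeros(len(pts1), dtype=bool)
    for _ in range(n_iter):
        idx = np.random.choice(len(pts1), 4, replace=False)
        H = compute_homography(pts1[idx], pts2[idx])
        proj = H @ np.vstack([pts1.T, np.ones(len(pts1))])
        proj = (proj[:2] / proj[2]).T
        inliers = np.linalg.norm(proj - pts2, axis=1) < eps
        if inliers.sum() > best.sum():
            best = inliers
    # Refit on every inlier for the final, least-squares estimate.
    return compute_homography(pts1[best], pts2[best]), best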



The most interesting thing I learned in Part 2 was Lowe's thresholding method (given a feature from image A, only count its nearest neighbor in image B as a match if it is closer than the second-nearest neighbor by a certain ratio) and how effective it is in practice.
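Given the distance matrix between two descriptor sets, the ratio test is only a few lines; a sketch with a placeholder threshold of 0.7:

import numpy as np

def lowe_matches(dists, ratio=0.7):
    # Keep the pair (i, j) only when i's best match j is much closer
    # than its second-best match.
    order = np.argsort(dists, axis=1)
    rows = np.arange(len(dists))
    best, second = dists[rows, order[:, 0]], dists[rows, order[:, 1]]
    keep = best < ratio * second
    return np.column_stack([rows[keep], order[keep, 0]])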