For this part of the project, I chose to use my DSLR (a Canon EOS 80D) to take the pictures so that I could shoot in manual mode and keep the exposure settings (ISO, shutter speed, and aperture) the same, making the photos as consistent as possible. I put the camera on a tripod so I could rotate it without translating its position, and went with a wide-angle lens since that is the typical use case for a panorama. Finally, I chose to photograph the exterior of the Unit 2 dorm buildings in Berkeley because the windows serve as easily identifiable correspondence points between the different images.
Note that all of the following images have been compressed for this website. The original images are 6000x4000 pixels each.
Since all the images were taken from the same center of projection and the camera was only rotated, the transformation between each pair of images is a homography p' = Hp, where H is a 3x3 matrix with 8 degrees of freedom (H is only defined up to scale, so the bottom-right entry can be fixed to 1). To recover the values of H, we can set up linear equations as follows:
Now that we know that each (p, p') correspondence provides 2 rows of the A matrix above, it is clear that a minimum of 4 correspondences is needed: this gives 8 equations for the 8 unknowns. However, with only 4 (p, p') pairs, any noise in the images can lead to errors in the homography matrix. Therefore, more correspondences were provided (e.g., 8), the A matrix and b vector were set up as above, and least squares was used to find the optimal solution for the h vector.
Below, the defined correspondences can be seen in all 3 images (they were identified as the 4 corners of 2 windows present in all the images); these were collected using matplotlib's ginput. The parameters of the homography were then recovered using the system of equations explained above.
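As a sketch of that system of equations, the homography can be recovered with NumPy's least squares. The function name and array shapes here are my own illustrative choices, not necessarily the exact code used: each correspondence contributes one row for x' and one for y', with the bottom-right entry of H fixed to 1.

```python
import numpy as np

def compute_homography(pts, pts_prime):
    """Estimate H such that pts_prime ~ H @ pts from n >= 4 correspondences.

    pts, pts_prime: (n, 2) arrays of (x, y) points. Hypothetical helper name.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(pts, pts_prime):
        # From x' = (h11*x + h12*y + h13) / (h31*x + h32*y + 1), each
        # correspondence contributes two linear rows in the 8 unknowns.
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    # Least squares handles the overdetermined case (more than 4 pairs).
    h, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
    return np.append(h, 1).reshape(3, 3)  # re-append the fixed h33 = 1
```

With exactly 4 correspondences this solves the 8x8 system exactly; with more, it returns the least-squares optimum described above.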
To warp the images, I first defined a find_bounds function that takes in an image and a homography matrix H and multiplies H by the 4 corners of the original image. It then divides each resulting (wx, wy, w) coordinate by the 3rd value (i.e., the w) to bring it back into 2D. From these 4 new coordinates, I found the y_max and x_max values and initialized the dimensions of the warped image to (x_max, y_max).
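A minimal sketch of that bounds computation, keeping the find_bounds name from the writeup (the exact signature here is an assumption):

```python
import numpy as np

def find_bounds(image, H):
    """Map the image's 4 corners through H and return (x_max, y_max)."""
    h, w = image.shape[:2]
    # Homogeneous corner coordinates as a 3x4 matrix of columns (x, y, 1).
    corners = np.array([[0, 0, 1], [w, 0, 1], [0, h, 1], [w, h, 1]], float).T
    warped = H @ corners
    warped /= warped[2]  # divide each (wx, wy, w) by w to get back to 2D
    x_max = int(np.ceil(warped[0].max()))
    y_max = int(np.ceil(warped[1].max()))
    return x_max, y_max
```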
To fill in this new warped image, I used inverse warping: effectively taking the indices of the new image, multiplying them by the inverse of H to find the corresponding coordinates in the original image, and then using cv2.remap to sample the pixel values from the original image into the new, warped image. During this process, I had to make sure that every (x, y) coordinate had a 3rd value of 1 (the homogeneous coordinate) before being multiplied by H^(-1), and that after multiplying by H^(-1), I rescaled by w each time.
Note: both the boundary calculation and the inverse warping require an H matrix, which was computed using the function explained in the previous section.
Below, you can see the results of warping the middle image into the shape of the left one, and warping the right image
into the shape of the middle one:
Once the warp algorithm from the previous part is working, rectifying images is fairly straightforward. For this part, I took photos of square objects from an angle and set the correspondence points to the 4 corners of the squares. I then 'rectified' them by defining the 4 corners of a perfect square and warping the angled images to this square. The results below show that the warping algorithm is working successfully!
To blend the images into a mosaic, I defined a function that takes in 2 images (one of which has presumably been
warped into the other's projection as explained in part 2). This function first finds the size of the mosaic by
finding the x-max and y-max between both images, and then creates two blank images with this new shape. It fills
the first blank image with input_image1 and the second blank image with input_image2. It then finds the overlapping
coordinates between these two images by checking the indices where both images are non-zero. Since this is a
horizontal blend, I found the minimum and maximum x coordinates of the overlapping area and defined a mask as a
numpy linspace over this range (running from 1 to 0), which acts as an alpha transition. The first image's
overlapping area is multiplied by the mask and the second image's overlapping area is multiplied by (1 - mask). The
two images are then added together to create the final mosaic. Three examples of this algorithm can be seen below:
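The blending steps above can be sketched as follows. This assumes single-channel float images already positioned in the shared projection (zero where a given image has no content); the function name is hypothetical:

```python
import numpy as np

def blend_horizontal(im1, im2):
    """Alpha-blend two same-projection images on a shared canvas."""
    h = max(im1.shape[0], im2.shape[0])
    w = max(im1.shape[1], im2.shape[1])
    # Two blank canvases of the mosaic size, filled with each input image.
    canvas1 = np.zeros((h, w)); canvas1[:im1.shape[0], :im1.shape[1]] = im1
    canvas2 = np.zeros((h, w)); canvas2[:im2.shape[0], :im2.shape[1]] = im2
    # Overlap = pixels where both images are non-zero.
    overlap = (canvas1 > 0) & (canvas2 > 0)
    mosaic = canvas1 + canvas2  # non-overlap regions just add with zero
    ys, xs = np.nonzero(overlap)
    if xs.size:
        x0, x1 = xs.min(), xs.max()
        # Linear alpha ramp from 1 (im1 side) to 0 (im2 side) across overlap.
        alpha = np.linspace(1, 0, x1 - x0 + 1)[xs - x0]
        mosaic[ys, xs] = alpha * canvas1[ys, xs] + (1 - alpha) * canvas2[ys, xs]
    return mosaic
```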
You can see that with the outdoor scene of Unit 2, there are some ghosting issues because the trees and the wires move
around a little in the wind; even with well-defined correspondences, this will cause some artifacts. On the
other hand, the desk and the indoor house scene are much more stationary, so the resulting mosaics look a
lot better!
The most important thing I learned from this project was the new issues to keep in mind as we move from
affine triangle warps (in the last project) to projective rectangular transformations in this project. I initially
thought that the process would be quite similar and although the idea of defining correspondences and inverse warping
is still present, there were new issues I had to deal with: What if after warping, the new boundaries go into negative
coordinates? How can we rearrange the system of equations so that we can solve them even without explicitly knowing w
in the p' point? Working through these issues let me understand the differences between projective and affine warps
much more clearly, by seeing their impact in the transformations themselves.
The coolest part of this project is that I've always wondered how document-scanning apps work so well: they let
the user take a photograph of a document at a slight angle, identify the 4 corners of the document, and then
return an image that looks like it came from an actual scanner. After working on the 'Rectifying Images' part of this
project, I realize that the user marking the 4 corners of the document is essentially identifying the correspondences
so the app can warp the image into a rectangle!