CS 294-26: Intro to Computer Vision and Computational Photography, Fall 2022

Project 4: [Auto]stitching and Photo Mosaics

Katherine Song (cs-194-26-acj)



Overview

Part 1: Image Warping and Mosaicing

Shooting the Pictures

For shooting my pictures, I used a DSLR in manual mode so I could fix shutter speed and aperture. For mosaic-making, I took a couple of pictures each of my lab, the hallway outside my lab, and a little lounge area inside my lab. For each set, I used a tripod to take the pictures from the same point of view but with different view directions:

Lab view #1
Lab view #2
Hallway view #1
Hallway view #2
Lounge view #1
Lounge view #2
I also shot a couple of images of planar objects from different angles specifically to illustrate image rectification. (Rectification could also be demonstrated with features of the above images, but I shot separate images because I knew the exact dimensions of the planar objects below.)
Textbook
Hallway with poster

Recovering Homographies

To recover homographies, I wrote the function computeH(im1_pts, im2_pts) to compute the 3x3 homography matrix H. First, I marked 14 correspondences between the 2 images by hand using the point-picking code I wrote for the last project.
Because I wanted to use more than 4 correspondences for more robust alignment, I needed a least-squares approach to solve the resulting overdetermined system. We start with the basic equation, Hp = p'. For one pair of correspondences: $$\begin{bmatrix}h_{11} & h_{12} & h_{13}\\h_{21} & h_{22} & h_{23}\\h_{31} & h_{32} & h_{33}\end{bmatrix} \begin{bmatrix}p_{x}\\p_{y}\\1\end{bmatrix} = \begin{bmatrix}p'_{x}\\p'_{y}\\p'_{z}\end{bmatrix}$$ This expands into a system of three equations: $$h_{11}p_{x} + h_{12}p_{y} + h_{13} = p'_{x}$$ $$h_{21}p_{x} + h_{22}p_{y} + h_{23} = p'_{y}$$ $$h_{31}p_{x} + h_{32}p_{y} + h_{33} = p'_{z}$$ Because homogeneous coordinates are only defined up to scale, the true image coordinates are found by dividing the first two components by p'z. Normalizing so that p'z = 1 (equivalently, substituting the left side of the third equation in as the scale factor multiplying p'x and p'y), we get: $$h_{11}p_{x} + h_{12}p_{y} + h_{13} = h_{31}p'_{x}p_{x} + h_{32}p'_{x}p_{y} + h_{33}p'_{x}$$ $$h_{21}p_{x} + h_{22}p_{y} + h_{23} = h_{31}p'_{y}p_{x} + h_{32}p'_{y}p_{y} + h_{33}p'_{y}$$ Rearranging, writing in matrix form, and fixing the scale ambiguity by setting h33 = 1, we have: $$\begin{bmatrix}p_{x} & p_{y} & 1 & 0 & 0 & 0 & -p'_{x}p_{x} & -p'_{x}p_{y}\\0 & 0 & 0 & p_{x} & p_{y} & 1 & -p'_{y}p_{x} & -p'_{y}p_{y}\end{bmatrix} \begin{bmatrix}h_{11}\\h_{12}\\h_{13}\\h_{21}\\h_{22}\\h_{23}\\h_{31}\\h_{32}\end{bmatrix} = \begin{bmatrix}p'_{x}\\p'_{y}\end{bmatrix}$$ For N sets of correspondences, this becomes: $$\begin{bmatrix}p_{x1} & p_{y1} & 1 & 0 & 0 & 0 & -p'_{x1}p_{x1} & -p'_{x1}p_{y1}\\0 & 0 & 0 & p_{x1} & p_{y1} & 1 & -p'_{y1}p_{x1} & -p'_{y1}p_{y1}\\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots\\ p_{xN} & p_{yN} & 1 & 0 & 0 & 0 & -p'_{xN}p_{xN} & -p'_{xN}p_{yN}\\0 & 0 & 0 & p_{xN} & p_{yN} & 1 & -p'_{yN}p_{xN} & -p'_{yN}p_{yN}\end{bmatrix} \begin{bmatrix}h_{11}\\h_{12}\\h_{13}\\h_{21}\\h_{22}\\h_{23}\\h_{31}\\h_{32}\end{bmatrix} = \begin{bmatrix}p'_{x1}\\p'_{y1}\\ \vdots \\ p'_{xN}\\p'_{yN}\end{bmatrix}$$ or, renaming each matrix: $$Ah = b$$ For the least-squares problem, we minimize $$||Ah-b||^{2}$$ Expanding $||Ah-b||^{2}$ gives: $$h^{T}(A^{T}A)h - 2h^{T}(A^{T}b) + ||b||^2$$ Setting the derivative with respect to h to 0 to find the minimum, we get: $$(A^{T}A)h - A^{T}b = 0$$ $$h = (A^{T}A)^{-1}A^{T}b$$ h (with h33 = 1 appended) is then reshaped into a 3x3 matrix to finally obtain H!
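The derivation above can be sketched in a few lines of NumPy. This is a hypothetical reconstruction of computeH, not necessarily the original implementation; using np.linalg.lstsq (which solves the same least-squares problem as the normal equations) is my own choice:

```python
import numpy as np

def computeH(im1_pts, im2_pts):
    """Sketch: least-squares homography from (N, 2) arrays of (x, y) points,
    following the A h = b construction derived above (assumes N >= 4)."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(im1_pts, im2_pts):
        # Two rows of A per correspondence, matching the matrix equation above
        A.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y])
        A.append([0, 0, 0, x, y, 1, -yp * x, -yp * y])
        b.extend([xp, yp])
    # Solve min ||Ah - b||^2 for the 8 unknown entries of H
    h, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
    # Append h33 = 1 and reshape into the 3x3 homography matrix
    return np.append(h, 1.0).reshape(3, 3)
```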

Warping the Images

Once the homography matrix H is obtained, we can apply it to images. In theory, this is fairly simple. The corners of the warped image are found by multiplying H by the coordinates of the original image's corners (and normalizing by the third homogeneous coordinate). Using the minimum and maximum (x,y) extents of the warped corners, we create a 2D mesh grid containing the coordinates of every pixel within the bounding box of the new image. Then, without writing any loops, we determine which source pixels those coordinates correspond to by applying the inverse homography H^-1 to the mesh grid. The resulting source coordinates are not all integers and do not all lie within the original image, so an interpolation function is needed to find the correct brightness value to assign to each pixel; I chose scipy.ndimage's map_coordinates function, which takes data (an image), row and column locations to interpolate at, and parameters specifying what value to assign to coordinates that fall outside the data's bounds. To make building an alpha channel easy, I set out-of-bounds values to NaN (so the alpha mask could just be 1 where the image isn't NaN and 0 everywhere else). Below are the results of warping the first view of each pair from the "Shooting the Pictures" section above to match the perspective of the second:
Lab view #1, warped
Hallway view #1, warped
Lounge view #1, warped
In practice, this step required more painful debugging than I expected, as I ran into a few bugs from overlooked details, such as not normalizing by the resulting third homogeneous coordinate, not properly zeroing out NaN pixel values, and not supplying the interpolation function with properly formatted inputs.
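The inverse-warping steps above might look like the following sketch. The function name, return values, and the choice of bilinear interpolation (order=1) are my own assumptions; it handles a single-channel float image:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warpImage(im, H):
    """Sketch: inverse warping as described above. Returns the warped image,
    its alpha mask, and the top-left offset of the warped bounding box."""
    h, w = im.shape
    # Forward-map the corners to find the warped bounding box
    corners = np.array([[0, 0, 1], [w, 0, 1], [0, h, 1], [w, h, 1]], float).T
    warped = H @ corners
    warped /= warped[2]  # normalize by the 3rd homogeneous coordinate!
    xmin, ymin = np.floor(warped[:2].min(axis=1)).astype(int)
    xmax, ymax = np.ceil(warped[:2].max(axis=1)).astype(int)
    # Mesh grid of every pixel in the bounding box, mapped back through H^-1
    xs, ys = np.meshgrid(np.arange(xmin, xmax), np.arange(ymin, ymax))
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = np.linalg.inv(H) @ pts
    src /= src[2]
    # map_coordinates expects (row, col) order; out-of-bounds samples -> NaN
    out = map_coordinates(im, [src[1], src[0]], order=1, cval=np.nan)
    out = out.reshape(xs.shape)
    alpha = (~np.isnan(out)).astype(float)  # 1 where the image landed, else 0
    return np.nan_to_num(out), alpha, (xmin, ymin)
```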

Image Rectification

With the previous step done, rectification becomes quite straightforward. For rectification, I took a few images containing planar elements. I selected the 4 corners of objects whose (rough) dimensions I knew as the correspondences. Their coordinates were saved in one correspondence file, and in the other correspondence file, I simply put points matching the object's real-life aspect ratio (e.g. an 8.5"x11" piece of paper could have coordinates (0,0), (0, 17), (22, 0), and (22, 17)). I then applied the warping function described above. Below are 2 examples of rectification. For the first, I used the textbook's corners as one set of alignment points and coordinates corresponding to a straight-on view of the book (i.e. (0,0), (900, 0), (0, 1100), and (900, 1100)). For the second, I used the corners of one of the posters as one set of alignment points and coordinates corresponding to a straight-on view of that poster.
Textbook
Textbook, rectified
Hallway poster
Hallway poster, rectified
In the second example above, the poster I used for my correspondences looks nice and undistorted, but, as often happens, toward the edges of the rectified image and in parts of the scene that do not lie on the poster's plane, we see fairly unpleasant distortion effects.
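Since exactly 4 corner correspondences determine the homography, the rectification setup can be sketched with an exact 8x8 solve instead of least squares. The helper below is hypothetical (my own name and argument order), mirroring the straight-on target coordinates used for the textbook:

```python
import numpy as np

def rectify_H(corners, width, height):
    """Sketch: exact homography mapping 4 clicked corners (ordered TL, TR,
    BL, BR) to a straight-on width x height rectangle, e.g. (0,0), (900,0),
    (0,1100), (900,1100) for the textbook example above."""
    target = [(0, 0), (width, 0), (0, height), (width, height)]
    A, b = [], []
    for (x, y), (xp, yp) in zip(corners, target):
        A.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y])
        A.append([0, 0, 0, x, y, 1, -yp * x, -yp * y])
        b.extend([xp, yp])
    # With exactly 4 correspondences the 8x8 system is square and invertible
    # (for corners in general position), so an exact solve suffices
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)
```

The returned H can then be fed to the warping function from the previous section.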

Blending the Images into a Mosaic

Once we have properly warped images, we can combine them into a mosaic by padding them appropriately (by comparing the coordinates of the warped corners to those of the original corners) and then blending them together. I used 2-band blending as described in lecture and in Brown & Lowe 2003. For this, I had the most success when the mask was slightly smaller than the image; otherwise, I had to use very large Gaussians to blur out the image edges, and the program would take forever to run. For my mask, I started with the alpha channel of one image and shrank the white area by a set amount (50 pixels for my images). 3 examples of resulting mosaics (built from the warped images in the previous section) are below:
Lab mosaic
Hallway mosaic
Lounge mosaic
For the third, my tripod legs shifted a bit, so my images weren't taken from exactly the same viewpoint; that's likely why the alignment isn't quite as seamless.
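The 2-band blending step described above can be sketched as follows. This is a hypothetical helper for two aligned single-channel images; sigma is illustrative, not the value I tuned, and the 50-pixel shrink of the mask's white area could be done beforehand with scipy.ndimage.binary_erosion:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def two_band_blend(im1, im2, mask, sigma=20):
    """Sketch: 2-band blending (Brown & Lowe 2003 style). mask is 1 where
    im1 should dominate; both images must already be aligned and padded."""
    # Low-frequency band: heavily blurred copies, mixed with a soft mask
    low1, low2 = gaussian_filter(im1, sigma), gaussian_filter(im2, sigma)
    soft = gaussian_filter(mask.astype(float), sigma)
    low = soft * low1 + (1 - soft) * low2
    # High-frequency band: residual detail, combined with a hard binary
    # mask so fine detail stays crisp instead of ghosting
    high1, high2 = im1 - low1, im2 - low2
    hard = (mask > 0.5).astype(float)
    high = hard * high1 + (1 - hard) * high2
    return low + high
```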

Takeaways

One thing I learned here is that forming correspondences by hand is a really tedious task that I am thankful to automate in Part 2 of this project. When I was sloppy with my point-picking, the resulting warped images would not produce very convincing panoramas. Also, I realized that, at least with the techniques we've covered in class thus far, you can't really hope to make a good mosaic without following the "rules" -- when I didn't actually capture photos from the same point of view because I was sloppy with the tripod, the images just wouldn't align quite right no matter how many alignment points I picked.

Part 2: Feature Matching for Autostitching

To be completed by Oct 24...