Project 4 [Auto]Stitching Photo Mosaics

2021 Fall CS 294-026 Xinwei Zhuang

Part 1: Image Warping and Mosaicing

Step 1: Shoot and digitize pictures

The pictures show my bedroom and were taken with a Sony Alpha 5000. All camera settings are identical across shots, and the camera only rotates; it does not translate.

Step 2: Recover homographies

To recover the homographies, a set of corresponding points is manually selected. The selected points are as follows.

Since the manually selected points are not accurate to the pixel level, I then fine-tune them to the true pixel locations. I used normalized cross-correlation matching from Project 1 to find the accurate coordinates. The adjusted coordinates are plotted below.

The adjusted points are as follows.
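
The fine-tuning step can be sketched as follows. This is a minimal NumPy illustration rather than the code used in the project: the function names, the patch half-size `half`, and the search radius `search` are all assumptions, and points are given as (row, col) on grayscale images.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def refine_point(img1, img2, p1, p2, half=7, search=5):
    """Shift p2 within +/-search pixels so its patch best matches the patch at p1.

    p1 is assumed to lie at least `half` pixels away from the border of img1.
    """
    r1, c1 = p1
    ref = img1[r1 - half:r1 + half + 1, c1 - half:c1 + half + 1]
    best, best_score = p2, -np.inf
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            r, c = p2[0] + dr, p2[1] + dc
            if r - half < 0 or c - half < 0:
                continue  # candidate patch would start outside the image
            cand = img2[r - half:r + half + 1, c - half:c + half + 1]
            if cand.shape != ref.shape:
                continue  # candidate patch runs past the far border
            score = ncc(ref, cand)
            if score > best_score:
                best, best_score = (r, c), score
    return best
```

A small search radius keeps the refined point close to the manual click, so a wrong click is corrected by a few pixels at most.
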
Then a homography matrix $H$ is calculated from $p'=Hp$. $H$ has 8 unknown variables, because it is only defined up to scale. Each point correspondence gives 2 equations, so 4 points (8 equations) determine $H$ exactly, and the 8 points used here give an overdetermined system. To get the warped pixels, the equation for the transformation is:
$$ \begin{bmatrix} wx' \\\ wy' \\\ w \end{bmatrix} = \begin{bmatrix} a & b & c \\\ d & e & f \\\ g & h & i \end{bmatrix} \begin{bmatrix} x \\\ y \\\ 1 \end{bmatrix} $$
Substituting $w = gx + hy + i$ from the third row into the first two rows and fixing $i = 1$, we can expand the equation above into the following format, where $h_{11},\dots,h_{32}$ denote the entries $a,\dots,h$:
$$ \begin{bmatrix} x_1 & y_1 & 1 & 0 & 0 & 0 & -x_1'\cdot x_1 & -x_1' \cdot y_1 \\\ 0 & 0 & 0 & x_1 & y_1 & 1 & -y_1'\cdot x_1 & -y_1' \cdot y_1 \\\ x_2 & y_2 & 1 & 0 & 0 & 0 & -x_2'\cdot x_2 & -x_2' \cdot y_2 \\\ 0 & 0 & 0 & x_2 & y_2 & 1 & -y_2'\cdot x_2 & -y_2' \cdot y_2 \\\ \vdots\\\ x_n & y_n & 1 & 0 & 0 & 0 & -x_n'\cdot x_n & -x_n' \cdot y_n \\\ 0 & 0 & 0 & x_n & y_n & 1 & -y_n'\cdot x_n & -y_n' \cdot y_n \end{bmatrix} \begin{bmatrix} h_{11} \\\ h_{12} \\\ h_{13} \\\ h_{21} \\\ h_{22} \\\ h_{23} \\\ h_{31} \\\ h_{32} \end{bmatrix} = \begin{bmatrix} x_1' \\\ y_1' \\\ x_2' \\\ y_2' \\\ \vdots \\\ x_n' \\\ y_n' \end{bmatrix} $$
To solve for the homography, at least 4 points are needed, which gives at least 8 equations. In this project, 8 points are used for each pair of photos, so 16 equations are used to solve for each homography matrix. The above equation is then rearranged so that the right-hand side is zero, and a constraint fixing the scale of $H$ is added to avoid H being all zeros (the last row below sets $h_{33}=1$).
$$ \begin{bmatrix} x_1 & y_1 & 1 & 0 & 0 & 0 & -x_1'\cdot x_1 & -x_1' \cdot y_1 & -x_1'\\\ 0 & 0 & 0 & x_1 & y_1 & 1 & -y_1'\cdot x_1 & -y_1' \cdot y_1 & -y_1'\\\ x_2 & y_2 & 1 & 0 & 0 & 0 & -x_2'\cdot x_2 & -x_2' \cdot y_2 &-x_2'\\\ 0 & 0 & 0 & x_2 & y_2 & 1 & -y_2'\cdot x_2 & -y_2' \cdot y_2 &-y_2'\\\ \vdots\\\ x_n & y_n & 1 & 0 & 0 & 0 & -x_n'\cdot x_n & -x_n' \cdot y_n & -x_n'\\\ 0 & 0 & 0 & x_n & y_n & 1 & -y_n'\cdot x_n & -y_n' \cdot y_n & -y_n'\\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} h_{11} \\\ h_{12} \\\ h_{13} \\\ h_{21} \\\ h_{22} \\\ h_{23} \\\ h_{31} \\\ h_{32} \\\ h_{33} \end{bmatrix} = \begin{bmatrix} 0 \\\ 0 \\\ 0 \\\ 0 \\\ \vdots \\\ 0 \\\ 0 \\\ 1 \end{bmatrix} $$
Finally, SVD is used, and the last right singular vector (the one associated with the smallest singular value) is selected as the solution for H.
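
A minimal NumPy sketch of this solve is given below; it is an illustration under stated assumptions, not the project's actual code. It stacks only the two homogeneous rows per correspondence (without the extra scale row), takes the right singular vector for the smallest singular value, and then rescales so that $h_{33}=1$. The function name `compute_homography` and the (x, y) point layout are hypothetical.

```python
import numpy as np

def compute_homography(src, dst):
    """Estimate the 3x3 matrix H with dst ~ H @ src from n >= 4 point pairs.

    src, dst: arrays of shape (n, 2) holding (x, y) coordinates.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows = []
    for (x, y), (xp, yp) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y, -xp])
        rows.append([0, 0, 0, x, y, 1, -yp * x, -yp * y, -yp])
    A = np.array(rows)
    # The right singular vector for the smallest singular value
    # minimizes ||A h|| subject to ||h|| = 1.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # rescale so that h33 = 1
```

With noise-free correspondences the smallest singular value is zero and the recovered matrix matches the true homography up to rounding; with noisy clicks it is a least-squares compromise over all points.
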

Step 3: Warp the images (Image Rectification)

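Inverse warping is used: each pixel of the output image is mapped back through the inverse homography to find its source location. Interp2D is used to interpolate the source pixel values, which reduces aliasing. The result of warping image 1 into the plane of image 2 is shown below.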
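
The inverse-warping idea can be sketched in NumPy as follows. This is an assumed, simplified version for grayscale images with hand-written bilinear interpolation standing in for Interp2D; the name `warp_image` and the convention that pixels without a source are set to 0 are illustrative choices.

```python
import numpy as np

def warp_image(img, H, out_shape):
    """Inverse-warp a grayscale image: out(p') = img(H^-1 p'), bilinear sampling."""
    h_out, w_out = out_shape
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = np.linalg.inv(H) @ pts
    sx, sy = src[0] / src[2], src[1] / src[2]

    h, w = img.shape
    valid = (sx >= 0) & (sx <= w - 1) & (sy >= 0) & (sy <= h - 1)
    # Top-left neighbour of each sample, clipped so that x0 + 1 and y0 + 1 stay in range.
    x0 = np.floor(np.clip(sx, 0, w - 2)).astype(int)
    y0 = np.floor(np.clip(sy, 0, h - 2)).astype(int)
    fx, fy = sx - x0, sy - y0
    top = img[y0, x0] * (1 - fx) + img[y0, x0 + 1] * fx
    bot = img[y0 + 1, x0] * (1 - fx) + img[y0 + 1, x0 + 1] * fx
    out = np.where(valid, top * (1 - fy) + bot * fy, 0.0)
    return out.reshape(h_out, w_out)
```

Looping over output pixels rather than input pixels guarantees that every output pixel receives a value, which is why inverse warping leaves no holes.
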

With the homography matrix, I can warp any image of a planar surface to its frontal, undistorted view. Several examples are shown below.

Original photo

Rectified image

Original photo

Rectified image

Original photo

Rectified image

Step 4: Blend images into a mosaic

To blend the images, the first step is to find the canvas size, then add pixels from the original images to their target positions with an alpha mask. The results on 3 sets of images are shown below. (Sorry for the boring results!)
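
One way to find the canvas size is to project the four corners of every image into the shared frame and take the bounding box. The NumPy sketch below assumes that approach; the function name `canvas_bounds` and its argument layout are hypothetical.

```python
import numpy as np

def canvas_bounds(shapes, homographies):
    """Bounding box (x_min, y_min, x_max, y_max) of all images after warping.

    shapes: list of (height, width); homographies: list of 3x3 arrays that map
    each image into the shared reference frame.
    """
    all_pts = []
    for (h, w), H in zip(shapes, homographies):
        corners = np.array(
            [[0, 0, 1], [w - 1, 0, 1], [w - 1, h - 1, 1], [0, h - 1, 1]], dtype=float
        )
        q = corners @ H.T
        all_pts.append(q[:, :2] / q[:, 2:3])  # back to inhomogeneous (x, y)
    pts = np.vstack(all_pts)
    x_min, y_min = np.floor(pts.min(axis=0)).astype(int)
    x_max, y_max = np.ceil(pts.max(axis=0)).astype(int)
    return int(x_min), int(y_min), int(x_max), int(y_max)
```

A negative `x_min` or `y_min` means every image has to be shifted by that offset before it is pasted onto the canvas.
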

Result image

Though the result does not look bad, notice that the colour is not the same between the two images: one is cooler and one is warmer. To provide a smoother transition, a weighted averaging mask is applied. The result after applying Laplacian pyramid blending is shown below. The edge is nearly invisible now. :)
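
The weighted-averaging step can be sketched in NumPy as below. This is an assumed, minimal form of feathering for two grayscale images already placed on a common canvas; the function name and the idea of passing explicit weight maps are illustrative, and in practice the weight maps would be warped with the same homographies as the images.

```python
import numpy as np

def feather_blend(img_a, img_b, weight_a, weight_b):
    """Per-pixel weighted average of two aligned images on a shared canvas.

    weight_a / weight_b are 0 outside each image and grow towards its interior.
    """
    total = weight_a + weight_b
    safe = np.where(total > 0, total, 1.0)  # avoid dividing by zero outside both images
    return (img_a * weight_a + img_b * weight_b) / safe
```

Because the weights change gradually across the overlap, a difference in colour temperature turns into a smooth gradient instead of a visible seam.
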

Result image

Result image (for these photos I tried warping them onto different target images. Neither result is satisfying; switching to a cylindrical projection would probably work better.)

What I've learnt

The homography matrix depends on the quality of the feature points. At first it took me a long time to get a satisfying warp, until I realised that the problem was inaccurate feature points. I turned to Matlab to fine-tune the feature points automatically, and the transformation improved.

For image blending, there are a lot of tricks that can improve the result. I used just one of the naive ones, weighted averaging to feather the edges. If more clarity is required, Laplacian pyramid blending can also be used to reduce ghosting of high-frequency content.

The projected image can look badly artificial if the projection method is not chosen carefully. For photos taken by rotating the camera, the warped image looks better when projected onto a cylindrical surface.

Part 2: Feature Matching for Autostitching

Step 1: Detecting corner features in an image

The features of the photos are detected using the Harris corner detector (Brown et al., 2005). The code implementation is provided by Prof. Efros. The results are shown below. Before implementing Adaptive Non-Maximal Suppression, I used a threshold to keep only the corners with a high response. However, thresholding yields corner points that are highly concentrated in a few regions rather than evenly distributed across the image.

Adaptive Non-Maximal Suppression (ANMS)

To resolve the concentration issue, ANMS is used to find evenly spread-out corners. The first entry in the corner list is the global maximum, which is not suppressed at any radius. As the suppression radius decreases from infinity, interest points are added to the list until the desired number of interest points is obtained (Brown et al., 2005).
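
A compact way to express ANMS is to compute, for every corner, the distance to the nearest sufficiently stronger corner and keep the corners with the largest such radius. The NumPy sketch below follows that idea; the function name is hypothetical, the robustness factor `c_robust = 0.9` is the value suggested by Brown et al., and the dense pairwise distance matrix is an assumption that only suits a few thousand corners.

```python
import numpy as np

def anms(coords, strengths, n_keep, c_robust=0.9):
    """Adaptive non-maximal suppression.

    coords: (n, 2) corner positions; strengths: (n,) corner responses.
    Returns the indices of the n_keep corners with the largest suppression radius.
    """
    coords = np.asarray(coords, dtype=float)
    strengths = np.asarray(strengths, dtype=float)
    # Squared distance between every pair of corners.
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    # Corner j suppresses corner i when strengths[i] < c_robust * strengths[j].
    suppressed_by = strengths[:, None] < c_robust * strengths[None, :]
    d2 = np.where(suppressed_by, d2, np.inf)
    radii = d2.min(axis=1)  # infinite for the global maximum
    order = np.argsort(-radii)
    return order[:n_keep]
```

A weak but isolated corner gets a large radius and survives, while a strong corner sitting next to an even stronger one is dropped, which is what spreads the selection across the image.
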

Step 2: Extracting Feature Descriptor

For each corner point, an 8x8 pixel patch is sampled from a 40x40 window. After sampling, the descriptor vector is normalised so that the mean is 0 and the standard deviation is 1. This makes the features invariant to affine changes in intensity (Brown et al., 2005). The wavelet transform is not implemented. A selection of the 40x40 patches, the downsampled 8x8 patches, and the normalised descriptors is shown below.

40x40 patches (first row), the downsampled 8x8 patches (second row), and the normalised descriptors (third row)
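
The descriptor extraction can be sketched in NumPy as follows. This is an illustration under assumptions: the 40x40 window is reduced to 8x8 by averaging each 5x5 block (the project may instead have blurred and subsampled), the image is grayscale, and the function name is hypothetical.

```python
import numpy as np

def extract_descriptor(img, row, col, window=40, size=8):
    """Return a normalised size*size descriptor sampled from a window x window patch."""
    half = window // 2
    patch = img[row - half:row + half, col - half:col + half].astype(float)
    if patch.shape != (window, window):
        return None  # corner is too close to the image border
    step = window // size
    # Average each step x step block: a simple low-pass filter plus downsampling.
    small = patch.reshape(size, step, size, step).mean(axis=(1, 3))
    std = small.std()
    if std == 0:
        return None  # flat patch, cannot be normalised
    return ((small - small.mean()) / std).ravel()
```

Subtracting the mean removes a brightness offset and dividing by the standard deviation removes a gain change, so the same patch under `a * I + b` lighting (with `a > 0`) yields the same descriptor.
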

Step 3: Matching feature descriptors

Lowe's method of thresholding the ratio between the first and the second nearest neighbours is used for matching image descriptors. The ratio is $$ \dfrac{e_{1-NN}}{e_{2-NN}} $$ where $e_{1-NN}$ denotes the error for the best match (first nearest neighbour) and $e_{2-NN}$ denotes the error for the second best match (second nearest neighbour) (Brown et al., 2005). According to Figure 6b in Brown et al.'s paper, the threshold is set to 0.3 to avoid a large number of false matches; a match is kept only when its ratio is below this threshold. The filtered matches are plotted on the image below.
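
The ratio test can be sketched in NumPy as below. The sketch assumes that the error is the squared Euclidean distance between descriptors and that the second image has at least two descriptors; the function name is hypothetical.

```python
import numpy as np

def match_descriptors(desc1, desc2, ratio=0.3):
    """Return (i, j) pairs where desc2[j] is the nearest neighbour of desc1[i]
    and the 1-NN / 2-NN squared-distance ratio is below `ratio`."""
    desc1 = np.asarray(desc1, dtype=float)
    desc2 = np.asarray(desc2, dtype=float)
    # d[i, j] = squared distance between desc1[i] and desc2[j].
    d = ((desc1[:, None, :] - desc2[None, :, :]) ** 2).sum(axis=2)
    matches = []
    for i, row in enumerate(d):
        j1, j2 = np.argsort(row)[:2]
        if row[j1] < ratio * row[j2]:
            matches.append((i, int(j1)))
    return matches
```

A descriptor that is about equally close to two candidates has a ratio near 1 and is rejected, which removes ambiguous matches on repeated texture.
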

Step 4: Robust method (RANSAC) to compute homography

4-point RANSAC is used to compute a robust homography estimate. The procedure is as follows:

Randomly select 4 feature points
Compute the exact homography H from these 4 points
Compute the inliers, where $dist(p_i',Hp_i)< \epsilon $
Keep largest set of inliers
Re-compute least-squares H estimate on all of the inliers
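
The steps above can be sketched in NumPy as follows. This is a minimal illustration, not the project's code: the helper names, the inlier tolerance `eps` (in pixels), the iteration count `n_iter`, and the fixed random seed are all assumptions, and the homography is fitted by least squares with $h_{33}$ fixed to 1.

```python
import numpy as np

def fit_homography(src, dst):
    """Least-squares homography with h33 fixed to 1 (needs >= 4 pairs)."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y])
        b.append(xp)
        A.append([0, 0, 0, x, y, 1, -yp * x, -yp * y])
        b.append(yp)
    h, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def project(H, pts):
    """Apply H to (n, 2) points and return the dehomogenised (n, 2) result."""
    q = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return q[:, :2] / q[:, 2:3]

def ransac_homography(src, dst, eps=2.0, n_iter=500, seed=0):
    """4-point RANSAC; returns (H, inlier_mask)."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(src), size=4, replace=False)  # random 4-point sample
        H = fit_homography(src[idx], dst[idx])             # exact fit to the sample
        with np.errstate(divide="ignore", invalid="ignore"):
            err = np.linalg.norm(project(H, src) - dst, axis=1)
        inliers = err < eps
        if inliers.sum() > best.sum():                     # keep the largest inlier set
            best = inliers
    # Final least-squares estimate on all inliers of the best model.
    return fit_homography(src[best], dst[best]), best
```

The sample size of 4 is the minimum that defines a homography, which maximises the chance that a random sample contains no outliers.
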

Step 5: Image mosaic

The image mosaic using RANSAC and auto-detected points is shown below, compared with the one from manually selected points. Though the difference is nearly negligible, the image warped with auto-detected points (left) is distorted slightly more than the manual corner selection result (notice the lower left black corner).

I also tried the pipeline on some new images. Without manual point picking, the processing time is greatly reduced.

Result image

Bells & Whistles

Cylindrical projection
For the bedroom photo mosaic, I project the photos onto a cylindrical surface to see whether this gives a less artificial-looking image.
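
A cylindrical projection can be sketched with the standard inverse mapping $x = f\tan\theta + x_c$, $y = f\,h/\cos\theta + y_c$, where $\theta$ and $h$ are the angle and height on the cylinder. The NumPy version below is only an assumed illustration: the focal length `f` (in pixels) is a parameter the write-up does not state, the image is grayscale, and nearest-neighbour sampling is used to keep the sketch short.

```python
import numpy as np

def cylindrical_warp(img, f):
    """Project a grayscale image onto a cylinder of radius f (focal length in pixels)."""
    h, w = img.shape
    yc, xc = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    theta = (xs - xc) / f   # angle around the cylinder axis
    height = (ys - yc) / f  # height along the cylinder axis
    # Inverse mapping: where each cylinder pixel comes from in the flat image.
    src_x = f * np.tan(theta) + xc
    src_y = f * height / np.cos(theta) + yc
    sx = np.rint(src_x).astype(int)
    sy = np.rint(src_y).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros_like(img)
    out[valid] = img[sy[valid], sx[valid]]
    return out
```

After this warp, images taken by a purely rotating camera are related approximately by a horizontal translation, which avoids the strong stretching that a planar projection produces at wide angles.
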

What I've learnt

The coolest thing that I can apply to other domains is how to find the best match: look not only at the best match but also at the second-best match, and if the two are not clearly different, drop the match.
Automatic feature selection can go beyond human ability. The results for simple tasks such as stitching two bedroom photos together do not improve much, but the technique can be used for more exciting image stitching, such as galaxy images.

Reference

For homography matrix:
https://math.stackexchange.com/questions/494238/how-to-compute-homography-matrix-h-from-corresponding-points-2d-2d-planar-homog

For Harris corner feature detection:
Brown, M., Szeliski, R., & Winder, S. (2005, June). Multi-image matching using multi-scale oriented patches. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) (Vol. 1, pp. 510-517). IEEE.