spring 2020
I used my smartphone to shoot two images of my backyard. The images (HEIC format) were then converted to PNG using the online converter https://convertio.co. The images are shown below.
The homography $H$ relates corresponding points by $p' = Hp$, where $H$ is a $3\times 3$ matrix with 8 degrees of freedom, and $p$ and $p'$ are corresponding points from the two images in homogeneous coordinates. The correspondence points are labelled manually using the bare-bones $ginput$ function. Since I have more than 4 pairs of correspondence points, the system is overdetermined and is solved by least squares.
Assume $H = \begin{bmatrix}a &b &c \\ d &e &f\\ g &h &1\end{bmatrix}$, then $\begin{bmatrix}\omega x' \\ \omega y' \\ \omega \end{bmatrix} = \begin{bmatrix}a &b &c \\ d &e &f\\ g &h &1\end{bmatrix} \begin{bmatrix}x \\ y \\ 1\end{bmatrix}$ can be rewritten as:
$\begin{bmatrix} x &y &1 &0 &0 &0 &-xx' &-yx' \\ 0 &0 &0 &x &y &1 &-xy' &-yy' \end{bmatrix}\begin{bmatrix}a\\ b\\ c\\ d\\ e\\ f\\ g\\ h \end{bmatrix} = \begin{bmatrix}x' \\ y' \end{bmatrix}$ (by eliminating $\omega$). Each correspondence pair contributes two such rows; stacking the rows of all pairs gives a system of the form $\textbf{A}\vec{x} = \vec{b}$. Use $\textit{np.linalg.lstsq(A, b)}$ to solve for $\vec{x}$.
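The least-squares setup above can be sketched as follows. This is a minimal sketch, not necessarily the code used for the results here: the function name `compute_homography` and the `(N, 2)` array layout of the points are assumptions made for illustration.

```python
import numpy as np

def compute_homography(src, dst):
    """Least-squares homography H with dst ~ H @ src and H[2, 2] fixed to 1.

    src, dst: (N, 2) arrays of (x, y) correspondences, N >= 4 (assumed layout).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    x, y = src[:, 0], src[:, 1]
    xp, yp = dst[:, 0], dst[:, 1]
    n = len(src)

    A = np.zeros((2 * n, 8))
    b = np.zeros(2 * n)
    # Even rows: [x, y, 1, 0, 0, 0, -x x', -y x'] = x'
    A[0::2, 0], A[0::2, 1], A[0::2, 2] = x, y, 1.0
    A[0::2, 6], A[0::2, 7] = -x * xp, -y * xp
    b[0::2] = xp
    # Odd rows: [0, 0, 0, x, y, 1, -x y', -y y'] = y'
    A[1::2, 3], A[1::2, 4], A[1::2, 5] = x, y, 1.0
    A[1::2, 6], A[1::2, 7] = -x * yp, -y * yp
    b[1::2] = yp

    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    # Append the fixed entry h33 = 1 and reshape to 3 x 3.
    return np.append(h, 1.0).reshape(3, 3)
```

With exactly four non-degenerate pairs the system is square and the fit is exact; with more pairs, `lstsq` returns the solution that minimizes the squared residual of the stacked equations.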
Images are inverse-warped using the homography computed in the previous part, with an approach similar to the one in project 3. An example of warping image 0 onto the projection plane of image 1 is shown below.
For image rectification, choose a geometry whose vertices can be calculated by hand, such as a rectangle. Solving for the homography requires the chosen geometry to have at least four vertices. Two examples of image rectification are shown below: a FedEx box and a painting.
To blend two images into a mosaic, I first found the region where the two warped images overlap, then set each pixel in that region to the average of the two images' values. The resulting backyard mosaic is shown below.
Two more indoor examples are shown below. The cabinet example has a clear edge artifact. One possible reason is that the camera was too close to the cabinet, so that even a very small translation of the camera center had a large impact.
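A minimal sketch of inverse warping is given below. It assumes nearest-neighbor sampling and an output canvas whose origin coincides with the target image's origin; the approach from project 3 may have used a different interpolation, and the name `inverse_warp` is illustrative only.

```python
import numpy as np

def inverse_warp(img, H, out_shape):
    """Warp img with H (source coords -> output coords) by inverse mapping.

    Every output pixel is sent back through H^-1 and filled with the nearest
    source pixel; output pixels that map outside img stay zero.
    """
    h_out, w_out = out_shape
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    xs, ys = xs.ravel(), ys.ravel()
    pts = np.stack([xs, ys, np.ones(xs.size)])   # 3 x N homogeneous coords

    src = np.linalg.inv(H) @ pts
    src = src[:2] / src[2]                       # normalize third component to 1
    sx = np.rint(src[0]).astype(int)
    sy = np.rint(src[1]).astype(int)

    valid = (sx >= 0) & (sx < img.shape[1]) & (sy >= 0) & (sy < img.shape[0])
    out = np.zeros((h_out, w_out) + img.shape[2:], dtype=img.dtype)
    out[ys[valid], xs[valid]] = img[sy[valid], sx[valid]]
    return out
```

Iterating over output pixels rather than input pixels avoids holes in the warped image, since every output pixel receives a value or is explicitly left empty.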
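The averaging step can be sketched as below, assuming both images have already been warped onto a common canvas and that a boolean coverage mask is available for each; the names `blend_average`, `mask_a`, and `mask_b` are hypothetical.

```python
import numpy as np

def blend_average(img_a, mask_a, img_b, mask_b):
    """Blend two float images of shape (H, W, C) placed on the same canvas.

    mask_a, mask_b: boolean (H, W) arrays marking where each image has data.
    Pixels covered by both images get the average of the two values; pixels
    covered by one image keep that image's value; uncovered pixels stay zero.
    """
    weight = mask_a.astype(float) + mask_b.astype(float)        # 0, 1, or 2
    total = img_a * mask_a[..., None] + img_b * mask_b[..., None]
    out = np.zeros_like(total)
    covered = weight > 0
    out[covered] = total[covered] / weight[covered][:, None]
    return out
```

Plain averaging is simple but leaves a visible seam wherever the two images disagree in exposure or alignment, which is consistent with the edge artifact reported for the cabinet example.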
There were more Harris corners than I wanted, so I used Adaptive Non-Maximal Suppression to reduce their number, with $n_{ip} = 400$ and $c_{robust}=0.9$. The same example with the suppressed interest points is plotted below.
To extract a feature descriptor, I first sampled a $40\times 40 \times 3$ patch of the image centered at each interest point. This patch was then resized to $8 \times 8 \times 3$ and averaged along $axis = 2$ to obtain an $8 \times 8$ patch. Finally, the patch was normalized and transformed into a (64, ) vector by applying a Haar wavelet transform.
To match the features between two images $A$ and $B$, the distances between all pairs of features were calculated using the provided function $dist2()$. Then, for each interest point in $A$, the distances to its closest match, $e_{1NN}$, and to its second-closest match, $e_{2NN}$, were computed. A match was accepted as good if $e_{1NN}/e_{2NN} < 0.3$ (the threshold). The result of feature-space outlier rejection for the same example is plotted below.
Using RANSAC to compute the homography is more robust to outliers. The parameters used for RANSAC are:
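A minimal sketch of Adaptive Non-Maximal Suppression follows. It assumes the common formulation in which a point is suppressed by any neighbor whose corner strength, scaled by $c_{robust}$, exceeds its own; the function name `anms` and its argument layout are illustrative.

```python
import numpy as np

def anms(coords, strengths, n_ip=400, c_robust=0.9):
    """Return indices of the n_ip points with the largest suppression radius.

    coords: (N, 2) corner positions; strengths: (N,) corner responses.
    The suppression radius of point i is its distance to the nearest point j
    with strengths[i] < c_robust * strengths[j].
    """
    coords = np.asarray(coords, dtype=float)
    strengths = np.asarray(strengths, dtype=float)

    diff = coords[:, None, :] - coords[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)                       # squared pairwise distances
    dominated = strengths[:, None] < c_robust * strengths[None, :]
    d2 = np.where(dominated, d2, np.inf)                # ignore weaker neighbors
    radii = d2.min(axis=1)                              # inf for the global maximum

    return np.argsort(-radii)[:n_ip]                    # largest radius first
```

The pairwise distance matrix needs memory quadratic in the number of corners, which is acceptable for a few thousand points but would need chunking for much larger sets.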
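The descriptor steps can be sketched as follows, under several stated assumptions: resizing is approximated by averaging $5 \times 5$ blocks, normalization is taken to mean zero mean and unit standard deviation, the interest point is at least 20 pixels from every image border, and the final Haar wavelet step is omitted. The name `extract_descriptor` is hypothetical.

```python
import numpy as np

def extract_descriptor(img, row, col, patch=40, out=8):
    """Build a 64-vector descriptor from the patch centered at (row, col).

    img: (H, W, 3) array. Assumes the patch lies fully inside the image.
    """
    half = patch // 2
    window = img[row - half:row + half, col - half:col + half].astype(float)

    # Downsample 40x40x3 -> 8x8x3 by averaging non-overlapping 5x5 blocks
    # (a stand-in for image resizing, which is an assumption of this sketch).
    step = patch // out
    small = window.reshape(out, step, out, step, -1).mean(axis=(1, 3))

    gray = small.mean(axis=2)                            # average along axis 2 -> 8x8
    gray = (gray - gray.mean()) / (gray.std() + 1e-8)    # bias/gain normalization
    return gray.ravel()                                  # shape (64,)
```

Normalizing to zero mean and unit variance makes the descriptor insensitive to affine changes in brightness between the two photos.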
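The ratio test can be sketched as below. Since $dist2()$ was provided with the project and is not shown here, this sketch computes squared Euclidean distances directly with NumPy, which is an assumption about what that function returns; `match_features` is an illustrative name.

```python
import numpy as np

def match_features(desc_a, desc_b, threshold=0.3):
    """Return (i, j) index pairs where feature i of A matches feature j of B.

    desc_a: (Na, D), desc_b: (Nb, D) with Nb >= 2. A match is kept when the
    nearest-neighbor distance is below threshold times the second-nearest.
    """
    desc_a = np.asarray(desc_a, dtype=float)
    desc_b = np.asarray(desc_b, dtype=float)
    d = ((desc_a[:, None, :] - desc_b[None, :, :]) ** 2).sum(axis=-1)

    order = np.argsort(d, axis=1)
    rows = np.arange(len(desc_a))
    nn1 = order[:, 0]
    e1 = d[rows, nn1]              # distance to the closest match
    e2 = d[rows, order[:, 1]]      # distance to the second-closest match

    good = e1 < threshold * e2     # same as e1 / e2 < threshold, without dividing
    return np.column_stack([rows[good], nn1[good]])
```

A feature whose two best candidates are almost equally close is ambiguous, so it is rejected even when its best distance is small.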
- iteration = 100
- threshold = 2 pixels
For the same pair of images, the homography from manually picked interest points is $\begin{bmatrix} 6.95896556e-01 & -3.72585015e-02 & 2.85477101e+02\\ -8.64154929e-02 & 8.70324604e-01 & 5.59966270e+01\\ -2.96725347e-04 & -1.74124247e-05 & 1.0 \end{bmatrix}$,
whereas the homography from RANSAC is $\begin{bmatrix} 6.90918316e-01 &-4.05284704e-02 &2.85660002e+02\\ -8.72975286e-02 &8.64013442e-01 & 5.71611442e+01\\-2.97785063e-04 & -2.53211487e-05 & 1.0\end{bmatrix}$. The two results are very similar.
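A minimal RANSAC loop with the parameters listed above (100 iterations, 2-pixel threshold) might look like the following. It is a sketch under assumptions: four-point samples, reprojection distance as the error measure, a final refit on all inliers, and a fixed random seed for repeatability. The helper names `fit_homography`, `project`, and `ransac_homography` are hypothetical.

```python
import numpy as np

def fit_homography(src, dst):
    """Least-squares homography (h33 = 1) from (N, 2) point arrays, N >= 4."""
    x, y = src[:, 0], src[:, 1]
    xp, yp = dst[:, 0], dst[:, 1]
    zeros, ones = np.zeros_like(x), np.ones_like(x)
    top = np.stack([x, y, ones, zeros, zeros, zeros, -x * xp, -y * xp], axis=1)
    bot = np.stack([zeros, zeros, zeros, x, y, ones, -x * yp, -y * yp], axis=1)
    A = np.vstack([top, bot])
    b = np.concatenate([xp, yp])
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def project(H, pts):
    """Apply H to (N, 2) points and normalize the third component to 1."""
    homog = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]

def ransac_homography(src, dst, iterations=100, threshold=2.0, seed=0):
    """Return (H, inlier_mask) for the largest consensus set found."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iterations):
        sample = rng.choice(len(src), size=4, replace=False)
        H = fit_homography(src[sample], dst[sample])
        err = np.linalg.norm(project(H, src) - dst, axis=1)   # pixels
        inliers = err < threshold
        if inliers.sum() > best.sum():
            best = inliers
    # Refit on every inlier of the best model for the final estimate.
    return fit_homography(src[best], dst[best]), best
```

Refitting on the full inlier set uses more data than any single four-point sample, which is one reason the final estimate can be close to a careful manual labelling.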
Three examples of autostitching results are shown below. The backyard mosaic and door mosaic are better than the results from manually selected correspondences. The cabinet mosaic is similar to the result from manually selected correspondences and the edge artifact still exists, probably because a translation of the camera center has a stronger impact on a closer scene.
The result submitted for the checkpoint looked awkward and did not produce a plausible mosaic for the door and cabinet images. The reason was that I forgot to normalize the third component of a point to 1 after applying the homography.
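To make the missing step explicit, here is the normalization written out with the same entries of $H$ as in the derivation above. Applying $H$ to $(x, y, 1)$ gives the homogeneous point $(\omega x', \omega y', \omega)$ with $\omega = gx + hy + 1$, so the image coordinates are obtained only after dividing by $\omega$:

$x' = \dfrac{ax + by + c}{gx + hy + 1}, \qquad y' = \dfrac{dx + ey + f}{gx + hy + 1}$

Skipping the division amounts to treating $\omega$ as 1, which holds only when $g = h = 0$ (an affine transformation). For a homography with nonzero $g$ or $h$, the error grows with the distance of the point from the image origin.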