Spring 2020
Draw a regular pattern on a box, choose a corner of the box as the global frame origin, and label the world frame coordinates of the interest points. Capture a video with the box at the center of the scene. The video is shown below. It was captured with an iPhone and then converted to .mp4 using an online converter, https://www.freemake.com/free_video_converter/
To find the keypoints in the first frame, the function $plt.ginput$ is used. To propagate the keypoints to the other frames, the OpenCV MedianFlow tracker is used. A tracker is created for each point and initialized with an $8\times 8$ patch centered on that keypoint. The result of tracking is shown below.
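A minimal sketch of this tracking setup follows. The helper names (`patch_bbox`, `bbox_center`, `make_medianflow_tracker`, `track_points`) are hypothetical and not taken from the original code. It assumes `opencv-contrib-python` is installed and that the MedianFlow factory lives either under `cv2.legacy` (newer releases) or directly under `cv2` (older releases), and that `frames` is an iterable of images already read from the video.

```python
def patch_bbox(point, size=8):
    """Return an (x, y, w, h) box of side `size` centered on `point`."""
    x, y = point
    return (x - size / 2.0, y - size / 2.0, float(size), float(size))


def bbox_center(bbox):
    """Return the center (x, y) of an (x, y, w, h) box."""
    x, y, w, h = bbox
    return (x + w / 2.0, y + h / 2.0)


def make_medianflow_tracker():
    """Create a MedianFlow tracker; its location depends on the OpenCV version."""
    import cv2  # assumption: opencv-contrib-python is installed

    legacy = getattr(cv2, "legacy", None)
    if legacy is not None and hasattr(legacy, "TrackerMedianFlow_create"):
        return legacy.TrackerMedianFlow_create()
    return cv2.TrackerMedianFlow_create()


def track_points(frames, points, size=8):
    """Track each clicked point through `frames` with its own tracker.

    `points` is a list of (x, y) tuples, e.g. as returned by plt.ginput.
    Returns one list of centers per frame; a lost point is stored as None.
    """
    frames = iter(frames)
    first = next(frames)
    trackers = []
    for p in points:
        tracker = make_medianflow_tracker()
        tracker.init(first, patch_bbox(p, size))
        trackers.append(tracker)

    tracks = [list(points)]
    for frame in frames:
        centers = []
        for tracker in trackers:
            ok, bbox = tracker.update(frame)
            centers.append(bbox_center(bbox) if ok else None)
        tracks.append(centers)
    return tracks
```

Using one tracker per point keeps a failure on one keypoint from affecting the others, at the cost of running several trackers on every frame.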
To find the camera projection matrix, the following equation is used.
$\begin{bmatrix} su\\ sv\\ s\end{bmatrix} = \begin{bmatrix}m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \\ m_{31} & m_{32} & m_{33} & m_{34} \end{bmatrix}\begin{bmatrix} X \\ Y \\ Z \\ 1\end{bmatrix}$.
Since the projection matrix is only defined up to scale, it has 11 degrees of freedom, so let $m_{34} = 1$. Reorganizing the equations gives a linear system of the form $Ax = b$, and the projection matrix $P$ can then be solved using the least-squares function $np.linalg.lstsq$. For a single point, eliminating $s$ gives the following two rows of the system.
$\begin{bmatrix} X & Y & Z & 1 & 0 & 0 & 0 &0 &-uX & -uY & -uZ \\ 0 &0 & 0 & 0 &X & Y &Z & 1 & -vX& -vY & -vZ\end{bmatrix}\begin{bmatrix}m_{11}\\m_{12}\\m_{13}\\m_{14}\\m_{21}\\m_{22}\\m_{23}\\m_{24}\\m_{31}\\m_{32}\\m_{33}\end{bmatrix} = \begin{bmatrix}u \\v\end{bmatrix}$
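As a rough sketch, the two rows above can be stacked for all labeled points and passed to $np.linalg.lstsq$. The function names `build_system` and `fit_projection_matrix` are hypothetical, and the sketch assumes at least six non-coplanar points given as arrays of shape $(n, 3)$ and $(n, 2)$.

```python
import numpy as np


def build_system(world_pts, image_pts):
    """Stack two rows per point into A (2n x 11) and b (2n), with m34 = 1."""
    world_pts = np.asarray(world_pts, dtype=float)
    image_pts = np.asarray(image_pts, dtype=float)
    n = world_pts.shape[0]
    A = np.zeros((2 * n, 11))
    b = np.zeros(2 * n)
    for k in range(n):
        X, Y, Z = world_pts[k]
        u, v = image_pts[k]
        A[2 * k] = [X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z]
        A[2 * k + 1] = [0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z]
        b[2 * k] = u
        b[2 * k + 1] = v
    return A, b


def fit_projection_matrix(world_pts, image_pts):
    """Solve for m11..m33 by least squares and append m34 = 1."""
    A, b = build_system(world_pts, image_pts)
    m, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.append(m, 1.0).reshape(3, 4)
```

With noisy tracked points the fit is only approximate, so using more than the minimum six points generally helps.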
Project a cube on top of the box by manually defining the eight corners of the cube in world coordinates and applying the projection matrix to them. The result is shown below.
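The projection step can be sketched as follows. The helpers `cube_corners` and `project` are hypothetical names, and the cube is assumed to be axis-aligned with the world frame; the original corners were defined by hand.

```python
import numpy as np


def cube_corners(origin, side):
    """Return the 8 corners of an axis-aligned cube as an (8, 3) array."""
    ox, oy, oz = origin
    return np.array(
        [
            [ox + dx * side, oy + dy * side, oz + dz * side]
            for dx in (0, 1)
            for dy in (0, 1)
            for dz in (0, 1)
        ],
        dtype=float,
    )


def project(P, world_pts):
    """Apply a 3x4 projection matrix and divide by s to get pixel (u, v)."""
    P = np.asarray(P, dtype=float)
    world_pts = np.asarray(world_pts, dtype=float)
    homog = np.hstack([world_pts, np.ones((len(world_pts), 1))])
    suv = homog @ P.T  # each row is (s*u, s*v, s)
    return suv[:, :2] / suv[:, 2:3]
```

The projected corners can then be connected with line segments on each frame to draw the cube.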
The insight of gradient domain fusion is that people often care much more about the gradient of an image than about its overall intensity. Thus, gradient domain fusion is an optimization problem that minimizes the squared differences between the gradients. The objective function is
$ \boldsymbol{v} = \text{argmin}_v \sum_{i\in S, j\in N_i \cap S}((v_i - v_j) - (s_i - s_j))^2 + \sum_{i\in S, j \in N_i \cap \neg S}((v_i - t_j) - (s_i - s_j))^2$
where $s$ is the source image, $t$ is the target image and $S$ is the region where the source image will be placed.
To set up the least-squares problem $A\boldsymbol{v} = \boldsymbol{b}$, the vector $\boldsymbol{b}$ contains all the known information and $\boldsymbol{v}$ is the vector to be solved for. Let $h, w$ denote the height and width of the image.
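A minimal single-channel sketch of this setup is given below, assuming `source`, `target` and a boolean `mask` (the region $S$) all have the same shape $h \times w$ and that SciPy is available. The function name `poisson_blend` and the use of `scipy.sparse.linalg.lsqr` are choices made for this sketch, not necessarily those of the original code. Each pixel–neighbor pair contributes one row with at most two nonzero entries.

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import lsqr


def poisson_blend(source, target, mask):
    """Blend one channel of `source` into `target` inside `mask`."""
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    h, w = target.shape

    # Map every pixel in S to an index into the unknown vector v.
    ys, xs = np.nonzero(mask)
    n = len(ys)
    idx = -np.ones((h, w), dtype=int)
    idx[ys, xs] = np.arange(n)

    A = lil_matrix((4 * n, n))  # up to 4 neighbor rows per unknown
    b = np.zeros(4 * n)
    row = 0
    for y, x in zip(ys, xs):
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if not (0 <= ny < h and 0 <= nx < w):
                continue  # neighbor falls outside the image
            grad = source[y, x] - source[ny, nx]  # s_i - s_j
            A[row, idx[y, x]] = 1.0
            if mask[ny, nx]:
                A[row, idx[ny, nx]] = -1.0  # v_i - v_j = s_i - s_j
                b[row] = grad
            else:
                b[row] = grad + target[ny, nx]  # v_i = (s_i - s_j) + t_j
            row += 1

    v = lsqr(A.tocsr()[:row], b[:row], atol=1e-10, btol=1e-10)[0]
    out = target.copy()
    out[ys, xs] = v
    return out
```

For a color image the same function would be called once per channel.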
Unlike in the toy problem, the borders affect the result of Poisson blending. The main steps are as follows.
To get a good result, the colors of the source should be similar to those of the target, as in the penguin and eagle blends. If the colors differ, this blending method may fail, as in the dragon case.
$\textbf{Mixed Gradient}$: this follows the same procedure as Poisson blending. The only difference is the reference gradient. Rather than always using $s_i - s_j$, mixed gradient uses as the desired gradient $d_{ij} = s_i - s_j$ if $|s_i - s_j| > |t_i - t_j|$, and $d_{ij} = t_i - t_j$ otherwise. The new optimization problem becomes $ \boldsymbol{v} = \text{argmin}_v \sum_{i\in S, j\in N_i \cap S}((v_i - v_j) - d_{ij})^2 + \sum_{i\in S, j \in N_i \cap \neg S}((v_i - t_j) - d_{ij})^2$.
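The choice of $d_{ij}$ can be sketched with NumPy as below. The function name `mixed_gradient` is hypothetical; it accepts scalars or arrays of matching shape for the pixel values $s_i, s_j, t_i, t_j$.

```python
import numpy as np


def mixed_gradient(s_i, s_j, t_i, t_j):
    """Return d_ij: the source gradient if it is larger in magnitude, else the target gradient."""
    ds = np.asarray(s_i, dtype=float) - np.asarray(s_j, dtype=float)
    dt = np.asarray(t_i, dtype=float) - np.asarray(t_j, dtype=float)
    return np.where(np.abs(ds) > np.abs(dt), ds, dt)
```

In a Poisson blending routine, this value would replace $s_i - s_j$ on the right-hand side of each row.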
An example of pasting "go bears" onto a wall is shown below. Although the characters are forced to change color in both blending cases, the mixed gradient result is much better because it keeps the wall visible as the background.
$\textbf{Color2Gray}$: when an RGB image is converted to grayscale by calling rgb2gray, it can lose its contrast information. In the following example, it is impossible to read the number 35 in the second image. Gradient domain processing provides one way around this. The gradient used here is the maximum of the gradients of the saturation and value channels (after first converting RGB to HSV). The source image is the HSV image and the target is the grayscale image produced by $rgb2gray()$. Gradient domain processing gives a much better result, as shown in the third image.
The main lesson from this project is to use sparse matrices. A matrix of $N$ rows with 5 nonzero entries per row is slower than a matrix of $4N$ rows with at most 2 nonzero entries per row. The results of gradient domain fusion are great and the math is simple, but it is error-prone to code up. It took me hours to debug the first Poisson blending because I passed the wrong source image into the function.