CS194 Final Project

Spring 2020

Part One: Poor Man's Augmented Reality

Taking a Video

Draw a regular pattern on a box, choose a corner of the box as the origin of the world frame, and label the world-frame coordinates of the interest points. Capture a video with the box at the center of the scene; the video is shown below. It was captured with an iPhone and then converted to .mp4 using the online converter at https://www.freemake.com/free_video_converter/

Finding Keypoints

To find the keypoints in the first frame, the function $plt.ginput$ is used. To propagate the keypoints to the other frames, the OpenCV MedianFlow tracker is used: create one tracker per point and initialize it with an $8\times 8$ patch centered on the keypoint. The result of tracking is shown below:

Calibrating the Camera

To find the camera projection matrix, the following equation is used.

$\begin{bmatrix} su\\ sv\\ s\end{bmatrix} = \begin{bmatrix}m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \\ m_{31} & m_{32} & m_{33} & m_{34} \end{bmatrix}\begin{bmatrix} X \\ Y \\ Z \\ 1\end{bmatrix}$.

Since the projection matrix has 11 degrees of freedom, set $m_{34} = 1$. Reorganizing the equations yields a system of the form $Ax = b$. For a single point, eliminating $s$ gives the two rows below. The projection matrix $P$ can then be solved with the least-squares function $np.linalg.lstsq$.

$\begin{bmatrix} X & Y & Z & 1 & 0 & 0 & 0 &0 &-uX & -uY & -uZ \\ 0 &0 & 0 & 0 &X & Y &Z & 1 & -vX& -vY & -vZ\end{bmatrix}\begin{bmatrix}m_{11}\\m_{12}\\m_{13}\\m_{14}\\m_{21}\\m_{22}\\m_{23}\\m_{24}\\m_{31}\\m_{32}\\m_{33}\end{bmatrix} = \begin{bmatrix}u \\v\end{bmatrix}$
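The system above can be assembled and solved as a short sketch (`calibrate` is an illustrative helper name; each correspondence contributes the two rows shown in the matrix):

```python
import numpy as np

def calibrate(world_pts, image_pts):
    """Solve for the 3x4 projection matrix with m34 fixed to 1.

    world_pts: (n, 3) world coordinates; image_pts: (n, 2) pixel coordinates.
    """
    rows, rhs = [], []
    for (X, Y, Z), (u, v) in zip(world_pts, image_pts):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z])
        rhs.extend([u, v])
    m, *_ = np.linalg.lstsq(np.array(rows, float), np.array(rhs, float),
                            rcond=None)
    return np.append(m, 1.0).reshape(3, 4)  # put m34 = 1 back in
```

With six or more non-degenerate correspondences the system is overdetermined and least squares averages out the tracking noise.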

Projecting a Cube in the Scene

Project a cube on the top of the box by manually defining eight corners of the cube and applying projection matrix to the corners. The result is shown below.
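Concretely, the cube corners can be projected like this (the cube placement and side length here are illustrative; on the real box the corners would be offset to sit on its top face):

```python
import numpy as np
from itertools import product

def project(P, pts3d):
    """Apply the 3x4 projection matrix and divide by the homogeneous coordinate."""
    ph = np.hstack([pts3d, np.ones((len(pts3d), 1))])
    uvw = ph @ P.T
    return uvw[:, :2] / uvw[:, 2:3]

# the eight corners of an axis-aligned unit cube
corners = np.array(list(product([0, 1], repeat=3)), float)
# an edge connects two corners that differ in exactly one coordinate
edges = [(a, b) for a in range(8) for b in range(a + 1, 8)
         if np.sum(corners[a] != corners[b]) == 1]
```

Drawing the 12 `edges` between the projected corners in every frame renders the cube into the video.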

Part Two: Gradient Domain Fusion

The insight behind gradient domain fusion is that people care much more about the gradients of an image than its overall intensity. Gradient domain fusion is therefore an optimization problem that minimizes the squared differences between gradients. The objective function is

$\boldsymbol{v} = \text{argmin}_v \sum_{i\in S,\, j\in N_i \cap S}((v_i - v_j) - (s_i - s_j))^2 + \sum_{i\in S,\, j \in N_i \cap \neg S}((v_i - t_j) - (s_i - s_j))^2$

where $s$ is the source image, $t$ is the target image, $S$ is the region where the source image will be placed, and $N_i$ is the set of neighbors of pixel $i$.

Toy Problem

To set up the least-squares problem $A\boldsymbol{v} = \boldsymbol{b}$, the vector $\boldsymbol{b}$ contains all the known information and $\boldsymbol{v}$ is the vector to be solved for. Use $h, w$ to denote the height and width of the image.

  • $\boldsymbol{b}$ is the stack of all $x$ gradients, all $y$ gradients, and the corner value $s[0, 0]$ (zero-indexed in Python). Ignoring indices outside the image, there are $h * (w - 1)$ $x$ gradients and $(h - 1) * w$ $y$ gradients, so $\boldsymbol{b} \in \mathbb{R}^{h * (w - 1) + (h - 1) * w + 1}$.
  • $\boldsymbol{v}$ is the flattened image we want to optimize, so $\boldsymbol{v} \in \mathbb{R}^{h * w}$.
  • $A$ is the linear transformation between $\boldsymbol{v}$ and $\boldsymbol{b}$, so $A \in \mathbb{R}^{(h * (w - 1) + (h - 1) * w + 1) \times (h * w)}$. In this simple toy case, if $\boldsymbol{v} = \boldsymbol{s}$, the error is minimized to 0, so we should be able to reconstruct the image almost identically. The result is shown below.
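The construction above can be sketched with a sparse matrix (`reconstruct` is an illustrative helper name; a dense matrix would be far too slow at image size):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr

def reconstruct(s):
    h, w = s.shape
    idx = np.arange(h * w).reshape(h, w)  # variable index of each pixel
    data, ri, ci, b = [], [], [], []
    row = 0
    # x gradients: v[y, x+1] - v[y, x] = s[y, x+1] - s[y, x]
    for y in range(h):
        for x in range(w - 1):
            ri += [row, row]; ci += [idx[y, x + 1], idx[y, x]]; data += [1, -1]
            b.append(s[y, x + 1] - s[y, x]); row += 1
    # y gradients: v[y+1, x] - v[y, x] = s[y+1, x] - s[y, x]
    for y in range(h - 1):
        for x in range(w):
            ri += [row, row]; ci += [idx[y + 1, x], idx[y, x]]; data += [1, -1]
            b.append(s[y + 1, x] - s[y, x]); row += 1
    # pin the top-left corner to its known value
    ri.append(row); ci.append(0); data.append(1); b.append(s[0, 0]); row += 1
    A = sp.csr_matrix((data, (ri, ci)), shape=(row, h * w))
    v = lsqr(A, np.array(b))[0]
    return v.reshape(h, w)
```

Since the system is consistent, the solver recovers the original image up to numerical tolerance.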

Poisson Blending

Unlike the toy problem, borders affect the result of Poisson blending. The main steps are as follows.

  • create a mask of the source image using photoshop.
  • find the position (pixel coordinate in target image) to place the source image manually.
  • generate $A$ and $\boldsymbol{b}$. If $j$ is inside the region $S$, then $v_i - v_j = s_i - s_j$, so the entries of $A$ are 1 and $-1$ for the corresponding variables. If $j$ is outside the region $S$, then $v_i = s_i - s_j + t_j$, so the entry of $A$ is 1 and the entry of $\boldsymbol{b}$ is $s_i - s_j + t_j$. $A$ can be shared across the three color channels.
  • solve the optimization problem using $scipy.sparse.linalg.lsqr()$.
  • blend images together.
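The steps above can be sketched for a single channel as follows (a sketch that assumes the source is already aligned with the target and the mask does not touch the image border, so every neighbor index is valid; `poisson_blend` is an illustrative name):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr

def poisson_blend(s, t, mask):
    """s, t: same-shape float channels; mask: bool region S in the target."""
    h, w = mask.shape
    ids = -np.ones((h, w), int)
    ids[mask] = np.arange(mask.sum())  # one variable per masked pixel
    data, ri, ci, b = [], [], [], []
    row = 0
    for y, x in zip(*np.nonzero(mask)):
        for dy, dx in ((0, 1), (0, -1), (1, 0), (-1, 0)):
            ny, nx = y + dy, x + dx
            grad = s[y, x] - s[ny, nx]  # desired gradient from the source
            ri.append(row); ci.append(ids[y, x]); data.append(1.0)
            if mask[ny, nx]:
                # interior: v_i - v_j = s_i - s_j
                ri.append(row); ci.append(ids[ny, nx]); data.append(-1.0)
                b.append(grad)
            else:
                # border: the neighbor value is the known target t_j
                b.append(grad + t[ny, nx])
            row += 1
    A = sp.csr_matrix((data, (ri, ci)), shape=(row, mask.sum()))
    v = lsqr(A, np.array(b))[0]
    out = t.copy()
    out[mask] = v
    return out
```

Running this once per color channel (reusing the same $A$) and copying the solved region into the target produces the blended result.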

To get good results, the colors of the source should be similar to the target, as shown in the penguin and eagle blendings. If the colors are very different, this blending method may fail, as shown in the dragon case.

Bells and Whistles

$\textbf{Mixed Gradient}$: this follows the same procedure as Poisson blending; the only difference is the reference gradient. Rather than always using $s_i - s_j$, the mixed gradient uses $d_{ij} = s_i - s_j$ if $|s_i - s_j| > |t_i - t_j|$, and $d_{ij} = t_i - t_j$ otherwise. The new optimization problem becomes $\boldsymbol{v} = \text{argmin}_v \sum_{i\in S,\, j\in N_i \cap S}((v_i - v_j) - d_{ij})^2 + \sum_{i\in S,\, j \in N_i \cap \neg S}((v_i - t_j) - d_{ij})^2$.
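The per-pair gradient choice can be written in one line with $np.where$ (the helper name is illustrative):

```python
import numpy as np

def mixed_gradient(ds, dt):
    """ds, dt: source and target gradients for the same pixel pairs.
    Keep whichever has larger magnitude."""
    return np.where(np.abs(ds) > np.abs(dt), ds, dt)
```

Substituting this for `grad` in the Poisson assembly loop turns it into mixed-gradient blending.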

An example of pasting "go bears" on a wall is shown below. Even though the characters are forced to change color in both blendings, the mixed-gradient blending is much better since it keeps the wall visible as the background.

$\textbf{Color2Gray}$: converting an RGB image to grayscale by calling rgb2gray loses contrast information. As shown in the following example, it is impossible to read the number 35 in the second image. Gradient domain processing provides one remedy. The gradient used here is whichever of the saturation-channel and value-channel gradients has larger magnitude (after converting RGB to HSV). The source is the HSV image, and the target is the grayscale image produced by $rgb2gray()$. Gradient domain processing gives a much better result, shown in the third image.
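The gradient choice can be sketched with matplotlib's RGB-to-HSV conversion (the helper name is illustrative; the input is assumed to be a float RGB image in $[0, 1]$):

```python
import numpy as np
from matplotlib.colors import rgb_to_hsv

def sv_gradients(rgb):
    """Per-pixel x and y gradients taken from whichever of the S or V
    channel has the larger-magnitude gradient at that location."""
    hsv = rgb_to_hsv(rgb)
    s_ch, v_ch = hsv[..., 1], hsv[..., 2]

    def pick(a, b):
        return np.where(np.abs(a) > np.abs(b), a, b)

    gx = pick(np.diff(s_ch, axis=1), np.diff(v_ch, axis=1))
    gy = pick(np.diff(s_ch, axis=0), np.diff(v_ch, axis=0))
    return gx, gy
```

These gradients replace the source gradients in the same least-squares setup as before, with the rgb2gray image supplying the boundary values.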

What I learned

The main lesson from this project is to use sparse matrices. A matrix with $N$ rows, each having 5 nonzero entries, is slower to solve than a matrix with $4N$ rows, each having at most 2 nonzero entries. The results of gradient domain fusion are great and the math is simple, but it is error-prone to code up. It took me hours to debug the first Poisson blending because I passed the wrong source image into the function.