CS194-26 Final Project

Kecheng Chen, kecheng_chen@berkeley.edu

Xidong Wu, xidong_wu@berkeley.edu

Poor Man's Augmented Reality

Introduction

Capture a video and insert a synthetic object into the scene.

Setup and Keypoints with known 3D world coordinates

We use a box with dimensions of 7.5 x 4 x 3.5 inches as the setup. The box and its video are shown below. We define a coordinate system on the box and mark 42 keypoints on it with known 3D world coordinates.

[Figures: the box and the captured video]

Propagating Keypoints to Other Images in the Video

We first tried a hacky corner detector, but the results were not good, so we turned to an off-the-shelf tracker: the MedianFlow tracker from the OpenCV library. An 8x8 patch centered on each marked point is used to initialize the tracker. The following video shows tracking of a single point.
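A minimal sketch of this per-point tracking in Python with OpenCV is shown below; the video path and keypoint coordinates are placeholders, and the tracker factory lives under cv2.legacy in recent OpenCV builds (it is cv2.TrackerMedianFlow_create in older ones):

```python
import cv2

cap = cv2.VideoCapture("box_video.mp4")          # placeholder path
ok, frame = cap.read()

# example pixel coordinates of marked keypoints in the first frame
points = [(412, 305), (520, 288)]

trackers = []
for (x, y) in points:
    # one tracker per keypoint, initialized with an 8x8 box centered on it
    tracker = cv2.legacy.TrackerMedianFlow_create()
    tracker.init(frame, (x - 4, y - 4, 8, 8))
    trackers.append(tracker)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    for tracker in trackers:
        found, (x, y, w, h) = tracker.update(frame)
        if found:
            # draw the tracked point at the center of the updated box
            cv2.circle(frame, (int(x + w / 2), int(y + h / 2)), 3, (0, 0, 255), -1)
    cv2.imshow("tracking", frame)
    if cv2.waitKey(1) == 27:                     # Esc to quit
        break
```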

MATLAB's built-in point tracker, which implements the Kanade-Lucas-Tomasi (KLT) algorithm, is also utilized. Its parameters are the number of pyramid levels, the forward-backward error threshold, the neighborhood size, and the maximum number of search iterations.
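For reference, OpenCV exposes the same pyramidal Lucas-Kanade tracking; the sketch below maps the parameters named above onto its arguments (the frame files and point coordinates are made up):

```python
import cv2
import numpy as np

prev = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)   # placeholder frames
curr = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

pts = np.array([[412, 305], [520, 288]], dtype=np.float32).reshape(-1, 1, 2)

# winSize ~ neighborhood size, maxLevel ~ number of pyramid levels,
# criteria ~ maximum number of search iterations / convergence threshold
new_pts, status, err = cv2.calcOpticalFlowPyrLK(
    prev, curr, pts, None,
    winSize=(15, 15), maxLevel=3,
    criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.03))

tracked = new_pts[status.ravel() == 1]   # keep only successfully tracked points
```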

Calibrating the Camera

We use least squares to fit the camera projection matrix that maps homogeneous 4-vector world coordinates to homogeneous 3-vector image coordinates. The corresponding formula and the reprojection result are shown below.

[Figures: projection formula and reprojection result]
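One common way to set up this fit (a sketch, not necessarily our exact formulation) is to fix the bottom-right entry of the 3x4 matrix P to 1 and solve the resulting overdetermined linear system for the remaining 11 unknowns:

```python
import numpy as np

def calibrate(world_pts, image_pts):
    # world_pts: (N, 3) 3D keypoints, image_pts: (N, 2) pixel positions, N >= 6.
    # Each correspondence (X, Y, Z) -> (u, v) contributes two linear equations
    # derived from u = (p1 . X~) / (p3 . X~) and v = (p2 . X~) / (p3 . X~).
    A, b = [], []
    for (X, Y, Z), (u, v) in zip(world_pts, image_pts):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z])
        b.append(u)
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z])
        b.append(v)
    p, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return np.append(p, 1.0).reshape(3, 4)   # restore the fixed entry

def project(P, world_pts):
    # reproject: homogenize the world points, apply P, and dehomogenize
    X = np.hstack([world_pts, np.ones((len(world_pts), 1))])
    x = (P @ X.T).T
    return x[:, :2] / x[:, 2:3]
```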

AR Result

We render a cube in each frame by projecting its vertices with that frame's fitted camera matrix and drawing the edges, as sketched below.
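A sketch of the per-frame rendering step; the cube's base corner and edge length are example values in the box's world units:

```python
import cv2
import numpy as np

def draw_cube(frame, P, base=(1.0, 1.0, 3.5), size=1.0):
    # P is the 3x4 projection matrix fitted for this frame
    x, y, z = base
    s = size
    corners = np.array([
        [x, y, z], [x+s, y, z], [x+s, y+s, z], [x, y+s, z],           # bottom face
        [x, y, z+s], [x+s, y, z+s], [x+s, y+s, z+s], [x, y+s, z+s],   # top face
    ])
    # project homogeneous world points and dehomogenize to pixels
    X = np.hstack([corners, np.ones((8, 1))])
    uvw = (P @ X.T).T
    pts = (uvw[:, :2] / uvw[:, 2:3]).astype(int)
    edges = [(0, 1), (1, 2), (2, 3), (3, 0),
             (4, 5), (5, 6), (6, 7), (7, 4),
             (0, 4), (1, 5), (2, 6), (3, 7)]
    for i, j in edges:
        cv2.line(frame, tuple(pts[i]), tuple(pts[j]), (0, 255, 0), 2)
    return frame
```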

Tour into the Picture: Single View Modeling

Introduction

The goal of this project is to create a simple, planar 3D scene from a single photograph.

[Video: result]

(1) Fit the box volume

We divide the image into five parts: ceiling, right wall, left wall, rear wall, and floor. Only three points are needed for this: the upper-left and lower-right vertices of the rear wall, and the vanishing point.
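A sketch of deriving the five regions from those three points (names and image size are placeholders; it assumes the vanishing point lies strictly inside the rear wall and that each radial line exits through the expected image edge):

```python
import numpy as np

def box_regions(img_w, img_h, rear_ul, rear_lr, vp):
    x0, y0 = rear_ul                       # rear wall, upper-left corner
    x1, y1 = rear_lr                       # rear wall, lower-right corner
    rear = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]

    def extend(corner):
        # walk from the vanishing point through `corner` to the image border
        d = np.subtract(corner, vp, dtype=float)
        ts = [t for t in ((0 - vp[0]) / d[0], (img_w - vp[0]) / d[0],
                          (0 - vp[1]) / d[1], (img_h - vp[1]) / d[1]) if t > 1]
        return tuple(np.add(vp, min(ts) * d))

    ul, ur = extend((x0, y0)), extend((x1, y0))
    ll, lr = extend((x0, y1)), extend((x1, y1))
    ceiling = [ul, ur, (x1, y0), (x0, y0)]
    floor = [(x0, y1), (x1, y1), lr, ll]
    left = [ul, (x0, y0), (x0, y1), ll]
    right = [(x1, y0), ur, lr, (x1, y1)]
    return rear, ceiling, floor, left, right
```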

[Figures: fitted box volumes on the input images]

(2) 2D to 3D conversion

In this part, we define the 3D geometry corresponding to the planes selected above. The location of the vanishing point corresponds to the camera position. The relative height and width of the box can be determined from the top-versus-side ratio, and the focal length is needed to determine the depth; here the focal length is assigned arbitrarily. We then warp each plane with a homography. After this, we can move and rotate the camera and look at the scene from different viewpoints.
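As an illustration, rectifying one selected plane (here the floor) into a texture for the 3D box can be done with a homography; the corner coordinates and target size below are made-up values:

```python
import cv2
import numpy as np

# src: the plane's four corners in the image; dst: the target rectangle,
# whose aspect ratio follows the recovered relative plane dimensions
src = np.float32([(120, 480), (690, 470), (1020, 760), (40, 770)])
w, h = 600, 800                        # assumed relative plane dimensions
dst = np.float32([(0, 0), (w, 0), (w, h), (0, h)])

H = cv2.getPerspectiveTransform(src, dst)
img = cv2.imread("room.jpg")           # placeholder input image
plane_texture = cv2.warpPerspective(img, H, (w, h))
```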

[Figures: the scene rendered from new viewpoints]

Bells & Whistles (Foreground Objects)

Based on the original image, we compute a mask to isolate the desired foreground objects and obtain a background with those objects removed. More specifically, when a foreground object is removed, the corresponding region is filled in by inward interpolation. We then define the plane on which the foreground object stands and follow the same process as before (an alpha channel is used to make the rest of the object's billboard transparent).
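A sketch of this step with OpenCV (the file names are placeholders, and cv2.inpaint stands in for the inward interpolation described above):

```python
import cv2

# mask.png is assumed to be white on the foreground object, black elsewhere
img = cv2.imread("room.jpg")
mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)

# cut the object out as a foreground sprite with an alpha channel,
# so everything outside the object is transparent on its billboard
fg = cv2.cvtColor(img, cv2.COLOR_BGR2BGRA)
fg[:, :, 3] = mask

# fill the removed region by diffusing surrounding pixels inward
bg = cv2.inpaint(img, mask, inpaintRadius=5, flags=cv2.INPAINT_TELEA)
```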

[Figures: foreground-object results]

Reimplement: A Neural Algorithm of Artistic Style

Introduction

In the third project, we reimplement a paper on artistic style transfer with CNNs by Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge, known as Neural Style or Neural Transfer. The algorithm transfers the artistic style of a style image onto a content image. Unlike the neural networks we studied before, this method needs only three inputs (a content image, a style image, and an output image that is optimized directly); in the end, the output image keeps the content of the content image while its style looks like that of the style image.

Network

As described in the paper, the network is built on the basis of the VGG network: we use the feature space provided by the 16 convolutional and 5 pooling layers of the 19-layer VGG network, and the fully connected layers are dropped. The architecture of VGG is shown below:

[Figure: VGG Architecture]

Therefore, I use the feature space provided by the 16 convolutional layers and replace max pooling with average pooling. I constructed ContentLoss and StyleLoss exactly following the paper (including the Gram matrix). For convenience, I rebuilt the model with the loss layers inserted directly into it, keeping only the layers needed for the loss computation. The optimizer is L-BFGS, run for 500 iterations of gradient descent. To run the model, the parameters are the content layer, the style layers, weight_style = 10000, weight_content = 1, and w_l_style, the weighting factors for the contribution of each chosen style layer, set to [0.75, 0.5, 0.2, 0.2, 0.2] or [0.2, 0.2, 0.2, 0.2, 0.2]. Contrary to the paper, I found that this per-layer weighting has little effect on the result. A large ratio of weight_style to weight_content makes the style transfer more obvious. The new model architecture is shown below.

[Figure: My Model Architecture]
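The core pieces can be sketched in PyTorch as follows; this mirrors the description above rather than reproducing the exact project code, with the quoted weights shown in the comments:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

# Pretrained VGG-19 features with max pooling swapped for average pooling,
# as the paper recommends; the fully connected layers are never loaded.
# (On torchvision >= 0.13, pass weights="IMAGENET1K_V1" instead.)
vgg = models.vgg19(pretrained=True).features.eval()
for i, layer in enumerate(vgg):
    if isinstance(layer, nn.MaxPool2d):
        vgg[i] = nn.AvgPool2d(kernel_size=2, stride=2)

def gram_matrix(feat):
    # feat: (1, C, H, W) activation of one chosen layer
    _, c, h, w = feat.shape
    f = feat.view(c, h * w)
    return f @ f.t() / (c * h * w)          # normalized Gram matrix

def style_loss(out_feats, style_grams, w_l_style):
    # w_l_style: per-layer weights, e.g. [0.2, 0.2, 0.2, 0.2, 0.2]
    return sum(w * F.mse_loss(gram_matrix(f), g)
               for w, f, g in zip(w_l_style, out_feats, style_grams))

def content_loss(out_feat, content_feat):
    return F.mse_loss(out_feat, content_feat)

# The output image's pixels are the variables being optimized:
#   total = 10000 * style_loss(...) + 1 * content_loss(...)
# minimized with torch.optim.LBFGS for 500 steps.
```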

Result

First, I transfer Neckarfront to the different styles in the paper: (1) The Shipwreck of the Minotaur by J.M.W. Turner, 1805; (2) The Starry Night by Vincent van Gogh, 1889; (3) Der Schrei by Edvard Munch, 1893; and (4) Femme nue assise by Pablo Picasso, 1910.

Content image used in the paper

Initially, I chose the layers suggested in the paper, but they did not work very well. Following a hint on Piazza, I changed to the first five conv layers, which performed better, so I applied that choice to all the other images.

Bad results with the layers from the paper

Results with the first five layers

All results are listed:









Although I think my model performs well, compared with the results in the paper my outputs show more color and texture transfer. This might be because I need to adjust the parameters further and use a random input, instead of the average of the style and content images, to improve the style transfer. A different choice of layers might also help.

Second, I chose one more style to test my model.

Additional Style Image

Additional Content Image

Additional Image with additional style

Additional Image with one style

Additional Image with one style

Finally, I chose more images to test my model. This style transfer method easily changes the style of a poster and is very useful for image design.

















Unfortunately, there are some failure cases; one example is shown below. The output only changes its colors, and no clear style is visible. I tried adjusting the parameters and choosing different layers. To be honest, I do not know what leads to this failure, and to some extent I do not know why different layer choices perform so differently.


Failure case

Conclusion

I really enjoyed CS194-26, and the projects were very interesting. Most importantly, through the projects I learned many practical skills and built things I could not have imagined implementing before. Take image style transfer: I learned many methods for building CNNs, which are very popular now, and I am excited to keep learning about them. In addition, applying the algorithm to other images made me appreciate both the usefulness of the method and the magic of computer vision.