Poor Man's Augmented Reality
Introduction
We capture a video and insert a synthetic object into the scene.
Setup and Keypoints with Known 3D World Coordinates
We use a box measuring 7.5 × 4 × 3.5 inches as the setup; the box and its video are shown below. We define a 3D coordinate system on the box and mark 42 keypoints on it.
Propagating Keypoints to Other Frames in the Video
We first tried a hacky corner detector, but its results were poor, so we switched to an off-the-shelf tracker: the MedianFlow tracker from the OpenCV library. An 8×8 patch centered on each marked point is used to initialize the tracker. The following video shows tracking of a single point.
We also used MATLAB's built-in point tracker, which implements the Kanade-Lucas-Tomasi (KLT) algorithm. Its parameters are the number of pyramid levels, the forward-backward error threshold, the neighborhood size, and the maximum number of search iterations.
Calibrating the Camera
We use least squares to fit the 3×4 camera projection matrix that maps homogeneous 4D world coordinates to homogeneous 3D image coordinates. The corresponding formula and the reprojection result are shown below.
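This fit can be sketched with the standard direct linear transform (function names here are illustrative, not our exact code): each 3D-to-2D correspondence contributes two linear equations in the twelve entries of P, and the least-squares solution is the right singular vector with the smallest singular value:

```python
import numpy as np

def fit_projection(world_pts, image_pts):
    """Least-squares fit of the 3x4 camera matrix P mapping homogeneous
    world coordinates [X, Y, Z, 1] to image coordinates [u, v, 1] up to
    scale. Each correspondence yields two rows of the linear system."""
    A = []
    for (X, Y, Z), (u, v) in zip(world_pts, image_pts):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    return Vt[-1].reshape(3, 4)

def reproject(P, world_pts):
    """Project world points with P and dehomogenize to pixel coordinates."""
    Xh = np.hstack([np.asarray(world_pts, float), np.ones((len(world_pts), 1))])
    x = Xh @ P.T
    return x[:, :2] / x[:, 2:3]
```

With the 42 tracked points per frame, the reprojection error of the fitted P is what the result image below visualizes.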
AR Result
We render a cube on the box in each frame.
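Rendering reduces to projecting the cube's eight corners with the camera matrix fitted for that frame and connecting its twelve edges. A minimal sketch (names are ours; drawing itself would use, e.g., `cv2.line`):

```python
import numpy as np

# The 8 corners of a unit cube and its 12 edges (pairs of corners that
# differ in exactly one coordinate).
CUBE = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)],
                dtype=float)
EDGES = [(a, b) for a in range(8) for b in range(a + 1, 8)
         if int(np.sum(CUBE[a] != CUBE[b])) == 1]

def project_cube(P, origin=(0.0, 0.0, 0.0), size=1.0):
    """Project the cube's corners into the image with camera matrix P.
    Drawing then amounts to one line segment per entry of EDGES."""
    pts = CUBE * size + np.asarray(origin, dtype=float)
    Xh = np.hstack([pts, np.ones((8, 1))])
    x = Xh @ P.T
    return x[:, :2] / x[:, 2:3]
```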
Tour into the Picture: Single View Modeling
Introduction
The goal of this project is to create a simple, planar 3D scene from a single photograph.
Result
(1) Fit the box volume
We divide the image into five parts: ceiling, right wall, left wall, rear wall, and floor. Only three points are needed for this: the upper-left and lower-right vertices of the rear wall, and the vanishing point.
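The region boundaries follow from those three points: each non-rear region is bounded by the rear-wall rectangle, the image border, and the rays from the vanishing point through the rectangle's corners. A hypothetical helper (not our exact code) for finding where such a ray leaves the image:

```python
import numpy as np

def ray_to_border(vp, corner, width, height):
    """Follow the ray from the vanishing point through a rear-wall corner
    until it exits the image. The exit point, the corner, and the image
    corners between them bound one ceiling/floor/wall region."""
    vp = np.asarray(vp, dtype=float)
    corner = np.asarray(corner, dtype=float)
    d = corner - vp                      # outward direction, away from vp
    ts = []
    for axis, bound in ((0, 0.0), (0, float(width)), (1, 0.0), (1, float(height))):
        if d[axis] != 0:
            t = (bound - corner[axis]) / d[axis]
            if t > 0:                    # only borders ahead of the corner
                ts.append(t)
    return corner + min(ts) * d
```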
(2) 2D to 3D conversion
In this part, we define the 3D geometry corresponding to the planes selected above. The location of the vanishing point corresponds to the camera position. The relative height and width of the box can be determined from the top-versus-side ratio, and the focal length is needed to determine the depth; here the focal length is assigned arbitrarily. We then warp each plane with a homography. After this, we can move and rotate the camera and look at the scene from different viewpoints.
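Each per-plane warp is a four-point homography, equivalent to MATLAB's `fitgeotrans` or OpenCV's `getPerspectiveTransform`. For illustration, the direct linear transform version (an illustrative helper, not our code):

```python
import numpy as np

def homography(src, dst):
    """Direct linear transform for the 3x3 homography that maps the four
    source corners to the four destination corners (e.g. one wall
    quadrilateral in the photo to its fronto-parallel rectangle)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]
```

Warping the texture of each of the five planes with its H, then rendering the five textured rectangles in 3D, gives the fly-through views.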
Bells & Whistles (Foreground Objects)
From the original image, we compute a mask that isolates the desired foreground objects, along with a background image with those objects removed. More specifically, when a foreground object is removed, the corresponding region is filled by inward interpolation. We then define the plane on which each foreground object stands and follow the same process as before (an alpha channel is used for transparency).
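The alpha transparency mentioned above is the usual compositing formula; a minimal sketch, assuming `alpha` is the per-pixel mask in [0, 1]:

```python
import numpy as np

def composite(bg, fg, alpha):
    """Standard alpha blend: alpha is an HxW mask in [0, 1], broadcast
    over the color channels; 1 keeps the foreground, 0 the background."""
    a = np.asarray(alpha, dtype=float)[..., None]
    return (a * fg + (1.0 - a) * bg).astype(bg.dtype)
```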
Reimplement: A Neural Algorithm of Artistic Style
Introduction
In the third project, we reimplement the artistic style transfer paper by Leon A. Gatys, Alexander S. Ecker and Matthias Bethge, often called Neural-Style or Neural-Transfer. This algorithm transfers the artistic style of a style image onto a content image. Unlike the networks we trained before, this method needs only three inputs (a content image, a style image, and an output image); the output image ends up with the content of the content image and a style that looks like that of the style image.
Network
As described in the paper, the network is built on VGG: we use the feature space provided by the 16 convolutional and 5 pooling layers of the 19-layer VGG network, and the fully connected layers are dropped. The VGG architecture is shown below:
VGG Architecture
Therefore, I use the feature space of the 16 convolutional layers and replace max pooling with average pooling. I constructed ContentLoss and StyleLoss exactly as in the paper (including the Gram matrix). For convenience, I rebuilt the model with the loss layers inserted directly into it, keeping only the layers needed for the loss computation. The optimizer is L-BFGS, run for 500 iterations of gradient descent. The parameters are the content layer, the style layers, weight_style = 10000, weight_content = 1, and w_l_style, the weighting factors for the contribution of each chosen style layer, set to [0.75, 0.5, 0.2, 0.2, 0.2] or [0.2, 0.2, 0.2, 0.2, 0.2]. Unlike what the paper suggests, I found that w_l_style has little effect on the result, while a large ratio of weight_style to weight_content makes the style transfer more pronounced. The new model architecture is shown below.
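The core of StyleLoss is the Gram matrix of a layer's feature map. My implementation is in PyTorch; the math can be illustrated in NumPy (normalization conventions differ between the paper and common tutorials, so this shows one choice):

```python
import numpy as np

def gram(features):
    """Gram matrix of a feature map of shape (channels, height, width),
    normalized by the number of elements (one common convention; the
    paper instead folds its normalization into the loss term)."""
    C, H, W = features.shape
    F = features.reshape(C, H * W)
    return F @ F.T / (C * H * W)

def style_loss(feat, target_feat):
    """Squared error between Gram matrices for one layer; the per-layer
    terms are then combined with the w_l_style weights."""
    return float(np.sum((gram(feat) - gram(target_feat)) ** 2))
```

ContentLoss is simpler: a squared error directly between the feature maps, so it preserves spatial layout, while the Gram matrix discards layout and keeps only feature correlations, i.e. style.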
My Model Architecture
Result
Firstly, I transfer Neckarfront into the different styles used in the paper: (1) The Shipwreck of the Minotaur by J.M.W. Turner, 1805; (2) The Starry Night by Vincent van Gogh, 1889; (3) Der Schrei by Edvard Munch, 1893; (4) Femme nue assise by Pablo Picasso, 1910.
Content Image in the paper
Initially, I chose the layers suggested in the paper, but the results were not very good. Following hints on Piazza, I changed to the first five convolutional layers, which perform better, so I applied that choice to all the other images.
Bad results with the layers from the paper
Results with first five layers
All results are listed below:
Although I think my model performs well, compared with the results in the paper, my results show more color and texture transfer. It might be that I need to tune the parameters further and use a random input, instead of the average of the style and content images, to improve the style transfer. A different choice of layers might also help.
Secondly, I chose one more style to test my model.
Additional Style Image
Additional Content Image
Additional Image with additional style
Additional Image with one style
Additional Image with one style
Finally, I chose more images to test my model. This style transfer method easily changes the style of a poster and is very useful for image design.
Unfortunately, there are some failure cases; one example is shown below. The output only changes its colors, and no clear style appears. I tried adjusting the parameters and choosing different layers. To be honest, I do not know what causes this failure, and to some extent I do not know why different layers perform so differently.
Failure case
Conclusion
I really enjoyed CS 194-26, and the projects were very interesting. Most importantly, through the projects I learned many practical skills and implemented things I could not have imagined before. Take image style transfer: I learned many methods for building CNNs, which are very popular now, and I was excited to learn them. In addition, applying this algorithm to other images made me appreciate both the usefulness of the method and the magic of computer vision.
CS194-26 Final Project
Kecheng Chen, kecheng_chen@berkeley.edu
Xidong Wu, xidong_wu@berkeley.edu