Student Information

Ian Albuquerque Raymundo da Silva
ian.albuquerque [at] berkeley.edu

International student participating in the Brazilian Scientific Mobility Program (BSMP) at the University of California, Berkeley, for the 2015-2016 academic year. Enrolled as a Computer Science Extension student. Born and raised in Rio de Janeiro, Brazil. Undergraduate student at the Pontifical Catholic University of Rio de Janeiro, studying Computer Engineering and Mathematics. Traditional and digital art lover, currently interested in Computer Graphics and Artificial Intelligence - but who knows what new field of study I might fall in love with! Trying to learn as much as I can during this one year here at Berkeley.

My Proposal:

The idea of this project is to have a reference video (for instance, a music video) drive one of your own videos. The motivation: “I want to have a video that looks like I am dancing like Beyoncé!” The program should be able to find good matches between frames and build transitions that make sense. The largest part of this project is finding a way to match frames, which can be pretty hard because they do not share the same people or elements. For a good-looking final result, we can also adjust the artistic effects of the video so that watching it feels the same as watching the reference.

Overview Video:

Preserving Video Continuity by Making Video Textures
Link to the "Video Texture" Paper: http://www.cc.gatech.edu/cpl/projects/videotexture/SIGGRAPH2000/index.htm

Consider the following input video. We want to determine how the frames of the video relate to each other and how good it would be to transition from one frame to another.

Input Video:

Transition Costs and Probability Calculations

First of all, we downsample the videos so they are easier to work with. Then, we calculate the SSD between every pair of frames of the input video. This will be the distance from one frame to another. The cost of transitioning between two given frames is the sum of the distances over an interval [-N, N] of frames centered on the next frame. This calculation takes into account that the transition should preserve not only position, but also dynamics.

Parameters so Far: {N}
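A minimal NumPy sketch of these two steps (the function names and the grayscale, downsampled frame array `frames` are illustrative, not the original code):

    import numpy as np

    def frame_distances(frames):
        # SSD between every pair of (downsampled, grayscale) frames.
        # frames: array of shape (num_frames, height, width)
        flat = frames.reshape(len(frames), -1).astype(np.float64)
        # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, computed for all pairs at once
        sq = np.sum(flat ** 2, axis=1)
        D = sq[:, None] + sq[None, :] - 2.0 * flat @ flat.T
        return np.maximum(D, 0.0)

    def transition_costs(D, N):
        # Cost of jumping from frame i to frame j: sum of the distances over the
        # window [-N, N] centered on the next frame, so that a good transition
        # preserves dynamics, not just position.
        num = len(D)
        worst = (2 * N + 1) * float(D.max())   # penalty for jumps too close to the clip ends
        C = np.full((num, num), worst)
        for i in range(max(N - 1, 0), num - N - 1):
            for j in range(N, num - N):
                C[i, j] = sum(D[i + 1 + k, j + k] for k in range(-N, N + 1))
        return C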

Once we have the cost of transitioning from one frame to another, we can define the probability of making such a transition using an exponential function (lower cost means higher probability). After normalizing the probabilities, we can generate a video by simply flipping a coin and picking the next frame from the probability table.

Parameters so Far: {N, Sigma}
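A sketch of this step, continuing the illustrative code above (sigma controls how strongly low-cost transitions are preferred):

    def transition_probabilities(C, sigma):
        # Lower cost -> higher probability, via an exponential mapping.
        P = np.exp(-C / sigma)
        row_sums = P.sum(axis=1, keepdims=True)
        return np.divide(P, row_sums, out=np.zeros_like(P), where=row_sums > 0)

    def generate_sequence(P, start, length, rng=None):
        # Random walk over frames: at each step, "flip a coin" using the row
        # of probabilities for the current frame.
        if rng is None:
            rng = np.random.default_rng()
        sequence = [start]
        for _ in range(length - 1):
            sequence.append(int(rng.choice(len(P), p=P[sequence[-1]])))
        return sequence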

Video Generated with a High Sigma:

Q-Learning and Local Maxima:

Decreasing the sigma helps smooth the result. However, this is still a greedy approach: we may take the best transition now, but we do not know whether it will lead us to worse transitions in the future. To propagate the cost of those bad transitions, we use the Q-learning algorithm. We iterate the following formula to update our cost table until its values converge.
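The update being iterated is presumably the future-cost formula from the Video Texture paper, C'[i][j] = (C[i][j])^p + Alpha * min_k C'[j][k]. A sketch of that iteration, using the same illustrative cost table as above:

    def q_learn_costs(C, p, alpha, tol=1e-3, max_iters=500):
        # Propagate future costs: a transition is penalized if it leads to
        # frames from which only bad transitions are available.
        Cp = C ** p
        future = Cp.copy()
        for _ in range(max_iters):
            # min over k of future[j, k], added to every transition that lands on frame j
            updated = Cp + alpha * future.min(axis=1)[None, :]
            if np.abs(updated - future).max() < tol:   # stop once the table converges
                break
            future = updated
        return future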

Parameters so Far: {N, Sigma, p, Alpha}

In order to reduce the noisy random transitions, we concentrate the transitions around the strongest ones: given a range, we keep only the transitions that are local maxima within that range (see the sketch below).

Parameters so Far: {N, Sigma, p, Alpha, Local Maxima Range}
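One way this pruning could look, where `radius` plays the role of the local maxima range parameter (again an illustrative sketch, not the original code):

    def keep_local_maxima(P, radius):
        # For each source frame, keep only transitions whose probability is a
        # local maximum within +/- radius frames; zero out the rest and renormalize.
        pruned = np.zeros_like(P)
        num = P.shape[1]
        for i in range(P.shape[0]):
            for j in range(num):
                lo, hi = max(0, j - radius), min(num, j + radius + 1)
                if P[i, j] > 0 and P[i, j] == P[i, lo:hi].max():
                    pruned[i, j] = P[i, j]
        row_sums = pruned.sum(axis=1, keepdims=True)
        return np.divide(pruned, row_sums, out=np.zeros_like(pruned), where=row_sums > 0)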

Q-Learned Costs After Convergence:

Results with Q-Learning and Local Maxima:

By Tweaking the Parameters a Little Bit: (especially a smaller sigma and a smaller p)

With this, we have a probability table for our target video that we can use to measure how good a transition is. We should take it into account so the final video looks continuous.

The Video Matching Itself:

Straightforwardiest Approach Possible:

The most straightforward approach to our goal is to simply calculate the SSD between each frame of the reference video and each frame of the target video and take the lowest values as our matching. As one would expect, this approach is not good enough: the images are too different for such a simplistic matching. The result can be found below.
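A sketch of this baseline, assuming grayscale frame arrays `reference_frames` and `target_frames` of the same resolution (names are illustrative):

    def naive_match(reference_frames, target_frames):
        # For each reference frame, pick the target frame with the smallest SSD.
        ref = reference_frames.reshape(len(reference_frames), -1).astype(np.float64)
        tgt = target_frames.reshape(len(target_frames), -1).astype(np.float64)
        D = (np.sum(ref ** 2, axis=1)[:, None]
             + np.sum(tgt ** 2, axis=1)[None, :]
             - 2.0 * ref @ tgt.T)
        return D.argmin(axis=1)   # best target frame index per reference frame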

Let's Consider a Simple Input: (Animations)

This is a good example because, although the dance is the same, its timing is different. Also, there is no background, which helps. Note that the images are not centered either.

Reference Video:

Target Video:

Interest Area Detection by Background Subtraction and Thresholding:

One idea is that our interest area for matching is the area that has movement. Let's try to detect those areas by subtracting each frame of the video from the average of the whole video (our background). After that, we threshold the image so that we have a well-defined interest region in white versus the background in black.
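A sketch of this detection step, where the threshold value is my own illustrative choice:

    def interest_masks(frames, threshold=30.0):
        # Background = per-pixel average over the whole clip; pixels that differ
        # from it by more than the threshold become the white interest region.
        background = frames.astype(np.float64).mean(axis=0)
        diff = np.abs(frames.astype(np.float64) - background)
        return diff > threshold   # one boolean mask per frame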

Average Image for the Reference Video:

Motion Detection of The Target Video:

Will Only Work for Simple Examples: (Interest Area Detection is Problematic for Videos of Real People)

Motion Detection with Centralization:

We want to use those motion areas when calculating the SSD between two frames. However, that is not of much use if the areas are not centered. To center them, we take the coordinates of every white pixel and average them. Once we have this average coordinate (the centroid), we translate the image by that amount so it is centered.
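A sketch of the centering step (np.roll wraps pixels around the border, which is acceptable here because the surroundings of the interest region are empty):

    def center_mask(mask):
        # Translate the interest area so its centroid lands at the image center.
        ys, xs = np.nonzero(mask)
        if len(ys) == 0:
            return mask
        h, w = mask.shape
        shift_y = int(round(h / 2 - ys.mean()))
        shift_x = int(round(w / 2 - xs.mean()))
        return np.roll(mask, (shift_y, shift_x), axis=(0, 1))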

Area Matching:

One idea is to match frames by comparing the total area of their interest regions. That did not work, as expected.

Harris Detectors:

Another idea was to use Harris corner detection to find correspondences between the frames. There were a lot of mismatches and the results were pretty bad.

Interest Area SSD Video Matching Result:

Here, I have calculated the SSD of the centered interest areas. This approach worked pretty well. The only problem is that the resulting video does not look continuous, because the SSD imposes no restriction that consecutive output frames be connected in any way; this is a simple frame-by-frame correspondence.

Interest Area SSD Video Matching Result Considering Transition Probabilities (Continuity):

Here is the part where all our effort in making the video texture pays off. By using the transition probabilities as weights for the frame distances, we make frames that have nothing to do with each other less attractive for the matching. By doing this, we achieve a video that is quite continuous and still matches the reference video. Especially at the end of the following video, you can see the difference that the continuity-aware matching makes.
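The exact weighting is not spelled out above, but one plausible reading is to divide each frame distance by the transition probability from the previously chosen target frame, so jumps that the video texture considers implausible become expensive (an illustrative sketch):

    def match_with_continuity(distance, P, epsilon=1e-6):
        # distance[t, j]: SSD between reference frame t and centered target frame j
        # P[i, j]: transition probabilities of the target video texture
        matches = [int(distance[0].argmin())]
        for t in range(1, len(distance)):
            prev = matches[-1]
            weighted = distance[t] / (P[prev] + epsilon)   # low probability -> high cost
            matches.append(int(weighted.argmin()))
        return matches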

What I Learned:

I learned that matching images is quite tricky. Besides that, I loved making the video textures, and I think there are still improvements that could be made to achieve better results. Maybe I could use them as profile videos on Facebook!

This is just here because everybody was doing Dolly Zooms:
(so I had to show mine)

"Me, when I think of pizza" - Bandeira, Roberto

Last, But Not Least:

Special thanks to Alexei Alyosha Efros, Rachel Albert and Weilun Sun for help during lectures, office hours and questions on Piazza.

Website built using Bootstrap.