
CS 194: Final Project

Atte Ahmavaara

Project A: Augmented Reality

Project B: Seam Carving for Content-Aware Resizing

Project A: Augmented Reality

Raw Video Input

The algorithm can accept any style of video as input, but I opted for 1920x1080 60fps footage to balance runtime and quality. The higher the frame rate, the smaller the motion between successive frames, and thus the more accurate the trackers can be. I used two different videos to showcase the algorithm under varying conditions.

Corner Table:

Kitchen:

Initialize Point Correspondences

Next, we give the algorithm the real dimensions of the calibration box, so that we can accurately define measurements such as '10 cm away' when drawing the scene. Strictly speaking this isn't necessary; we just need some 3D coordinate system in the scene, so we might as well make it match real-life measurements. We draw correspondences between these 3D points and the 2D screen points, which we will use in the next step to initialize our trackers.
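As an illustration, the correspondence setup might look like the sketch below. The box dimensions and pixel coordinates here are made-up placeholders for illustration, not the actual values used in the project.

```python
import numpy as np

# Hypothetical calibration box, axis-aligned with one corner at the
# world origin. Dimensions are in centimeters (illustrative values).
BOX_W, BOX_D, BOX_H = 28.0, 18.0, 12.0

# 3D world coordinates of the eight box corners (cm).
world_pts = np.array([
    [0,     0,     0],
    [BOX_W, 0,     0],
    [BOX_W, BOX_D, 0],
    [0,     BOX_D, 0],
    [0,     0,     BOX_H],
    [BOX_W, 0,     BOX_H],
    [BOX_W, BOX_D, BOX_H],
    [0,     BOX_D, BOX_H],
], dtype=np.float64)

# 2D pixel coordinates of the same corners in the first frame,
# e.g. marked by hand once. These numbers are made up.
screen_pts = np.array([
    [812,  640], [1104, 655], [1180, 540], [890,  525],
    [815,  430], [1100, 445], [1175, 330], [895,  318],
], dtype=np.float64)

# One (3D point, 2D point) pair per tracker to be initialized.
correspondences = list(zip(world_pts, screen_pts))
```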

Initialize Point Trackers

Now we initialize a tracker for each 3D coordinate, to follow it in 2D screen space, and define a bounding box for each tracker. After much testing, given the amount of movement in the corner_table video I used a kernel size of 21x21 px there, while for the kitchen video I used 31x31 px. After trying every reasonable point-tracking algorithm, I concluded that MedianFlow provided the best results: it is very cognizant of when it fails, and it does not drift as much as KCF or MOSSE. Here are our initial-frame point trackers and bounding boxes:

Corner Table:

init frame trackers corner table

Kitchen:

init frame trackers kitchen
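The project uses OpenCV's MedianFlow tracker; as a self-contained illustration of the underlying idea (follow a kernel-sized patch around each point from frame to frame), here is a minimal sum-of-squared-differences patch tracker in plain NumPy. This is a simplified stand-in for exposition, not the MedianFlow algorithm itself, and the function names are my own.

```python
import numpy as np

def init_tracker(frame, center, ksize=21):
    """Store the kernel-sized template patch around a tracked point.
    center is (row, col); ksize is the odd kernel side length."""
    r = ksize // 2
    y, x = center
    return frame[y - r:y + r + 1, x - r:x + r + 1].copy()

def track(frame, template, prev_center, search=10):
    """Find the template in a small window around the previous position
    by minimizing sum-of-squared-differences; returns the new center."""
    r = template.shape[0] // 2
    py, px = prev_center
    best, best_pos = np.inf, prev_center
    for y in range(py - search, py + search + 1):
        for x in range(px - search, px + search + 1):
            patch = frame[y - r:y + r + 1, x - r:x + r + 1]
            if patch.shape != template.shape:
                continue  # candidate window ran off the image edge
            ssd = np.sum((patch.astype(float) - template.astype(float)) ** 2)
            if ssd < best:
                best, best_pos = ssd, (y, x)
    return best_pos
```

A real tracker such as MedianFlow adds forward-backward error checking on top of this kind of matching, which is what makes it so good at knowing when it has failed.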

Run Tracking

I then ran the individual screen-space trackers on each frame of the video. I implemented error checking so that if a tracker failed, it was automatically disabled and the failure logged. Before processing each frame, the algorithm checked that the remaining points were not all planar; I made this check configurable in the config as well. If the remaining points became planar before the video finished, the algorithm cut the video short but still finished with the valid data it had collected.
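One common way to implement such a non-planarity check is a rank test on the centered 3D point cloud; this is a sketch of that idea, not necessarily the project's exact check.

```python
import numpy as np

def points_are_planar(world_pts, tol=1e-6):
    """Return True if all 3D points lie (numerically) on one plane.
    Center the points and check whether the third singular value of
    the point cloud vanishes, i.e. the points span fewer than 3
    dimensions. Assumes at least four points that are not all equal."""
    centered = world_pts - world_pts.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    return s[2] < tol * s[0]
```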

Corner Table:

Kitchen:

Calculate Camera Homographies

Now, with the 2D screen-space coordinates for each frame matched to the static 3D points, the algorithm goes frame by frame and calculates a homography matrix from the 3D world coordinates to the 2D screen-space coordinates. This was done using a custom least-squares implementation, since NumPy's and SciPy's least-squares functions default to the trivial (all-zero) solution when solving homogeneous systems.
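A standard way to solve the homogeneous system while avoiding the all-zero solution is the direct linear transform (DLT): stack two equations per correspondence and take the right singular vector with the smallest singular value. The author's custom least-squares routine may differ; this sketch (with hypothetical function names) shows the SVD route for the 3x4 world-to-screen matrix.

```python
import numpy as np

def dlt_projection(world_pts, screen_pts):
    """Estimate the 3x4 camera matrix P mapping homogeneous 3D points
    to 2D pixels. Each correspondence (X, x) contributes two rows to
    A; the solution is the right singular vector of A with the
    smallest singular value, which minimizes ||A p|| subject to
    ||p|| = 1 and so cannot collapse to the trivial zero answer."""
    rows = []
    for (X, Y, Z), (u, v) in zip(world_pts, screen_pts):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z, -u])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z, -v])
    A = np.asarray(rows, dtype=np.float64)
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)

def project(P, X):
    """Apply P to a 3D point and dehomogenize to pixel coordinates."""
    u, v, w = P @ np.append(X, 1.0)
    return np.array([u / w, v / w])
```

With six or more non-degenerate correspondences and exact data, this recovers the true matrix up to scale, and scale cancels during dehomogenization.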

Render Cube onto Frames

Now, armed with the 3D-to-2D camera homography for each frame, we compute the cube's desired 3D points, and then for each frame apply that frame's homography, shade in the desired lines and faces using the resulting 2D coordinates, and render the frames to video using FFmpeg.
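A sketch of the cube-projection step, using an illustrative cube position and size (the actual offsets in the project may differ); the drawing and encoding steps (e.g. OpenCV line drawing, piping to FFmpeg) are only indicated in comments.

```python
import numpy as np

def cube_corners(origin=(5.0, 5.0, 12.0), s=6.0):
    """3D corners of a cube of side s (cm), placed at a chosen offset,
    e.g. sitting on top of the calibration box. Values illustrative."""
    ox, oy, oz = origin
    return np.array([[ox + dx, oy + dy, oz + dz]
                     for dz in (0, s) for dy in (0, s) for dx in (0, s)])

# The 12 edges of the cube, as index pairs into the corner array.
CUBE_EDGES = [(0, 1), (0, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 7), (6, 7),
              (0, 4), (1, 5), (2, 6), (3, 7)]

def project_cube(P, corners):
    """Project the 8 corners into the frame with the 3x4 camera matrix P."""
    homo = np.hstack([corners, np.ones((8, 1))])   # to homogeneous coords
    uvw = homo @ P.T
    return uvw[:, :2] / uvw[:, 2:3]                # dehomogenize to pixels

# Per frame: uv = project_cube(P_frame, cube_corners()), then draw each
# edge, e.g. cv2.line(frame, uv[a], uv[b], ...), and stream to FFmpeg.
```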

Corner Table:

Kitchen:


Project B: Seam Carving for Content-Aware Resizing

Algorithm Overview

Calculate Energy Function

The energy function defines the most 'important' parts of the image: its minima are the parts that can be removed least noticeably. I compute this function efficiently by convolving the image with a form of partial-derivative filter. After trying many different filters, I settled on the Sobel filter, since it weights most heavily the pixels that will become adjacent once a pixel is removed, while still taking all neighboring pixels into account.
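As a sketch of this step (assuming a single-channel float image; the project may compute energy per color channel and combine), the Sobel-based energy can be written with plain NumPy padding and slicing:

```python
import numpy as np

def energy(gray):
    """Gradient-magnitude energy |dI/dx| + |dI/dy| via 3x3 Sobel
    filters, computed with padded array slicing (no SciPy needed)."""
    p = np.pad(gray.astype(np.float64), 1, mode='edge')
    # Sobel x kernel: [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2])
    # Sobel y kernel: transpose of the above
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:])
    return np.abs(gx) + np.abs(gy)
```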

Calculate Minimum Seam

To find the minimum seam, I employed bottom-up dynamic programming to efficiently search for the minimum-weight seam, based on the energy function. After finding the column index of the top-most pixel of the minimum seam, a backtracking function recovers the rest of the seam in linear time. I further optimized the DP by vectorizing it with NumPy array operations, so that in terms of Python-level loops it runs in roughly linear rather than quadratic time.
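The DP and backtracking described above can be sketched as follows. This version accumulates costs from the bottom row upward, so the minimum of the top row gives the seam's top-most pixel, matching the description; details such as tie-breaking may differ from the actual implementation.

```python
import numpy as np

def min_seam(E):
    """Return the column index of the minimum vertical seam per row.
    M[i, j] holds the cheapest cumulative energy of any seam starting
    at pixel (i, j) and running to the bottom; each row is filled in
    one vectorized step from the row below."""
    H, W = E.shape
    M = E.astype(np.float64).copy()
    back = np.zeros((H, W), dtype=np.int64)
    for i in range(H - 2, -1, -1):
        below = M[i + 1]
        left  = np.concatenate(([np.inf], below[:-1]))  # child at j - 1
        right = np.concatenate((below[1:], [np.inf]))   # child at j + 1
        choices = np.vstack([left, below, right])
        best = np.argmin(choices, axis=0)               # 0/1/2 per column
        M[i] += choices[best, np.arange(W)]
        back[i] = np.arange(W) + best - 1               # child column below
    seam = np.empty(H, dtype=np.int64)
    seam[0] = np.argmin(M[0])                # top-most pixel of the seam
    for i in range(1, H):                    # backtrack downward, O(H)
        seam[i] = back[i - 1, seam[i - 1]]
    return seam
```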

Precompute (Bells and Whistles)

To speed up the actual shrinking of the image, I precompute the optimal order of seams to remove, along with every seam's pixels. I opted not to precompute every possible resized image, to save memory, but at least in Python terms, storing the individual seam pixels makes the shrinking phase very fast, arguably real-time.

Shrink Image

I iteratively loop through each crop level, removing the precomputed seam each iteration until the desired crop amount is reached.
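A minimal sketch of this loop, assuming each precomputed seam is stored as one column index per row, expressed in the coordinates of the image after the previous removals (the function names here are my own):

```python
import numpy as np

def remove_seam(img, seam):
    """Drop one pixel per row (the seam) from an H x W x C image,
    returning an H x (W-1) x C image via a boolean keep-mask."""
    H, W = img.shape[:2]
    keep = np.ones((H, W), dtype=bool)
    keep[np.arange(H), seam] = False
    return img[keep].reshape(H, W - 1, -1)

def shrink(img, seams, crop):
    """Remove the first `crop` precomputed seams, one per iteration."""
    for seam in seams[:crop]:
        img = remove_seam(img, seam)
    return img
```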

Key Trick

To support cropping along either axis with minimal extra code, if a vertical resize is requested, the image is transposed before any calculations and then transposed back at the very end.
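The trick might look like this sketch, where carve_width is a hypothetical name for whatever function removes n vertical seams:

```python
import numpy as np

def carve_height(img, n, carve_width):
    """Reduce height by n seams: transpose the H x W x C image so rows
    and columns swap, carve width as usual, then transpose back."""
    return carve_width(np.transpose(img, (1, 0, 2)), n).transpose(1, 0, 2)
```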

Demo Images

Norway:

width=400px reduction, height=100px reduction

norway orig

norway w norway h

 

NYC:

width=500px reduction, height=350px reduction

nyc orig

nyc w nyc h

 

Ocean:

width=1000px reduction, height=400px reduction

ocean orig

ocean w ocean h

 

Snow:

width=200px reduction, height=200px reduction

snow orig

snow w snow h

 

Selfie:

width=300px reduction

selfie orig

 

Tree:

width=400px reduction

tree orig

tree w

 

FAILURE CASES:

Selfie:

height=200px reduction

selfie orig

selfie h

 

Tree:

height=300px reduction

tree orig

tree h