Augmented Reality

Input Video

We marked a total of 19 evenly spaced points across three mutually perpendicular planes that can be viewed as the "inside" of a box, and recorded a video of the scene. The input video can be viewed below.

Keypoint Definition and Propagation

We manually marked the positions of the keypoints in the first frame of the video, then used the median flow tracker provided by OpenCV to propagate their positions through the next 600+ frames. Tracking was reliable as long as the video stayed steady, without sudden motion blur. The result of keypoint tracking is shown below.
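OpenCV's median flow tracker follows a bounding box rather than a single point, so one way to track point keypoints is to centre a small box on each one, run a separate tracker per keypoint, and read the box centre back as the keypoint position. The sketch below assumes that approach; the helper names (`point_to_bbox`, `bbox_center`, `track_keypoints`) and the 16-pixel box (`half_size=8`) are hypothetical choices, not taken from the original project. It needs the `opencv-contrib-python` package, which provides `cv2.legacy.TrackerMedianFlow_create` in OpenCV 4.5.1 and later.

```python
# Sketch (assumed approach): propagate hand-marked keypoints through a video
# with OpenCV's median flow tracker, using one tracker per keypoint.


def point_to_bbox(pt, half_size=8):
    """Return an (x, y, w, h) box of side 2*half_size centred on pt."""
    x, y = pt
    return (int(round(x - half_size)), int(round(y - half_size)),
            2 * half_size, 2 * half_size)


def bbox_center(bbox):
    """Return the centre (x, y) of an (x, y, w, h) box as floats."""
    x, y, w, h = bbox
    return (x + w / 2.0, y + h / 2.0)


def track_keypoints(video_path, initial_points, half_size=8):
    """Track each keypoint through the video.

    Returns one list per frame; each entry is an (x, y) tuple, or None
    if the tracker for that keypoint reported a failure on that frame.
    """
    # Median flow lives in the contrib build (pip install opencv-contrib-python).
    # OpenCV >= 4.5.1 exposes it as cv2.legacy.TrackerMedianFlow_create;
    # older 4.x releases used cv2.TrackerMedianFlow_create instead.
    import cv2

    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    if not ok:
        raise IOError(f"could not read the first frame of {video_path}")

    # Initialise one tracker per keypoint on the first frame.
    trackers = []
    for pt in initial_points:
        tracker = cv2.legacy.TrackerMedianFlow_create()
        tracker.init(frame, point_to_bbox(pt, half_size))
        trackers.append(tracker)

    # Frame 0 uses the hand-marked positions directly.
    tracks = [[(float(x), float(y)) for x, y in initial_points]]
    while True:
        ok, frame = cap.read()
        if not ok:  # end of the video
            break
        frame_points = []
        for tracker in trackers:
            found, bbox = tracker.update(frame)
            frame_points.append(bbox_center(bbox) if found else None)
        tracks.append(frame_points)

    cap.release()
    return tracks
```

A per-keypoint tracker keeps failures local: if blur makes one keypoint's tracker lose its box, the other keypoints are unaffected, and the `None` entries make it easy to drop that keypoint when estimating the camera for that frame.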

Projecting a Cube into the Scene

Finally, we manually defined the 3D world coordinates of the 19 keypoints. Given these coordinates and their 2D positions in each video frame from the previous step, we set up an overdetermined system of linear equations and solved it in the least-squares sense for the camera's 3×4 projection matrix (which combines the intrinsic and extrinsic parameters) at each time step. This matrix is then used to project the 8 vertices of a cube from world coordinates into image space, drawing a virtual object into the scene. The final result is given below.
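The text does not say exactly how the linear system was set up. One standard formulation is the direct linear transform (DLT): fix the bottom-right entry of the 3×4 projection matrix P to 1, so each correspondence (X, Y, Z) → (u, v) yields two linear equations in the remaining 11 unknowns, and solve the stacked system by least squares. The NumPy sketch below assumes that formulation; the function names, the cube's origin and size, and the drawing colour are illustrative choices, not the original project's code. It recovers only the combined matrix P; separating intrinsics from extrinsics would need an extra decomposition (for example RQ), which is omitted here.

```python
import numpy as np


def estimate_projection_matrix(world_pts, image_pts):
    """Least-squares DLT estimate of the 3x4 camera matrix P (with P[2, 3] = 1).

    Each correspondence (X, Y, Z) -> (u, v) contributes two linear equations
    in the 11 remaining entries of P, so at least 6 points are needed.
    """
    world_pts = np.asarray(world_pts, dtype=float)
    image_pts = np.asarray(image_pts, dtype=float)
    n = len(world_pts)
    if n < 6:
        raise ValueError("need at least 6 point correspondences")

    A = np.zeros((2 * n, 11))
    b = np.zeros(2 * n)
    for i, ((X, Y, Z), (u, v)) in enumerate(zip(world_pts, image_pts)):
        # u * (p31 X + p32 Y + p33 Z + 1) = p11 X + p12 Y + p13 Z + p14; same for v
        A[2 * i] = [X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z]
        A[2 * i + 1] = [0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z]
        b[2 * i], b[2 * i + 1] = u, v

    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.append(p, 1.0).reshape(3, 4)


def project(P, world_pts):
    """Project N x 3 world points to N x 2 pixel coordinates with P."""
    world_pts = np.asarray(world_pts, dtype=float)
    homog = np.hstack([world_pts, np.ones((len(world_pts), 1))])
    proj = homog @ P.T
    return proj[:, :2] / proj[:, 2:3]  # divide by the homogeneous coordinate


def cube_vertices(origin=(0.0, 0.0, 0.0), size=1.0):
    """8 corners of an axis-aligned cube; vertex index = 4*dx + 2*dy + dz."""
    ox, oy, oz = origin
    return np.array([[ox + dx * size, oy + dy * size, oz + dz * size]
                     for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)])


# Two corners share an edge exactly when their indices differ in one bit.
CUBE_EDGES = [(i, j) for i in range(8) for j in range(i + 1, 8)
              if bin(i ^ j).count("1") == 1]


def draw_cube(frame, P, vertices, color=(0, 255, 0), thickness=2):
    """Draw the projected cube edges onto a BGR frame in place (needs OpenCV)."""
    import cv2

    pts = np.round(project(P, vertices)).astype(int)
    for i, j in CUBE_EDGES:
        p1 = (int(pts[i, 0]), int(pts[i, 1]))
        p2 = (int(pts[j, 0]), int(pts[j, 1]))
        cv2.line(frame, p1, p2, color, thickness)
    return frame
```

In use, each frame would pass the 19 world coordinates and that frame's tracked 2D positions (dropping any keypoints whose tracking failed) to `estimate_projection_matrix`, then hand the resulting P and `cube_vertices(...)` to `draw_cube`. Fixing P[2, 3] = 1 keeps the system in the ordinary Ax = b form; an alternative is the homogeneous formulation solved with an SVD, which avoids assuming that entry is nonzero.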