Programming Project
Find a flat surface and place a box (e.g. a shoebox) on it. Draw a regular pattern on the box. Decide on at least 20 points from the pattern which you will mark in the image, and label their corresponding 3D points. Make sure they are not all coplanar. Capture a video with the box at the center. The input video can be something like the one shown above on the left.
We will start by marking the points in the first image of the video (using plt.ginput) and recording their 3D world coordinates (you can use skvideo.io.vread to read and skvideo.io.vwrite to write videos). You need to measure the lengths of the sides of the box and the distance between consecutive points in the pattern. Having a regular pattern helps you automate the labeling of the 3D points. Once you have all the measurements, you can get the 3D coordinates of each point by fixing the 3D world coordinate axes to be centered at one of the corners of the box, as shown. Note that regardless of which frame you are looking at, the world coordinates of these points remain the same.
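Because the pattern is regular, the 3D labels can be generated rather than typed in by hand. The sketch below assumes the pattern is a grid of points with fixed spacing on two non-parallel faces of the box; the function name `box_face_points`, the face names, and the axis assignments are illustrative assumptions — adapt them to how you actually drew your pattern.

```python
import numpy as np

def box_face_points(rows, cols, spacing, face="front"):
    """3D world coordinates for a regular rows x cols grid of pattern
    points on one face of the box.  The world origin is a corner of the
    box, with the axes along its edges (units: same as `spacing`, e.g.
    centimeters).  The face layout here is an assumption -- adjust the
    axis assignments to match your own pattern."""
    ii, jj = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    if face == "front":    # points vary along x and z, y = 0
        pts = np.stack([jj * spacing, np.zeros_like(ii), ii * spacing], -1)
    elif face == "side":   # points vary along y and z, x = 0
        pts = np.stack([np.zeros_like(ii), jj * spacing, ii * spacing], -1)
    else:
        raise ValueError(face)
    return pts.reshape(-1, 3).astype(float)

# e.g. a 4x3 grid with 2 cm spacing on each of two non-parallel faces,
# so the combined point set is not coplanar
world_pts = np.vstack([box_face_points(4, 3, 2.0, "front"),
                       box_face_points(4, 3, 2.0, "side")])  # (24, 3)
```

Mark the image points with plt.ginput in the same order in which this array is generated, so the 2D–3D pairing is implicit.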
There are several ways of propagating the points from the first image to the subsequent images. The end result of this procedure should be a paired set of 2D and 3D points for every frame in the video:
Once you have the 2D image coordinates of the marked points and their corresponding 3D coordinates, use least squares to fit the camera projection matrix that maps 4-dimensional world coordinates (homogeneous coordinates) to 3-dimensional image coordinates (again, homogeneous coordinates). Perform this step separately for each frame of the video.
Once you have the camera projection matrix, project the axis points defined at the end of [2] in Resources, and use the draw function (defined just above the axis points in [2]) to draw the cube on the image. Note that the draw function takes an unnecessary parameter corners which it doesn't use; just inputting the projected points for imgpts should suffice. This will place a cube of size 1 unit at (0, 0, 0). You should translate and scale the coordinates in the axis points to place the cube at a suitable location. Once you have rendered the cube independently on each image, you can combine the images into a video and view the output result.
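The translate-and-scale step can be sketched as follows. The corner ordering mirrors the bottom-face-then-top-face layout the tutorial's draw function expects, and the `scale` and `offset` defaults are placeholder values you should tune so the cube sits nicely on your box.

```python
import numpy as np

# Unit-cube corners in world coordinates: bottom face first, then the
# top face, matching the ordering assumed by the tutorial's draw().
CUBE = np.float32([[0, 0, 0], [0, 1, 0], [1, 1, 0], [1, 0, 0],
                   [0, 0, 1], [0, 1, 1], [1, 1, 1], [1, 0, 1]])

def place_cube(P, scale=4.0, offset=(2.0, 2.0, 0.0)):
    """Scale and translate the unit cube in *world* units, then project
    its corners with the fitted 3x4 matrix P.  `scale` and `offset`
    are illustrative values, not part of the assignment spec."""
    corners = CUBE * scale + np.asarray(offset, dtype=np.float32)
    Xh = np.hstack([corners, np.ones((8, 1), np.float32)])  # homogeneous
    xh = Xh @ P.T
    return (xh[:, :2] / xh[:, 2:3]).astype(np.int32)  # (8, 2) pixel coords

# pass the result as `imgpts` to the tutorial's draw(img, corners, imgpts)
```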
You need to show both the input video and the output video with the cube. Note that one way to improve the result is to use more keypoints and to make them as accurate as possible.
You can also try placing an arbitrary mesh of your choice into the scene using an off-the-shelf Python mesh renderer (pyrender [5]). Note, however, that you will need to further decompose the camera projection matrix into camera intrinsics, rotation, and translation; cv2.decomposeProjectionMatrix implements this functionality. The decomposition of the camera projection matrix is used by the renderer to figure out self-occlusion.
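The decomposition itself can be sketched without OpenCV: the left 3x3 block of P factors as K R (K upper triangular, R a rotation) via an RQ decomposition, here built from numpy's QR using the row/column-reversal trick so no extra dependency is needed. This is a sketch of the standard technique, equivalent in spirit to cv2.decomposeProjectionMatrix.

```python
import numpy as np

def decompose_projection(P):
    """Split P = K [R | t]: RQ-decompose the left 3x3 block of P into
    an upper-triangular K and a rotation R, then recover t."""
    M = P[:, :3]
    J = np.flipud(np.eye(3))          # antidiagonal "flip" matrix, J = J^-1
    # RQ via QR: M = (J R_.T J)(J Q.T) with R_ upper triangular.
    Q, R_ = np.linalg.qr((J @ M).T)
    K = J @ R_.T @ J                  # upper triangular
    R = J @ Q.T                       # orthogonal
    # Resolve the sign ambiguity so K has a positive diagonal.
    S = np.diag(np.sign(np.diag(K)))
    K, R = K @ S, S @ R
    t = np.linalg.solve(K, P[:, 3])   # P[:, 3] = K t
    return K / K[2, 2], R, t          # normalize so K[2, 2] = 1
```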
[1] Python cv2 feature matching tutorial
Setup
Keypoints with known 3D world coordinates
Propagating Keypoints to other Images in the Video
1. A Hacky Corner Detector: One way to propagate the points from one image to the next is by exploiting the temporal signal in the video. First, we will detect the corners in img[i] and img[i+1] using a Harris corner detector: harris.py, harris.m. Let's call them ci and cnext respectively. Since the points will not move by a large amount between consecutive frames, we can find the closest point (in pixel space) from the set cnext for every point in ci. We will further accept only those points for which the pixel-space distance is below some threshold. This detector critically depends on the motions in the video being small and on there being no spurious corners within the threshold radius. We can start with the marked points from the first image and compute the tracked points in the next image, continuing all the way to the last image while keeping track of the successfully tracked 2D points and their corresponding 3D coordinates (which we know from the first image).
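The nearest-corner matching step above can be sketched as follows; the function name `propagate` and the default `max_dist` threshold are illustrative, and the threshold in particular needs tuning to your video.

```python
import numpy as np

def propagate(points, corners_next, max_dist=5.0):
    """Match each tracked point to its nearest detected corner in the
    next frame, keeping only matches closer than `max_dist` pixels.
    points: (N, 2) tracked points in img[i]; corners_next: (M, 2)
    Harris corners detected in img[i+1].  Returns the surviving point
    positions and the indices of the points that survived, so the
    paired 3D labels can be pruned in sync."""
    d = np.linalg.norm(points[:, None, :] - corners_next[None, :, :], axis=2)
    nearest = d.argmin(axis=1)                       # closest corner per point
    keep = d[np.arange(len(points)), nearest] < max_dist
    return corners_next[nearest[keep]], np.flatnonzero(keep)
```

In the main loop you would re-detect Harris corners on each new frame, call `propagate` on the current point set, and drop the 3D labels of any points that were lost.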
2. Off the Shelf Tracker: You can also use an off-the-shelf tracker. This tutorial explains the usage of the various trackers available in cv2. The one I was able to use successfully was the MedianFlow tracker: cv2.TrackerMedianFlow_create(). You need to initialize a separate tracker for each point. I used an 8x8 patch centered around the marked point to initialize each tracker. Note that the bbox describes the bounding box using 4 values, where the first two are the (top-left) starting coordinates of the box, followed by the width and height of the bounding box. Update the trackers for each new frame to get the points on the next frame. Keep track of the points and their corresponding 3D points.
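The bbox convention above is worth getting right; a small helper makes it explicit. The helper name `patch_bbox` is illustrative, and the commented lines sketch how it would feed a per-point tracker (they assume the cv2 tracker API from the tutorial and are not executed here).

```python
def patch_bbox(x, y, size=8):
    """Bounding box for a size x size patch centered on a marked point,
    in the (x_topleft, y_topleft, width, height) convention that the
    cv2 trackers expect."""
    half = size // 2
    return (x - half, y - half, size, size)

# Sketch of the per-point tracker loop (assumes the cv2 tracker API):
#   trackers = []
#   for (x, y) in marked_points:
#       tr = cv2.TrackerMedianFlow_create()
#       tr.init(first_frame, patch_bbox(x, y))
#       trackers.append(tr)
#   for frame in frames[1:]:
#       for tr in trackers:
#           ok, bbox = tr.update(frame)   # recover the point as the
#           # patch center: (bbox[0] + bbox[2] / 2, bbox[1] + bbox[3] / 2)
```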
The result of the tracked points should look something like this:
Calibrating the Camera
Projecting a cube in the Scene
Deliverable
Bells & Whistles
Resources
[2] Camera Calibration
[3] Blender
[4] Tracking Tutorial
[5] Python Rendering
This assignment was designed by Ashish Kumar and Alexei Efros.