Final Projects

Proj1: Augmented Reality

Keypoints with known 3D world coordinates

I started by marking keypoints in the first frame of the video (using plt.ginput) and assigning each one its 3D world coordinates. I measured the side lengths of the box and the distance between consecutive points in the pattern. With these measurements, I could compute the 3D coordinates of each point by fixing the 3D world coordinate axes at one corner of the box, as shown.
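A minimal sketch of this step, assuming matplotlib and numpy; the frame path, number of clicked points, pattern layout, and spacing below are hypothetical placeholders rather than the actual measurements:

```python
import matplotlib.pyplot as plt
import numpy as np

# Display the first frame and click the keypoints by hand (hypothetical count of 20).
first_frame = plt.imread("frames/frame_000.jpg")  # assumed path
plt.imshow(first_frame)
pts_2d = np.array(plt.ginput(n=20, timeout=0))    # (x, y) pixel coordinates

# 3D world coordinates: origin fixed at one corner of the box, axes along its edges.
# The grid spacing and 5x4 pattern below are assumed values, not the real measurements.
spacing = 2.5
pts_3d = np.array([[i * spacing, j * spacing, 0.0]
                   for j in range(4) for i in range(5)])
```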

Propagating Keypoints to Other Images in the Video

The end result of this procedure should be a paired set of 2D and 3D points for every frame in the video. Per the updated Bells & Whistles, I used two ways to propagate keypoints through the video: 1. A Hacky Corner Detector and 2. Off the Shelf Tracker.

A Hacky Corner Detector

First, I detected the corners in img[i] and img[i+1] using a Harris corner detector. Since the points don't move by a large amount between consecutive frames, I can find, for every point in img[i], the closest detected corner in img[i+1]. I started with the marked points from the first image, computed the tracked points in the next image, and continued all the way to the last image while keeping track of the successfully tracked 2D points and their corresponding 3D coordinates (which we know from the first image).
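A sketch of this nearest-corner propagation, assuming cv2.cornerHarris for detection; the response threshold and the maximum matching distance are made-up values:

```python
import cv2
import numpy as np

def harris_corners(gray, thresh=0.01):
    """Return (x, y) coordinates of Harris corner responses above a threshold."""
    resp = cv2.cornerHarris(np.float32(gray), blockSize=2, ksize=3, k=0.04)
    ys, xs = np.where(resp > thresh * resp.max())
    return np.stack([xs, ys], axis=1).astype(np.float64)

def propagate(points, next_gray, max_dist=15.0):
    """Snap each tracked point to the nearest detected corner in the next frame."""
    corners = harris_corners(next_gray)
    new_points, kept = [], []
    for i, p in enumerate(points):
        d = np.linalg.norm(corners - p, axis=1)
        j = d.argmin()
        if d[j] < max_dist:              # drop points whose nearest corner is too far
            new_points.append(corners[j])
            kept.append(i)
    return np.array(new_points), kept    # kept indexes the surviving 3D correspondences
```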

Off the Shelf Tracker

I used the MedianFlow tracker: cv2.TrackerMedianFlow_create(). I initialized each tracker with an 8x8 patch centered on the marked point, then updated the trackers on each new frame to get the point positions in that frame. The point position I used is the center of the new bounding box. I kept track of these points and their corresponding 3D points.
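A sketch of the tracker setup, assuming an OpenCV build where cv2.TrackerMedianFlow_create is available (newer builds expose it under cv2.legacy), and reusing the marked points and first frame from the marking step:

```python
import cv2

def init_trackers(first_frame, pts_2d):
    """Create one MedianFlow tracker per marked keypoint, on an 8x8 box around it."""
    trackers = []
    for (x, y) in pts_2d:
        t = cv2.TrackerMedianFlow_create()  # cv2.legacy.TrackerMedianFlow_create() in newer OpenCV
        t.init(first_frame, (int(x) - 4, int(y) - 4, 8, 8))
        trackers.append(t)
    return trackers

def track_next(trackers, frame):
    """Update every tracker and return the centers of the new bounding boxes."""
    centers = []
    for t in trackers:
        ok, (bx, by, bw, bh) = t.update(frame)
        centers.append((bx + bw / 2.0, by + bh / 2.0) if ok else None)  # None = lost track
    return centers
```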

Calibrating the Camera

Having the 2D image coordinates of the marked points and their corresponding 3D coordinates, I used least squares to fit the camera projection matrix that maps 4-dimensional homogeneous world coordinates to 3-dimensional homogeneous image coordinates.
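The exact least-squares formulation isn't spelled out above; one common way to fit the 3x4 matrix from such correspondences is the direct linear transform, sketched here as an assumption rather than the author's exact setup:

```python
import numpy as np

def fit_projection_matrix(pts_3d, pts_2d):
    """Fit the 3x4 camera matrix P from 3D-2D correspondences via the DLT."""
    A = []
    for (X, Y, Z), (x, y) in zip(pts_3d, pts_2d):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -x * X, -x * Y, -x * Z, -x])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -y * X, -y * Y, -y * Z, -y])
    A = np.array(A)
    # The smallest right singular vector of A is the least-squares solution up to scale.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)
```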

Projecting a cube in the Scene

I projected the corner points of a cube using the camera projection matrix and drew the cube on the images.
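A sketch of the cube drawing with a hypothetical cube origin, size, color, and line thickness:

```python
import cv2
import numpy as np

def draw_cube(img, P, origin=(0.0, 0.0, 0.0), size=5.0):
    """Project the 8 cube vertices with P and draw the 12 edges on the image."""
    ox, oy, oz = origin
    s = size
    verts = np.array([[ox + dx, oy + dy, oz + dz, 1.0]
                      for dx in (0, s) for dy in (0, s) for dz in (0, s)])
    proj = (P @ verts.T).T
    pts = proj[:, :2] / proj[:, 2:3]          # divide by w to get pixel coordinates
    edges = [(0, 1), (0, 2), (1, 3), (2, 3),
             (4, 5), (4, 6), (5, 7), (6, 7),
             (0, 4), (1, 5), (2, 6), (3, 7)]
    for a, b in edges:
        pa = tuple(int(v) for v in pts[a])
        pb = tuple(int(v) for v in pts[b])
        cv2.line(img, pa, pb, (0, 0, 255), 2)
    return img
```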

Bells & Whistles

Per the updated Bells & Whistles, the two ways of propagating keypoints to other images in the video described above serve as my Bells & Whistles.

Proj2: Reimplement: A Neural Algorithm of Artistic Style

Overview

This project reimplements the work of A Neural Algorithm of Artistic Style. The work uses representations from VGG-19 to separate and recombine the content and style of arbitrary images, and it introduces two losses, a style loss and a content loss, to constrain the style-transferred images.

VGG-19 Network

VGG-19 is a convolutional neural network that is 19 layers deep. I used the pretrained VGG-19 network from torchvision.models.
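A sketch of loading the pretrained network and collecting features; the layer choices below (conv4_2 for content, conv1_1 through conv5_1 for style) are the ones used in the original paper and are an assumption here, since the exact layers are not stated above:

```python
import torch
import torchvision.models as models

# Pretrained VGG-19; only the convolutional feature extractor is needed.
# (Newer torchvision versions use the weights= argument instead of pretrained=True.)
vgg = models.vgg19(pretrained=True).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)            # the network stays fixed; only the image is optimized

content_layers = {21}                  # index of conv4_2 in vgg19.features (assumed choice)
style_layers = {0, 5, 10, 19, 28}      # conv1_1, conv2_1, conv3_1, conv4_1, conv5_1

def extract_features(x):
    """Run x through VGG-19 and collect activations at the chosen layers."""
    content_feats, style_feats = [], []
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in content_layers:
            content_feats.append(x)
        if i in style_layers:
            style_feats.append(x)
    return content_feats, style_feats
```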

Losses

The loss is a weighted sum of the style loss and the content loss. The style and content features are extracted with VGG-19, and the style loss and content loss measure the similarity between the new image and the content/style images. In the following equations, F_l represents the output of the network at layer l. I used α = 1×10⁻³ and β = 1.

Content Loss
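A minimal PyTorch sketch of the content loss as defined in Gatys et al., i.e. half the sum of squared differences between the generated image's features F and the content image's features P at the chosen layer:

```python
import torch

def content_loss(F, P):
    """Content loss: 0.5 * sum over feature maps of (F - P)^2 at one layer."""
    return 0.5 * torch.sum((F - P) ** 2)
```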

Style Loss
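A minimal PyTorch sketch of the style loss as defined in Gatys et al.: Gram matrices of the generated and style features are compared at each chosen layer, each term is normalized by 4·N_l²·M_l², and the terms are combined with layer weights (uniform weights assumed here):

```python
import torch

def gram_matrix(F):
    """Gram matrix G = F F^T over the flattened spatial dimensions of one layer."""
    c, h, w = F.shape[-3:]
    F = F.reshape(c, h * w)
    return F @ F.t()

def style_loss(gen_feats, style_feats, weights=None):
    """Per-layer term E_l = 1/(4 N_l^2 M_l^2) * sum (G - A)^2, summed with weights w_l."""
    losses = []
    for F, S in zip(gen_feats, style_feats):
        c, h, w = F.shape[-3:]
        G, A = gram_matrix(F), gram_matrix(S)
        losses.append(torch.sum((G - A) ** 2) / (4.0 * c ** 2 * (h * w) ** 2))
    if weights is None:
        weights = [1.0 / len(losses)] * len(losses)
    return sum(w * l for w, l in zip(weights, losses))
```

With α = 1×10⁻³ and β = 1 as stated above, the total loss being minimized is α times the content loss plus β times the style loss.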

Results