Tour-in-Picture

Introduction

This project produces a 3D box scene (with one face missing) from a single 2D image. We follow the method described in Tour into the Picture by Horry et al., with two simplifications: we do not alpha-mask foreground objects, and we handle only images with a single vanishing point.

Implementation

We use user input to determine the back plane of an image and the vanishing point. Then, we segment the image accordingly into the ceiling, back, floor, left, and right walls.
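One common way to do this segmentation is to extend a ray from the vanishing point through each corner of the back plane until it leaves the image; these rays bound the floor, ceiling, and wall regions. The following is a minimal Python sketch of that idea, not the project's MATLAB code. The function name `extend_to_border`, the example coordinates, and the assumption that the vanishing point lies inside the image are all illustrative.

```python
def extend_to_border(vp, corner, width, height):
    """Return the point where the ray from vp through corner leaves the image.

    Pixel coordinates are (u, v) with u to the right and v downward.
    Assumes (hypothetically) that vp lies inside [0, width] x [0, height].
    """
    dx = corner[0] - vp[0]
    dy = corner[1] - vp[1]
    if dx == 0 and dy == 0:
        raise ValueError("corner must differ from the vanishing point")

    # Ray parameter t at which each relevant image border is reached.
    candidates = []
    if dx > 0:
        candidates.append((width - vp[0]) / dx)
    elif dx < 0:
        candidates.append(-vp[0] / dx)
    if dy > 0:
        candidates.append((height - vp[1]) / dy)
    elif dy < 0:
        candidates.append(-vp[1] / dy)

    t = min(candidates)  # the first border the ray hits
    return (vp[0] + t * dx, vp[1] + t * dy)


# Illustrative values: a 100 x 100 image with the vanishing point at its center.
vp = (50.0, 50.0)
bottom_right = (70.0, 60.0)  # bottom-right corner of the back plane
border_point = extend_to_border(vp, bottom_right, 100, 100)
```

The floor region is then bounded by the bottom edge of the back plane and the two rays through its bottom corners; the ceiling and walls follow the same pattern. When a ray exits through a side border instead of the bottom, the region also includes the image corner, which this sketch does not handle.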

We calculate the position of each corner in world coordinates $(X, Y, Z)$ under the assumption that our camera / viewer is at position $(0, 0, 0)$ and looks down the $Z$-axis, with the $Y$-axis pointing up.

We also assume that the floor lies on the $Y=-1$ plane. Our camera follows the pinhole camera model with focal length $f$. The value of $f$ is a free parameter that we tune by hand until the results look best.

To go from pixel coordinate $(u, v)$ to world coordinate $(X, Y, Z)$, we use the pinhole camera projection model. First, we convert $(u, v)$ from pixel coordinates to image plane coordinates $(u', v')$. To do this, we translate the coordinates so that the vanishing point is at the origin, then flip the sign of $v$, because $(1, 1)$ lies in the upper right of the $XY$-plane but in the lower right of the image array (rows increase downward).
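As a small illustration of this conversion, here is a Python sketch (the project itself was written in MATLAB). The function name `pixel_to_image_plane` and the example vanishing point are hypothetical.

```python
def pixel_to_image_plane(u, v, vanishing_point):
    """Convert pixel coordinates (u, v) to image plane coordinates (u', v').

    u is the column index (to the right) and v is the row index (downward).
    The vanishing point becomes the origin and the vertical axis is flipped
    so that v' increases upward, matching the world Y-axis.
    """
    u0, v0 = vanishing_point
    return (u - u0, v0 - v)


# Illustrative values: a pixel 10 columns right of and 10 rows below the vanishing point.
up, vp_ = pixel_to_image_plane(330, 250, (320, 240))
```

A pixel below the vanishing point ends up with a negative $v'$, which is what we expect for points on the floor.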

The formula for pinhole projection is: $$u' = f\dfrac{X}{Z}, v' = f\dfrac{Y}{Z}.$$ (source)

If we know one of $X$, $Y$, or $Z$, we can recover the other two coordinates, since that leaves two equations in two unknowns. In our case, since we assume the floor is on the plane $Y=-1$, we can get the 3D coordinates of the floor corners. The far edge of the floor gives the depth of the back plane, and its left and right edges give the $X$-positions of the left and right walls. Then, from the image height of the back plane (or of the left or right wall), we can recover the $Y$-coordinate of the ceiling.
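Solving the projection equations for a floor point with known $Y$ gives $Z = fY/v'$ and $X = u'Z/f$; for a point at known depth $Z$, they give $X = u'Z/f$ and $Y = v'Z/f$. The Python sketch below works through one example under these formulas. It is not the project's MATLAB code, and the function names, the focal length of 500, and the corner coordinates are made-up values for illustration.

```python
def floor_point(u_img, v_img, f, floor_y=-1.0):
    """Back-project image plane point (u', v') onto the floor plane Y = floor_y.

    Requires v' < 0, i.e. the point lies below the vanishing point.
    """
    Z = f * floor_y / v_img  # from v' = f * Y / Z
    X = u_img * Z / f        # from u' = f * X / Z
    return (X, floor_y, Z)


def point_at_depth(u_img, v_img, f, Z):
    """Back-project image plane point (u', v') onto the plane at depth Z."""
    return (u_img * Z / f, v_img * Z / f, Z)


f = 500.0  # hypothetical focal length, in pixels

# Bottom-left corner of the back plane, which lies on the floor.
bottom_left = floor_point(-100.0, -50.0, f)

# Top-left corner of the back plane shares the same depth Z,
# so its Y-coordinate gives the height of the ceiling.
top_left = point_at_depth(-100.0, 75.0, f, bottom_left[2])
ceiling_y = top_left[1]
```

With these numbers the back plane sits at depth $Z = 10$, the left wall at $X = -2$, and the ceiling at $Y = 1.5$.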

Finally, I rectified each of the planes so that they could be texture-mapped onto 3D planes defined by their corner coordinates. Using built-in MATLAB GUI functions, I could then change the camera position and viewing angle to simulate being "in the picture".
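Rectifying a plane amounts to finding the homography that maps its four image corners to the corners of a rectangle. The following Python sketch estimates such a homography from four point pairs by solving the standard 8 x 8 linear system (with the last entry of the matrix fixed to 1). It is an illustration under assumed inputs, not the project's MATLAB code; the function names and the example trapezoid are hypothetical, and warping the actual pixels is left out.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def homography(src, dst):
    """Return the 3x3 homography H (H[2][2] = 1) mapping four src points to dst."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        b.append(xp)
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.append(yp)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(H, x, y):
    """Map point (x, y) through H, dividing by the homogeneous coordinate."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Hypothetical floor trapezoid in a 640 x 480 image:
# back-left, back-right, front-right, front-left.
floor_src = [(100, 300), (540, 300), (640, 480), (0, 480)]
# Target rectangle for the rectified floor texture.
floor_dst = [(0, 0), (200, 0), (200, 400), (0, 400)]
H = homography(floor_src, floor_dst)
```

To build the texture, one would typically loop over the pixels of the target rectangle, map each through the inverse homography, and sample the source image there.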

Results

A corridor

Tantive IV from Star Wars, A New Hope.
Also the scene from the title video.
This one worked fairly well, as it had a clearly visible back plane (end of hallway) as well as clearly defined floor, ceiling and walls.


Original.
Segmentation.
Looking down the other end of the hallway. Of course, this is an open top box, so we do not know what the other end looks like. Also, the wall fixtures that were protruding are projected incorrectly.
View of the far upper left corner.
A more aerial view of the far bottom right corner.

A building

The Ideal City, attributed to Luciano Laurana.

This picture has a very clear vanishing point at the center of the door. I placed the back plane at the depth of the building.


Original.
Segmentation.
The right facades.
View of the plaza. The floor is quite pixelated because in the original image, the floor plane takes up only a small horizontal portion.
Looking up at the centerpiece.

A path

I took this photo a few years ago at Millennium Park in Chicago.

I liked this photo because it's outdoors and doesn't have a clearly defined ceiling or walls, so it can be tricky to determine where the back plane should be. I chose the back plane to be where the hedges on the side end. This meant I did not have to worry about the people in the back, since they became part of the background. However, it also introduced some artifacts in the model, such as bends where there should be none.


Original / Segmentation (for whatever reason, I cannot access the original image, but it is shown pretty clearly here).
Buildings on the right.
Far left corner. The kink is very obvious because the path actually extends past where I marked the back plane, but since there is no wall, I just had to choose somewhere. This results in a sharp turn in the path: that part is not actually parallel to the imaging plane, but I still segmented it as such.
More aerial view of the corner.

Closing Remarks

I learned a lot about how vanishing points and projected image coordinates relate to the real world, and I think I understand projection rules a lot better now, even though I only examined the pinhole model. I was also pleasantly surprised that such a simple model (basically, a rectangular prism) was enough to produce a decent simulation of a scene.

The most troublesome thing in this project was porting over my Python code to MATLAB, since I did the other projects in Python.