CS 39J: Session 5 Detailed Notes
http://inst.eecs.berkeley.edu/~cs39j/session05.html
21 February 2002
My OPTICAL research group is conducting some research that has connections
to photography. We are creating a video piece related to this research
and we will submit it to a very competitive venue. I would like to involve
the CS 39J students in this. This project involves photography to a limited
extent, but part of the nature of a freshman seminar is to familiarize students
with the university's faculty members and their research work, and so this would
be a good vehicle to that end.
In computer graphics, the vast majority of images have all objects in the
scene appearing in focus, regardless of their depth (that is, their distance).
You may not have noticed this, but it is the case. In photography, we
often struggle to maximize depth of field. Ironically, in computer graphics,
most images have infinite depth of field.
We have an interesting project where we are trying to make computer graphics
images look more realistic by adding depth of field, with precise control over
blur. Our research involves adding blur to sharp images.
In computer graphics, it can sometimes be appropriate to work in an ad hoc
manner. Although an ad hoc approach is not scientifically accurate, it is adequate
for movies and video games. However, if images are to be used for medical imaging
or scientific purposes, then we would need to have a more scientific basis for
the techniques being used.
Our approach is interesting in that we are not computing the images fully in
three dimensions. Rather, we are doing something in the spirit of Photoshop.
We start with a computer-generated image that is completely sharp, and
then we blur that image based on the point of focus. To do so, we use depth
information, that is, the distance of each object in the scene.
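The following is a minimal sketch of the general idea of blurring a sharp image according to depth. It is not the research group's actual algorithm: the function name `depth_blur`, the box-shaped blur, and the rule that the blur radius is proportional to the depth offset from the point of focus are all assumptions made for illustration.

```python
def depth_blur(image, depth, focus_depth, strength=1.0, max_radius=3):
    """Blur a sharp grayscale image using a per-pixel depth map.

    image and depth are equally sized lists of rows. Each output pixel is
    the average of a square window whose radius grows with that pixel's
    depth offset from focus_depth (an assumed, simplified blur model).
    """
    height, width = len(image), len(image[0])
    result = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Assumption: blur radius is proportional to the depth offset.
            offset = abs(depth[y][x] - focus_depth)
            radius = min(max_radius, int(strength * offset))
            total, count = 0.0, 0
            # Average all pixels in the window, clipped to the image bounds.
            for yy in range(max(0, y - radius), min(height, y + radius + 1)):
                for xx in range(max(0, x - radius), min(width, x + radius + 1)):
                    total += image[yy][xx]
                    count += 1
            result[y][x] = total / count
    return result


# A one-row test strip: the left half is near (depth 1), the right half far (depth 4).
image = [[0, 100, 0, 100, 0, 100]]
depth = [[1, 1, 1, 4, 4, 4]]
near_focus = depth_blur(image, depth, focus_depth=1)
far_focus = depth_blur(image, depth, focus_depth=4)
```

With the focus set on the near half, the near pixels keep their original values and the far pixels are averaged with their neighbors; focusing on the far half reverses the roles.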
There is a mammoth annual conference called SIGGRAPH (Special Interest Group on GRAPHics)
which attracts around 30,000 people. About 5,000 of them attend
the technical program, which provides a venue for advanced technical talks filled
with integrals and other mathematics! The conference also has the Computer
Animation Festival, which comprises the Electronic Theater, a 90-minute show
that is screened eight times during the conference, and several Animation Theaters,
which run continuously. Every piece that is accepted to the Electronic
Theater is short, but often displays the latest and greatest in computer
graphics. Approximately 800 pieces are submitted, and only 40 or so
are accepted to the Electronic Theater (a 95% rejection rate!).
My students and I had this crazy idea to submit a video piece about our research
to this venue. It's a crazy idea because we are competing with the likes of
Disney, Pixar, and Industrial Light & Magic. Those submissions are clips
from pieces created by hundreds of people with multimillion-dollar budgets,
whereas we are "just a couple of guys and a PC"!
What we are doing is somewhat different. Most of these pieces show
great special effects, whereas what we are trying to do is be pedagogical, show
our research, and explain what we are doing and how we are doing it. One of
the issues that I am struggling with is that we only have a couple of minutes
-- literally, about two minutes. The problem is that I am not just showing our
results, but I am trying to explain the concept. How does one take a concept
that is technical, and distill it to just a couple of minutes in length?
The reason that I am telling you this is to give you an assignment in which
you might be able to contribute to what we are doing.
Depth of Field
How many of you have seen Toy Story? All the students raise their hands.
How many of you have seen Tin Toy, an animated short that Pixar
Studios made in 1988? A few people raise their hands.
We have returned to that short and chosen a shot to illustrate our technique.
In photography, we use the word "shot" to mean a single picture.
In animation, it is a little more complicated, because we have a sequence of
pictures. One picture is called a frame. A shot means
a continuous sequence of frames. A scene refers to a set of shots showing
one place at one time.
We obtained a 4-second shot from Tin Toy, with about 120 frames. In this
shot, the Tin Toy is in the foreground, and the baby is in the background.
I wanted a shot that has both close objects and faraway objects, so as to be
able to show our simulation of blur and depth of field.
The contrast between sharp and blurred objects in the image is most apparent
when the objects are very close to the camera. For example, the rows of students
in the classroom are a couple of feet apart in depth. If I focus on
the last row, the difference in sharpness between the last two rows is imperceptible,
whereas if I focus on the front row, the difference in sharpness between the
first two rows is significant.
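The classroom example can be checked with the standard thin-lens formula for the diameter of the blur circle, c = A * f * |d - s| / (d * (s - f)), where A is the aperture diameter, f the focal length, s the focus distance, and d the distance of the object. The numbers below (a 50 mm lens at f/2, rows 0.6 m apart, front row at 2 m, last row at 10 m) are assumed for illustration and were not given in class.

```python
def blur_circle_diameter(focal_length, f_number, focus_distance, object_distance):
    """Thin-lens blur circle diameter on the sensor (same units as the inputs)."""
    aperture = focal_length / f_number  # aperture diameter A = f / N
    return (aperture
            * abs(object_distance - focus_distance) / object_distance
            * focal_length / (focus_distance - focal_length))


# Assumed setup: 50 mm lens at f/2, all distances in meters.
# Focus on the front row (2 m) and measure the blur of the second row (2.6 m).
front_rows = blur_circle_diameter(0.05, 2.0, 2.0, 2.6)
# Focus on the last row (10 m) and measure the blur of the row before it (9.4 m).
back_rows = blur_circle_diameter(0.05, 2.0, 10.0, 9.4)

print(f"front rows: {front_rows * 1000:.3f} mm, back rows: {back_rows * 1000:.3f} mm")
```

Under these assumptions the same 0.6 m gap produces a blur circle more than ten times larger at the front of the room than at the back, which matches the observation above.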
Rack focus is a technique used in cinematography (not in still photography)
where the point of focus is moved during the shot. Pulling focus refers
to measuring distances for purposes of changing focus. Take a look at this
video for a demonstration of rack focus (thanks to Viet Nguyen for the link!).
In the video piece that we are creating, we simulate changing the aperture
on a still frame, and simulate rack focus where we pull focus from the Tin Toy
to the baby. The video piece also shows the original shot, all in sharp
focus, as well as an illustration of how our technique proceeds.
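A simulated rack focus can be thought of as a schedule of focus depths, one per frame. The sketch below is only an illustration under assumed values: the depths of the Tin Toy and the baby, the linear interpolation, and the `blur_amount` model (blur proportional to aperture and to the depth offset from the point of focus) are hypothetical and are not taken from the actual video piece.

```python
def rack_focus_depths(start_depth, end_depth, frame_count):
    """Move the focus depth linearly from start_depth to end_depth over a shot."""
    if frame_count < 2:
        return [start_depth] * frame_count
    step = (end_depth - start_depth) / (frame_count - 1)
    return [start_depth + step * i for i in range(frame_count)]


def blur_amount(object_depth, focus_depth, aperture):
    """Assumed model: blur grows with aperture and with distance from focus."""
    return aperture * abs(object_depth - focus_depth)


# Assumed depths in arbitrary units; 120 frames matches the 4-second shot.
toy_depth, baby_depth = 1.0, 5.0
focus_per_frame = rack_focus_depths(toy_depth, baby_depth, 120)

# Blur of each character on every frame as the focus is pulled toward the baby.
toy_blur = [blur_amount(toy_depth, f, aperture=1.0) for f in focus_per_frame]
baby_blur = [blur_amount(baby_depth, f, aperture=1.0) for f in focus_per_frame]
```

In this model, changing the aperture on a still frame corresponds to holding the focus depth fixed and varying only the `aperture` argument.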
When Pixar gave me the digital frames for the shot in the Tin Toy film,
I asked for the 3D data, but I was told that it had been lost over the years
(Tin Toy was produced in 1988), although they had kept the original storyboard.
I was crestfallen at first since I wanted the 3D data for the depth information
that we needed for our blurring algorithm. (You might be interested to
know that I was able to reliably estimate the focal length of the lens for the
frames just from my experience with photography.) However, this
lack of depth information turned out to help illustrate one
of the features of our approach. The algorithm does not require exact
depth information; approximate depth information is sufficient.
Thus, we constructed approximate "depth maps" for these
frames.
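A depth map simply stores a depth value for every pixel of a frame. The notes do not say how the maps for the Tin Toy frames were built, so the following is only one hypothetical way to make an approximate one: label each pixel with the region it belongs to and give every region a single estimated depth. The region names and depth values are invented for illustration.

```python
def depth_map_from_labels(labels, depth_by_label):
    """Build a coarse depth map by giving each pixel the depth of its region."""
    return [[depth_by_label[label] for label in row] for row in labels]


# A tiny hand-labeled "frame": each entry names the region that pixel belongs to.
labels = [
    ["baby", "baby", "floor"],
    ["toy", "toy", "floor"],
]
# Assumed depths in arbitrary units: toy in the foreground, baby in the background.
approx_depth = depth_map_from_labels(labels, {"toy": 1.0, "floor": 3.0, "baby": 5.0})
```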
You can watch the full Tin Toy short at Pixar's website.
We showed the first version of the video, which is a 45-second piece showing
the original Tin Toy shot, an illustration of how the algorithm works, a simulation
of changing aperture on a frame from Tin Toy, and finally a simulation of rack
focus on the Tin Toy shot.
We coordinated sound and music with the video. It is important to note
that animation tends to be uninteresting without music or sound. We are
working on a voice-over for the video piece, which will mean subduing the music.
Our current video is about two minutes long, and we had to repeat a core thirty
seconds of the music. The music is from the television show Captain
Kangaroo, one of the longest-running children's television shows
(from 1955 until 1984). We chose this music because the original Tin
Toy short also used it. It was a humorous sight doing the voice-overs
last night at 2 AM: setting up the microphone in a little closet,
putting the microphone low to prevent "popping P's", sticking the
script in the knobs of the equipment lying around, trying to manipulate
the computer in the next room, and then running into the closet to record
my voice....
Then we showed the voice-over version of the video.
We would like a photograph that conveys "The University of California, Berkeley"
that we can use for the introductory image. Our current one, shown above, (1)
is too low-resolution, (2) competes with the opening words, and (3) looks slightly
awkward with a floating head. Could we somehow have an image that conveys "Berkeley",
but has ample space to accommodate the title words? The idea of the opening
is to have a few seconds of the image in focus, and then blur the image for
another several seconds. The total time would be about ten seconds.
Assignment:
(1) An image that can stay in place for ten seconds and that satisfies the ideas
above,
and
(2) An image that has both a close object and a far object.
The entire movie needs to be shipped by Tuesday, March 5th. Take photographs,
as slides or digital images; for slides, coordinate with Steve so we can have
them digitally scanned with the slide scanner.
Students now voice their opinions:
The video compression is too lossy.
The baby's face already looks blurred from the beginning, but looks more detailed
in the end.
The current split screen is a bit hard to follow; maybe simply cutting the image
into two pieces and affecting one portion is easier for people to follow.
Professor Barsky shows some of the books that he has recommended under the
"Culture, History, and Art Theory" part of the "Books" webpage
on the CS 39J course website. Later in the semester, students may form groups
of two and present topics in photography to the class.
We talk about the field trip we took to the Berkeley Art Museum, and each student
gives a brief synopsis of his/her summary.
Your assignment now is to take photographs for inclusion in two places in the
video:
An image for the opening; this needs to be Berkeley-esque and fulfill the requirements
as stated above.
An image for the comparison of the Tin Toy image and the Campanile photograph.
In this segment, the voice-over narrates, "Our approach can be applied
to both photographic and synthetic images."
The images can be in slide or digital format.
There may not be sufficient time to use your photographs for this preliminary
video piece. However, your photographs could be used in a future rendition of
it, should this video piece be accepted.
We've already sent out the private URL for the video to all the students in
the class.