International student participating in the Brazilian Scientific Mobility Program (BSMP) at University of California, Berkeley, for the 2015-2016 academic year. Enrolled as a Computer Science Extension student. Born and raised in Rio de Janeiro, Brazil. Undergrad student at Pontifical Catholic University of Rio de Janeiro, studying Computer Engineering and Mathematics. Lover of traditional and digital art, currently interested in Computer Graphics and Artificial Intelligence - but who knows what new field of study I might fall in love with! Trying to learn the most I can during this one year here at Berkeley.
As this paper by Ng et al. demonstrated (Ren Ng is the founder of the Lytro camera and a Professor at Berkeley!), capturing multiple images over a plane orthogonal to the optical axis makes it possible to achieve complex effects, such as depth refocusing and aperture adjustment, using very simple operations like shifting and averaging (see this gallery - hover over different parts of the images). The goal of this project is to reproduce some of these effects using real lightfield data.
For the input of the project we have a small lightfield. What is a lightfield? A lightfield is a function that contains the information for every ray of light passing through a given surface (in our case, a plane). It turns out that capturing all this information about the photons on a surface is the same as taking pictures along that surface. As a result, we can discretize our lightfield into a light grid: a series of equally spaced pictures taken over the plane. This is our input, as seen below.
It turns out that, because of the displacement between the pictures, objects that are close to the camera move much more from image to image than objects that are far away. Because of that, if we average all the images, we get a result that is sharp far away and blurry nearby - an image that looks focused in the distance.
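This global average can be sketched in a few lines (a minimal sketch: `grid` is assumed to be a list of rows of equally sized numpy images, as in the light grid above):

```python
import numpy as np

def average_all(grid):
    """Plain average of every image in the light grid.

    Regions with zero parallax (the far-away scene here) stay sharp,
    everything else blurs.
    """
    stack = np.stack([img for row in grid for img in row])
    return stack.mean(axis=0)
```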
One could imagine: "Why don't we align the objects that are nearby? That would create an image that is sharper on close objects." Well, that's exactly what we should do! By aligning the nearby objects, the objects far away become translated relative to each other, creating a blurry effect on them - just the opposite of the global average without any translation! For the alignment, let's use the assumption that the grid is evenly spaced. If we compute the displacement from the first image to the last image on the grid, we can get the displacement dx and dy for each image with a simple division. (Note: choosing the first and last images as references proved to be important - because of their large distance, possible alignment errors and small variations in the grid spacing get washed away when divided across the many images between them.)
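Under the even-spacing assumption, the per-image shift is just a linear interpolation of the first-to-last displacement. A sketch (the helper name and the grid indexing convention are my own):

```python
def per_image_shift(total_dx, total_dy, n_rows, n_cols, i, j):
    """Shift for grid cell (i, j), assuming an evenly spaced grid.

    (total_dx, total_dy) is the displacement measured between the first
    image (0, 0) and the last image (n_rows-1, n_cols-1); each image in
    between gets a proportional fraction of it.
    """
    dx = total_dx * j / (n_cols - 1)
    dy = total_dy * i / (n_rows - 1)
    return dx, dy
```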
To get this displacement, we need to know what we are aligning. For that, we ask the user for a point and create a patch around it. This patch is the feature that we will try to align. In the case below, the patches around the chosen point contain the knight. As you can see, the images are shifted; our goal is to compute this shift. A good way to align them is to build gaussian pyramids of the reference images and align level by level, minimizing the sum of squared differences (SSD) metric, just like we did for project 1.
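The coarse-to-fine SSD alignment can be sketched like this (a simplification: plain 2x decimation stands in for a proper gaussian pyramid, `np.roll` stands in for a real translation, and the brute-force search window is kept small):

```python
import numpy as np

def ssd_align(a, b, search=2):
    """Brute-force (dy, dx) shift of b that minimizes the SSD against a."""
    best_err, best_shift = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            shifted = np.roll(np.roll(b, dy, axis=0), dx, axis=1)
            err = np.sum((a - shifted) ** 2)
            if err < best_err:
                best_err, best_shift = err, (dy, dx)
    return best_shift

def pyramid_align(a, b, levels=3):
    """Coarse-to-fine alignment: estimate the shift on downsampled
    copies first, then double it and refine at the finer level."""
    if levels == 0 or min(a.shape) < 16:
        return ssd_align(a, b)
    cy, cx = pyramid_align(a[::2, ::2], b[::2, ::2], levels - 1)
    dy, dx = 2 * cy, 2 * cx
    b_shifted = np.roll(np.roll(b, dy, axis=0), dx, axis=1)
    fy, fx = ssd_align(a, b_shifted, search=1)
    return dy + fy, dx + fx
```

This keeps the brute-force search cheap: each level only refines the doubled coarse estimate by one pixel, instead of searching the full displacement range at full resolution.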
Once we have that alignment, we just translate each image appropriately and average them together. Without the alignment of the nearby objects, the result is focused further away; with it, the result is focused on the selected point. The difference can be seen below.
The more images we use for the averaging, the blurrier the non-aligned objects become. With that in mind, we can change how many images we use for the averaging, which creates the illusion of changing the aperture size of our camera. We can pick a center image (say, the center image of the grid) and only average the images inside a given radius. By increasing the radius of our "aperture", we allow more data into the average, producing blurrier results - just like a normal camera. By decreasing the radius, we approach an infinitesimally small pinhole, which means no blur. Below, we can see how the images are selected for the averaging and the difference between a large and a small radius value. (Note: for this project, I have used the Manhattan distance between images on the grid, which is a good approximation.)
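The variable-aperture averaging can be sketched as follows (assuming integer `(dy, dx)` shifts have already been computed for each grid cell, and again using `np.roll` as a stand-in for a proper translation):

```python
import numpy as np

def refocus(grid, shifts, center, radius):
    """Average the images within Manhattan distance `radius` of the
    center grid cell, translating each one by its alignment shift first.

    radius = 0 reproduces the single center image (a pinhole); larger
    radii let more images into the average, like opening the aperture.
    """
    ci, cj = center
    acc, count = None, 0
    for i, row in enumerate(grid):
        for j, img in enumerate(row):
            if abs(i - ci) + abs(j - cj) > radius:
                continue
            dy, dx = shifts[i][j]
            shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
            acc = shifted if acc is None else acc + shifted
            count += 1
    return acc / count
```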
For this project, I have created a Python application that allows the user to select the two parameters of the postprocessing: the focus point and the radius size. Once the application loads the input images, the user can left-click on any point of the image to focus our postprocessed "camera", or scroll the mouse wheel to increase/decrease the radius value. With this, I could easily play around with many of the lightfields available from The (New) Stanford Light Field Archive. It works most of the time, in a pretty responsive way. Some takes of me trying my program are available below.
One huge problem for the interactive application was its processing speed. Whenever the user clicked somewhere on the image, the program had to (1) calculate the alignment between the references and (2) get all the images in the radius, translate them, and average them. Both processes are slow. To speed up the alignment, I used the multi-scale alignment from project one on patches around the selected points, which proved fast enough. To work around the cost of the translations and averaging, I decided to give the user lower-resolution images of the result while the higher-resolution ones were still being computed. This basically means aligning and averaging the low-resolution levels of the gaussian pyramid first, so results can be delivered as fast as possible; once the deeper levels are done, I can work on the details and get away with taking more time. Because I am using gaussian pyramids everywhere, I decided to store all the pyramids of the input lightfield in memory. This was probably not the right choice: RAM usage exceeded 9GB on some of my examples - improvement is necessary.
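The coarse-first delivery idea can be sketched with a generator that yields the averaged result one pyramid level at a time (a simplification: plain decimation instead of gaussian blurring, and the alignment step is omitted):

```python
import numpy as np

def build_pyramid(img, levels=2):
    """Decimation pyramid (a real version would gaussian-blur before
    each 2x downsample)."""
    pyr = [img]
    for _ in range(levels):
        pyr.append(pyr[-1][::2, ::2])
    return pyr

def progressive_average(pyramids):
    """Yield the averaged image coarsest level first, so the UI can
    show a low-resolution preview while the finer levels compute."""
    for lvl in reversed(range(len(pyramids[0]))):
        yield np.mean([p[lvl] for p in pyramids], axis=0)
```

The UI can draw each yielded image as it arrives, so the user sees a rough preview almost immediately and the full-resolution result a moment later.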
I could not resist trying to create my own lightfields. The idea was simple: put my camera on my tripod, move it regularly on a grid, and take some pictures. It turns out that my tripod only moves in one direction and that it is pretty hard to satisfy the equal-spacing assumption. Because of that, the results did not look good. For the examples below, I did not capture a light grid but rather a light line: instead of an NxM grid, I captured an Nx1 grid. This resulted in weird blur effects.
A line of 5 pictures was taken for this. This experiment did not work well because of the irregular spacing between the images. Also, the small number of images increases the error of the displacements and decreases the overall averaging quality. (Well, it turns out that averages are good when you have lots of data - who would guess that?)
Instead of trying to move a physical camera in a grid, I tried to move a virtual game camera. I chose the game "Portal 2" and took several screenshots of the character from different positions across a line. The result was a little bit better than the "Coding Night" one, probably because of the better-ish alignment and the (small) increase in the amount of data. I was not able to get more data without messing up the equal spacing between the pictures.
I have learned that taking equally spaced pictures is VERY hard. I have also learned more about the power of averaging - how a simple technique can create such cool effects. The lightfields used for this assignment were also pretty interesting. I think this could have some pretty neat effects on the future of photography and even on creating immersive experiences. I would love to work on a lightfield captured over an actual gaussian surface - I guess one could do some pretty cool object visualization and manipulation with that.
Website built using bootstrap.