CS194-26: Image Manipulation and Computational Photography

Overview

Sergei Prokudin-Gorskii was a pioneer of color photography in the early 20th century. He recorded three exposures of every scene onto a glass plate, one each through a red, a green, and a blue filter. The goal of this project is to automatically reconstruct a color image from such a plate. To do this, we extract the three color channels and compute the offset that best aligns them under some error metric. When the original image is high resolution, we speed up the alignment by working sequentially from a coarse scale up to the original resolution.
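As a rough illustration of the channel-extraction step, here is a minimal sketch that cuts a scanned plate into three equal-height strips. The function name split_plate is hypothetical, and the top-to-bottom order (blue, green, red) is an assumption about how the plates are laid out, not something stated in this write-up.

```python
import numpy as np

def split_plate(plate):
    """Split a vertically stacked plate scan into three equal-height channels."""
    height = plate.shape[0] // 3  # leftover rows at the bottom are dropped
    # ASSUMPTION: the exposures are stacked blue, green, red from top to bottom.
    blue = plate[:height]
    green = plate[height:2 * height]
    red = plate[2 * height:3 * height]
    return blue, green, red
```

Each returned strip is a view into the original array, so nothing is copied until the channels are shifted.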

Experiment

The simplest idea is to search over a window of possible offsets (e.g. [-15, 15] pixels along both the x-axis and the y-axis for small images) and use the offset with the best score. We then stack the three aligned channels depth-wise to obtain the color image.
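A minimal sketch of this window search, assuming SSD over the whole image as the score (so the best offset is the one with the lowest value). The helper names ssd, align_exhaustive, and colorize are hypothetical; only np.roll and np.dstack are real NumPy calls.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two equally sized images."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sum(diff ** 2))

def align_exhaustive(moving, reference, radius=15):
    """Try every (dy, dx) in [-radius, radius]^2 and keep the lowest-SSD shift."""
    best_offset, best_score = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # np.roll shifts circularly: pixels leaving one side re-enter on the other.
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            score = ssd(shifted, reference)
            if score < best_score:
                best_score, best_offset = score, (dy, dx)
    return best_offset

def colorize(red, green, blue):
    """Stack three aligned channels depth-wise into an H x W x 3 image."""
    return np.dstack([red, green, blue])
```

For example, a channel would be shifted with np.roll(green, align_exhaustive(green, blue), axis=(0, 1)) before being passed to colorize.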

The images did not align well under the scheme above. Because np.roll wraps elements that are shifted past one border around to the opposite border, taking the Sum of Squared Differences (SSD) over the entire image includes differences from these wrapped borders, which inflates the SSD unpredictably. My solution is to ignore the outer 10% of pixels along all four borders of both channels when computing the SSD. This way, the SSD truly reflects how far apart the two channels are.
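The border-ignoring SSD can be sketched as follows. The function name cropped_ssd and the border parameter are hypothetical; the 10% default follows the description above.

```python
import numpy as np

def cropped_ssd(a, b, border=0.10):
    """SSD over the interior only, skipping a fraction of pixels on every side."""
    h, w = a.shape
    dy, dx = int(h * border), int(w * border)
    # Keep only the central region so wrapped-around pixels do not contribute.
    a_in = a[dy:h - dy, dx:w - dx].astype(np.float64)
    b_in = b[dy:h - dy, dx:w - dx].astype(np.float64)
    return float(np.sum((a_in - b_in) ** 2))
```

Note that the crop only protects against wrap-around for shifts smaller than the cropped margin, which is one reason to keep the search window small relative to the image.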

However, for the large TIFF images (about 3000 x 3000 pixels per channel), the naive approach takes minutes to run. Its running time grows linearly with the image size (height x width) and linearly with the size of the search space. Image size is not the only problem: the search space for large images must also be larger, because the same visible misalignment now corresponds to many more pixels. The spec points to a way out: pyramid speedup. Instead of searching on the high-resolution image, we repeatedly rescale the image by 1/2 until its height is smaller than 100px and run the window search at that coarsest level. At each finer level we double the best offset from the level below and test only the nine offsets (lower_resolution_offset * 2) + (-1, 0, 1) x (-1, 0, 1). Repeating this up to the original resolution means we test only a small subset of the original search space. Behind the strategy is the locality assumption that two nearby offsets lead to similar SSDs. With pyramid speedup, most TIFF images can now be processed in about 8 seconds.
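The coarse-to-fine search can be sketched recursively. This is an illustration under stated assumptions, not the exact code used for the results: all function names are hypothetical, and 2x2 block averaging stands in for whatever rescaling routine is actually used.

```python
import numpy as np

def cropped_ssd(a, b, border=0.10):
    """SSD over the interior, ignoring a fraction of pixels on every side."""
    h, w = a.shape
    dy, dx = int(h * border), int(w * border)
    diff = (a[dy:h - dy, dx:w - dx].astype(np.float64)
            - b[dy:h - dy, dx:w - dx].astype(np.float64))
    return float(np.sum(diff ** 2))

def best_offset(moving, reference, candidates):
    """Return the candidate (dy, dx) shift with the lowest cropped SSD."""
    scores = [cropped_ssd(np.roll(moving, c, axis=(0, 1)), reference)
              for c in candidates]
    return candidates[int(np.argmin(scores))]

def halve(img):
    """Downscale by 1/2 using 2x2 block averaging (ASSUMED stand-in for rescaling)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w].astype(np.float64)
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def align_pyramid(moving, reference, min_height=100, radius=15):
    """Coarse-to-fine alignment: full window at the top, 3x3 refinement below."""
    if moving.shape[0] < min_height:
        # Coarsest level: exhaustive search over the whole window.
        window = range(-radius, radius + 1)
        return best_offset(moving, reference,
                           [(dy, dx) for dy in window for dx in window])
    # Solve the half-resolution problem first, then refine around twice its answer.
    cy, cx = align_pyramid(halve(moving), halve(reference), min_height, radius)
    candidates = [(2 * cy + dy, 2 * cx + dx)
                  for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    return best_offset(moving, reference, candidates)
```

Every level above the coarsest one evaluates only nine offsets, so the total cost is dominated by the few full-resolution evaluations rather than by the size of the search window.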

An exception is emir.jpg (the result of the vanilla pyramid on raw pixels is the image above on the left). Its brightness values are not even close across channels, which makes raw pixels a poor feature for computing SSD. Edges are a better feature because they are far less affected by brightness. Drawing on what we learned in lecture, edges can be extracted by convolving the image with a suitable filter (e.g. a Sobel filter). Running the pyramid search on the edge-emphasized images gives a better alignment (the image above on the right). Doing so does not negatively affect the other images either.
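One way to build such an edge image is the Sobel gradient magnitude, sketched here in plain NumPy. The function name sobel_magnitude is hypothetical, and replicating the border pixels before filtering is my assumption; a library routine may treat the borders differently.

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude from the 3x3 Sobel kernels, same shape as the input."""
    img = img.astype(np.float64)
    p = np.pad(img, 1, mode="edge")  # ASSUMPTION: replicate border pixels
    # Horizontal gradient: right column minus left column, weighted 1-2-1.
    gx = ((p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:])
          - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2]))
    # Vertical gradient: bottom row minus top row, weighted 1-2-1.
    gy = ((p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:])
          - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:]))
    return np.hypot(gx, gy)
```

Adding a constant to every pixel leaves this output unchanged, which is why a uniform brightness difference between channels matters much less than it does for raw pixels. The edge images are passed to the alignment search in place of the raw channels, and the resulting offset is then applied to the original channel.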

Images of the Russian Empire

CS 194-26: Computational Photography, Fall 2018, Project 1

Zheng Shi, cs194-26-aad

Colorized provided images

cathedral

monastery

nativity

settlers

emir

harvesters

icon

lady

self_portrait

three_generations

train

turkmen

village

A few other examples from Prokudin-Gorskii collection

drapery

inner decoration

river

Failure & explanation