CS194-26 Project 1: Images of the Russian Empire

Buyi Zhang, cs194-26-aei

Overview

The goal of this assignment is to take the digitized Prokudin-Gorskii glass plate images and, using image processing techniques, automatically produce a color image with as few visual artifacts as possible. To do this, I extract the three color channel images, place them on top of each other, and align them so that they form a single RGB color image.

Approach

Single-scale version

To align the RGB channels, I first implemented a single-scale version. It searches over displacements in the range [-15, 15] for the G and R channels and keeps, for each of them, the displacement that best aligns it with the B channel. The B, G, and R channels are obtained by slicing the original image into three parts of equal height.
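The slicing step can be sketched as follows. This is a minimal illustration, not my exact code: the function name `split_channels` is made up for this example, and it assumes the plate is a 2-D grayscale array with the B, G, and R exposures stacked from top to bottom.

```python
import numpy as np

def split_channels(plate):
    """Split a stacked glass-plate scan into B, G, R channels of equal height."""
    height = plate.shape[0] // 3          # any leftover rows at the bottom are dropped
    b = plate[:height]
    g = plate[height:2 * height]
    r = plate[2 * height:3 * height]
    return b, g, r

# Usage on a small synthetic "plate" with 9 rows and 4 columns.
plate = np.arange(36, dtype=float).reshape(9, 4)
b, g, r = split_channels(plate)
```

Using integer division keeps the three channels the same shape even when the scan height is not a multiple of three.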

To measure the "goodness of alignment", I used two metrics, SSD and NCC:

SSD metric: Compute the sum of squared differences between two images using the raw pixels.
NCC metric: Compute the dot product of the two flattened and normalized images. I subtract this from one so that the optimal displacement has the smallest score.
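A minimal sketch of the two metrics is given here. The function names `ssd` and `ncc_score` are chosen for this example, and "normalized" is taken to mean scaling each flattened image to unit length; a variant that also subtracts the mean is common, but that is not what is described above.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences on raw pixels (lower is better)."""
    diff = a.astype(float) - b.astype(float)   # cast first to avoid uint8 wrap-around
    return float(np.sum(diff ** 2))

def ncc_score(a, b):
    """One minus the dot product of the flattened, unit-norm images (lower is better)."""
    a = a.ravel().astype(float)
    b = b.ravel().astype(float)
    a = a / np.linalg.norm(a)                  # an all-zero image would give NaN here
    b = b / np.linalg.norm(b)
    return 1.0 - float(np.dot(a, b))

x = np.array([[1.0, 2.0], [3.0, 4.0]])
same_ssd = ssd(x, x)             # identical images: SSD is zero
scaled_ncc = ncc_score(x, 2 * x) # NCC ignores a global brightness scale
```

Because of the unit-norm scaling, the NCC score does not change when one channel is uniformly brighter than the other, whereas SSD does.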

I used the NCC metric as the default. One last note on computing the metrics: the borders of the channels cause a lot of trouble. Because the provided images have uneven borders, I did not compute the metric over the edges of each channel (defined as the outer 10% of each image dimension).
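Putting the pieces together, the single-scale search with border cropping might look like the sketch below. The helper names, the use of `np.roll` to apply a displacement, and the (row, column) ordering of the offsets are assumptions made for this illustration.

```python
import numpy as np

def crop_border(img, frac=0.10):
    """Drop the outer `frac` of each dimension before scoring."""
    h, w = img.shape
    dh, dw = int(h * frac), int(w * frac)
    return img[dh:h - dh, dw:w - dw]

def ncc_score(a, b):
    """One minus the dot product of the flattened, unit-norm images."""
    a = a.ravel() / np.linalg.norm(a)
    b = b.ravel() / np.linalg.norm(b)
    return 1.0 - float(np.dot(a, b))

def align_single_scale(channel, reference, center=(0, 0), radius=15):
    """Exhaustively search shifts within `radius` of `center`; return the best (dy, dx)."""
    ref = crop_border(reference)
    best_score, best_shift = np.inf, center
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ncc_score(crop_border(shifted), ref)
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

# Synthetic check: displace a random image and recover the shift that undoes it.
rng = np.random.default_rng(0)
blue = rng.random((60, 60))
green = np.roll(blue, (-3, 4), axis=(0, 1))
print(align_single_scale(green, blue))  # prints (3, -4)
```

Cropping after the shift also hides the rows and columns that `np.roll` wraps around, as long as the displacement stays smaller than the cropped margin.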

Pyramid version

Exhaustive search becomes prohibitively expensive if the pixel displacement is too large (which is the case for high-resolution glass plate scans).

For this case, I implemented a faster search procedure based on an image pyramid. An image pyramid represents the image at multiple scales (usually scaled by a factor of 2). Processing is done sequentially, starting from the coarsest scale (smallest image) and going down the pyramid, updating my estimate as I go. The coarsest scale is reached by downscaling until the total image size is less than 2*10^5. This is easy to implement by adding recursive calls to my original single-scale implementation with a user-specified window of displacements.

Assuming the alignment at the lower resolution is successful, the search range for the image with doubled dimensions is set to [-3, 3], which greatly reduces the computational burden.
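The recursion can be sketched as below. This is an illustration under stated assumptions rather than my exact code: downscaling by 2x2 block averaging, centering the [-3, 3] window on twice the coarse estimate, and all function names are choices made for this example.

```python
import numpy as np

def crop_border(img, frac=0.10):
    h, w = img.shape
    dh, dw = int(h * frac), int(w * frac)
    return img[dh:h - dh, dw:w - dw]

def ncc_score(a, b):
    a = a.ravel() / np.linalg.norm(a)
    b = b.ravel() / np.linalg.norm(b)
    return 1.0 - float(np.dot(a, b))

def align_single_scale(channel, reference, center, radius):
    """Exhaustive search over shifts within `radius` of `center`."""
    ref = crop_border(reference)
    best_score, best_shift = np.inf, center
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ncc_score(crop_border(shifted), ref)
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

def downscale_by_2(img):
    """Halve each dimension by averaging 2x2 blocks (assumed downscaling method)."""
    h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def align_pyramid(channel, reference, min_size=2 * 10**5,
                  base_radius=15, refine_radius=3):
    """Coarse-to-fine alignment: recurse until the image is small, then refine."""
    if channel.size < min_size:
        # Coarsest level: full search window.
        return align_single_scale(channel, reference, (0, 0), base_radius)
    coarse = align_pyramid(downscale_by_2(channel), downscale_by_2(reference),
                           min_size, base_radius, refine_radius)
    # A shift of d pixels at half resolution corresponds to about 2d pixels here.
    center = (2 * coarse[0], 2 * coarse[1])
    return align_single_scale(channel, reference, center, refine_radius)

# Synthetic check with a small size threshold so that one pyramid level is used.
rng = np.random.default_rng(1)
blue = rng.random((128, 128))
red = np.roll(blue, (-10, 6), axis=(0, 1))
shift = align_pyramid(red, blue, min_size=5000)
```

With a window of [-3, 3] at every finer level, each level costs 49 metric evaluations instead of the 961 needed for [-15, 15].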

Bells & Whistles (Extras)

Better features

Instead of aligning based on RGB similarity, I tried aligning on gradients or edges. I used the edge detection algorithm provided in the Python skimage package (Canny detector). Since it produces an image whose pixels are either 0 or 1, I can treat the edge image as another input for computing the metrics.
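A minimal sketch of the edge feature step is shown here. It assumes scikit-image is installed; the wrapper name `edge_features` and the `sigma` value are illustrative choices, not necessarily the settings I used.

```python
import numpy as np
from skimage import feature

def edge_features(channel, sigma=1.0):
    """Return a 0/1 edge map of a 2-D channel using the Canny detector."""
    edges = feature.canny(channel, sigma=sigma)   # boolean array, True on edges
    return edges.astype(float)                    # 0.0 / 1.0 so the metrics apply directly

# Synthetic check: a bright square on a dark background.
img = np.zeros((40, 40))
img[10:30, 10:30] = 1.0
edge_map = edge_features(img)
```

The edge maps of two channels can then be passed to the same alignment routine in place of the raw pixel intensities.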

Results

The results of my algorithm on three examples of my own choosing, downloaded from the Prokudin-Gorskii collection, are also shown at the end.

The Edge method performs reasonably well on all images. I think this is because the edge detector uses gradient information, and its smoothing, filtering, and post-processing of edges also make it perform well on images with strongly varying gradients.

cathedral Offsets: Pixels_NCC G [5, 2] R [12, 3]; Edge_NCC G [5, 2] R [12, 3]