Project 1: Colorizing the Prokudin-Gorskii Photo

Zuyong Li

Spring 2020

Task

The goal of this project is to reconstruct color images from the digitized Prokudin-Gorskii glass plate images. The filter order of the glass plate images is BGR from top to bottom. Each glass plate image is split into three images, one per color channel, and those channels are aligned and stacked to produce a color image. To obtain images with as few visual artifacts as possible, alignment and cropping are essential.
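The splitting and stacking steps can be sketched as follows. This is a minimal illustration, assuming the plate is a 2-D NumPy array whose three exposures have equal height; the function names `split_plate` and `stack_rgb` are hypothetical and not taken from the project code.

```python
import numpy as np

def split_plate(plate):
    """Split a vertically stacked plate into its B, G, R channels (top to bottom)."""
    height = plate.shape[0] // 3  # assumption: equal thirds, leftover rows dropped
    b = plate[:height]
    g = plate[height:2 * height]
    r = plate[2 * height:3 * height]
    return b, g, r

def stack_rgb(r, g, b):
    """Stack three aligned channels into an (H, W, 3) array in RGB order."""
    return np.dstack([r, g, b])
```

Without alignment, `stack_rgb(r, g, b)` generally shows colored fringes, which is what the alignment step is meant to remove.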

Alignment

Single-scale alignment on low resolution images

For low-res images, the easiest way to align color channels is to exhaustively search a window of possible displacements and use a metric to find the best match. Two metrics are used: sum of squared distances ($SSD$) and normalized cross correlation ($NCC$). Suppose that image $B$ can be aligned with image $A$ after some transformation, and the transformation vector $\vec{v}$ lies inside a given window. To find the best $\vec{v}$, first transform $B$ by a candidate $\vec{v}$ to obtain a new image $B'$, then use the metric to evaluate how well $B'$ matches $A$.

$SSD = \sum (B' - A)^2$

$NCC = \frac{\sum (B' - \bar{B'})(A - \bar{A})}{\sqrt {\sum (B' - \bar{B'})^2 \sum (A - \bar{A})^2}}$

In both formulas the sums run over all pixels. The difference between $SSD$ and $NCC$ is that $SSD$ should be minimized whereas $NCC$ should be maximized, and $NCC$ is more computationally expensive when computed with the above formula.
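The exhaustive search with either metric can be sketched as below. This is only an illustration under stated assumptions: the names `ssd`, `ncc`, and `align_single_scale` are hypothetical, the default radius of 15 is an arbitrary choice, and `np.roll` is used for the shift, which wraps pixels around the border instead of discarding them.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared distances; lower means a better match."""
    return float(np.sum((b - a) ** 2))

def ncc(a, b):
    """Normalized cross correlation; higher means a better match."""
    a0 = a - a.mean()
    b0 = b - b.mean()
    denom = np.sqrt(np.sum(a0 ** 2) * np.sum(b0 ** 2))
    return float(np.sum(a0 * b0) / denom) if denom > 0 else 0.0

def align_single_scale(a, b, radius=15, metric="ncc"):
    """Return the (dy, dx) shift of b that best matches a within [-radius, radius]."""
    a = np.asarray(a, dtype=float)  # avoid integer overflow on uint8 input
    b = np.asarray(b, dtype=float)
    best_score, best_v = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(b, (dy, dx), axis=(0, 1))  # assumption: circular shift
            # Negate NCC so that "smaller is better" holds for both metrics.
            score = ssd(a, shifted) if metric == "ssd" else -ncc(a, shifted)
            if best_score is None or score < best_score:
                best_score, best_v = score, (dy, dx)
    return best_v
```

The returned vector can be applied with the same `np.roll` call to obtain $B'$ before stacking the channels.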

Multiscale alignment on high resolution images

When dealing with high-res images, the transformation vector $\vec{v}$ can be very large, and exhaustive search becomes slow and inefficient. For example, if $\vec{v} = (100, 100)$ and the sign of the displacement is unknown, then the window might be $[-120, 120]$ in each direction, and exhaustive search requires 241 * 241 = 58081 metric calculations. The image pyramid is the solution.

The main idea of the image pyramid is to repeatedly downsize the image by a constant factor. As an example, assume a factor of 2. At the coarsest level, single-scale alignment is used to find the transformation vector $\vec{v}$. When moving from a coarse level to the next finer level, simply scale $\vec{v}$ by the same factor 2, $\vec{v}' = 2\vec{v}$, and search only a small window such as [-2, 2] around $\vec{v}'$. This is why the image pyramid speeds up the computation.
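The coarse-to-fine procedure can be sketched recursively as below. This is a simplified illustration, not the project's implementation: the names `downsample2`, `best_shift`, and `align_pyramid` are hypothetical, downsizing is done by plain 2x2 block averaging, the metric is $SSD$, shifts wrap around via `np.roll`, and the default sizes and radii are arbitrary assumptions.

```python
import numpy as np

def downsample2(img):
    """Halve each dimension by averaging 2x2 blocks (a stand-in for proper resizing)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def best_shift(a, b, center, radius):
    """Exhaustive SSD search in a square window of the given radius around center."""
    best_score, best_v = None, center
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            shifted = np.roll(b, (dy, dx), axis=(0, 1))  # assumption: circular shift
            score = np.sum((shifted - a) ** 2)
            if best_score is None or score < best_score:
                best_score, best_v = score, (dy, dx)
    return best_v

def align_pyramid(a, b, min_size=32, coarse_radius=8, refine_radius=2):
    """Estimate the (dy, dx) shift of b onto a, coarse to fine with factor 2."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if min(a.shape) <= min_size:
        # Coarsest level: ordinary single-scale search around zero.
        return best_shift(a, b, (0, 0), coarse_radius)
    dy, dx = align_pyramid(downsample2(a), downsample2(b),
                           min_size, coarse_radius, refine_radius)
    # Scale the coarse estimate by 2 and refine it in a small window.
    return best_shift(a, b, (2 * dy, 2 * dx), refine_radius)
```

With these defaults each level above the coarsest evaluates only 5 * 5 = 25 candidate shifts, regardless of how large the final displacement is.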

Cropping

Cropping by ignoring borders

The easiest way to crop the image is to ignore border pixels, under the assumption that border pixels contain no useful information. For example, simply discard a fixed percentage of pixels on each of the four sides.
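A fixed-percentage crop is a one-line slicing operation. The sketch below assumes a NumPy image array; the name `crop_border` and the default fraction of 10% are hypothetical choices, since the report does not state the percentage used.

```python
import numpy as np

def crop_border(img, fraction=0.1):
    """Discard the given fraction of rows and columns on each of the four sides."""
    h, w = img.shape[:2]
    dy, dx = int(h * fraction), int(w * fraction)
    return img[dy:h - dy, dx:w - dx]
```

Because slicing returns a view, this costs nothing for large plates; call `.copy()` on the result if the original array will be modified later.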

Result

The reconstructed color images are shown below. For low-res images, the three plots are: uncropped image using $SSD$ as metric, uncropped image using $NCC$ as metric, and cropped image using $NCC$ as metric. For high-res images, the three plots are: uncropped image using $SSD$ as metric, cropped image using $SSD$ as metric, and cropped image using $NCC$ as metric. The transformation vectors for all the plots are also shown. As we can see, $NCC$ generally performs better than $SSD$, though it is slower; see, for example, the uncropped monastery.jpg and the cropped emir.tif. Generally, cropping the image before alignment, even by simply ignoring the border pixels, performs better than doing nothing. There was one exception: the cropped images of melons.tif were worse than the uncropped ones.

Bells & Whistles (Extra Credit)

Contrast

Contrast is increased by rescaling the image to cover the full intensity range, that is, by mapping the darkest pixel to 0 and the brightest pixel to 1. Comparing the emir image before and after this adjustment shows that this simple method did improve the quality of the output.
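This rescaling is a linear min-max stretch. The sketch below is one way to write it, with the hypothetical name `stretch_contrast`. Whether the stretch is applied per channel or to the whole image is not stated in the report, so the function simply rescales whatever array it is given.

```python
import numpy as np

def stretch_contrast(img):
    """Linearly map the darkest pixel to 0 and the brightest pixel to 1."""
    img = np.asarray(img, dtype=float)
    lo, hi = img.min(), img.max()
    if hi == lo:
        # A constant image has no contrast to stretch; return zeros.
        return np.zeros_like(img)
    return (img - lo) / (hi - lo)
```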

Cropping

Discarding a predetermined range of pixels worked well, but some useful information was also lost. Furthermore, if the image was cropped first, a new border effect appeared after alignment, as was clear in the previous result section. One possible way to improve cropping is to apply edge detection before alignment. Edge detection simply convolves the image with a kernel matrix that emphasizes changes in intensity. The border is then cropped after alignment.
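As an illustration of the edge-detection step, the sketch below convolves the image with the 3x3 Sobel kernels and takes the gradient magnitude. The report does not say which kernel was used, so Sobel is an assumption here, and the names `convolve3x3` and `edge_magnitude` are hypothetical.

```python
import numpy as np

# Assumption: Sobel kernels; the report does not name the kernel it used.
SOBEL_X = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])
SOBEL_Y = SOBEL_X.T

def convolve3x3(img, kernel):
    """Convolve a 2-D image with a 3x3 kernel, replicating the edge pixels."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    flipped = kernel[::-1, ::-1]  # convolution flips the kernel
    out = np.zeros((h, w), dtype=float)
    for i in range(3):
        for j in range(3):
            out += flipped[i, j] * padded[i:i + h, j:j + w]
    return out

def edge_magnitude(img):
    """Gradient magnitude: large where intensity changes, zero in flat regions."""
    img = np.asarray(img, dtype=float)
    gx = convolve3x3(img, SOBEL_X)
    gy = convolve3x3(img, SOBEL_Y)
    return np.hypot(gx, gy)
```

The edge maps of two channels can then be passed to the alignment search in place of the raw intensities, and the resulting transformation vector applied to the original channels.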

There are two advantages of preprocessing the image with edge detection. The first is that the transformation vector found is more accurate, as in cathedral.jpg and melons.tif. The second is that the border effect is removed and the color channels are aligned more accurately, as also shown in cathedral.jpg and melons.tif.

The last two examples, melons.tif and self_portrait.tif, also showed that combining edge detection before alignment, cropping after alignment, and contrast rescaling improved the image quality.