CS 194-26 Spring 2020
Project 1: Colorizing the Prokudin-Gorskii photo collection

Yibin Li


Overview

Given the three single-channel images (one per color channel) of each photograph, reconstruct the original color picture.

Approach

To see what needs fixing, I first stacked the raw channels without any alignment. The result is poor, but it gives some intuition for how to align the picture: each single-channel image can be shifted by some fixed displacement so that its pixels roughly line up with the other channels. Stacking the channels after this alignment, and fine-tuning the parameters, reconstructs the original color image.

Emir_no_align

Stacking the raw channels without any alignment
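As a rough illustration of the stacking step, here is a minimal NumPy sketch. It assumes the scan is a single tall grayscale array holding three equal-height exposures ordered blue, green, red from top to bottom, and that displacements are (x, y) shifts of green and red relative to blue. The function names `split_plate` and `stack_rgb` are hypothetical and not taken from the project code.

```python
import numpy as np

def split_plate(plate):
    """Split a tall scan into three equal-height channels.

    Assumption: the channels are ordered B, G, R from top to bottom.
    """
    h = plate.shape[0] // 3
    b, g, r = plate[:h], plate[h:2 * h], plate[2 * h:3 * h]
    return b, g, r

def stack_rgb(r, g, b, g_shift=(0, 0), r_shift=(0, 0)):
    """Shift G and R by (x, y) relative to B, then stack into H x W x 3.

    np.roll wraps pixels around the border, which is a simplification.
    """
    g = np.roll(g, (g_shift[1], g_shift[0]), axis=(0, 1))
    r = np.roll(r, (r_shift[1], r_shift[0]), axis=(0, 1))
    return np.dstack([r, g, b])
```

With the default zero shifts, `stack_rgb(r, g, b)` gives the unaligned result; passing displacements such as `g_shift=(2, 5)` applies an alignment before stacking.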


An intuitive way to find the displacement is to search over all candidate displacements within a window and keep the one that minimizes the pixel-wise difference between the two images. This can be done with two nested loops over the x and y displacements. To measure the difference between two images, an image matching metric is used. There are many kinds of image matching metrics, and I primarily use two: Sum of Squared Differences (SSD) and Normalized Cross-Correlation (NCC).
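The two metrics and the double loop described above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the names `ssd`, `ncc`, and `align_exhaustive`, the zero-mean form of NCC, the default search radius, and the (x, y) = (columns, rows) convention are all assumptions.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences: lower means more similar."""
    return float(np.sum((a - b) ** 2))

def ncc(a, b):
    """Zero-mean normalized cross-correlation: higher means more similar."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0

def align_exhaustive(channel, reference, radius=15):
    """Try every (dx, dy) with |dx|, |dy| <= radius; return the best by NCC."""
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # np.roll wraps around the border; cropping the borders before
            # scoring is a common refinement that is omitted here.
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ncc(shifted, reference)
            if score > best_score:
                best_score, best_shift = score, (dx, dy)
    return best_shift
```

To use SSD instead, the loop would keep the shift with the lowest `ssd(shifted, reference)` rather than the highest NCC.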

Exhaustive search is fine for a small image, but for a high-resolution picture it is rather time-consuming. A better way to search is to use an image pyramid: each image is represented as a stack of progressively rescaled copies. The first (bottom) level is usually the original image without any rescaling, and the top level is the copy whose scaling reduces the image size the most. The search runs from the top level down to the bottom: if the best displacement (x, y) is found at one level and the rescale factor is 0.5, the next level starts its search from (x/0.5, y/0.5), that is (2x, 2y), which greatly reduces the search area.
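A coarse-to-fine search along these lines might look like the sketch below. It is only one possible arrangement: the rescaling is done by averaging 2x2 blocks (an assumption, any factor-0.5 resize would do), the names `downscale_half`, `search`, and `align_pyramid` are hypothetical, and the radii and minimum size are illustrative values, not the parameters used for the results in this report.

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalized cross-correlation: higher means more similar."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0

def search(channel, reference, center, radius):
    """Best (dx, dy) by NCC within `radius` of `center` (np.roll wraps)."""
    best_score, best_shift = -np.inf, center
    for dy in range(center[1] - radius, center[1] + radius + 1):
        for dx in range(center[0] - radius, center[0] + radius + 1):
            score = ncc(np.roll(channel, (dy, dx), axis=(0, 1)), reference)
            if score > best_score:
                best_score, best_shift = score, (dx, dy)
    return best_shift

def downscale_half(img):
    """Halve each dimension by averaging 2x2 blocks (rescale factor 0.5)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def align_pyramid(channel, reference, min_size=64,
                  coarse_radius=8, fine_radius=2):
    """Coarse-to-fine alignment: wide search at the top, small refinements below."""
    if min(channel.shape) <= min_size:
        # Top of the pyramid: small image, so a wider search is cheap.
        return search(channel, reference, (0, 0), coarse_radius)
    dx, dy = align_pyramid(downscale_half(channel), downscale_half(reference),
                           min_size, coarse_radius, fine_radius)
    # A shift found at half resolution corresponds to twice that shift here.
    return search(channel, reference, (2 * dx, 2 * dy), fine_radius)
```

Each level below the top only checks a small window around the doubled estimate, so the total cost is dominated by a handful of NCC evaluations per level instead of a full-resolution exhaustive search.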


Results

Low Resolution Images

Cathedral

Cathedral

Green: (2, 5) Red: (3, 12)


Monastery

Monastery

Green: (2, -3) Red: (2, 3)


Tobolsk

Tobolsk

Green: (3, 3) Red: (3, 6)


Large Resolution Images

Emir

Emir

Green: (24, 49) Red: (-254, 95)


Harvesters

Harvesters

Green: (16, 59) Red: (13, 124)


Lady

Lady

Green: (17, 41) Red: (23, 89)


Lady

Lady

Green: (9, 48) Red: (11, 112)


Melons

Melons

Green: (10, 82) Red: (13, 179)


Onion_church

Onion_church

Green: (26, 51) Red: (36, 108)


Self-portrait

Self-portrait

Green: (29, 79) Red: (37, 176)


Three generations

Three generations

Green: (14, 53) Red: (11, 112)


Train

Train

Green: (5, 42) Red: (32, 87)


Village

Village

Green: (12, 65) Red: (22, 137)


Workshop

Workshop

Green: (0, 53) Red: (-12, 105)


Note that emir.tif is not well aligned when only the image pyramid and NCC are applied. In this image, the red channel has a different brightness from the green and blue channels, so the red channel is misaligned in the final output.

One possible solution is to rescale the channel whose brightness differs most from the others.

Extra Images

Extra_1

Workshop

Green: (5, 24) Red: (8, 109)


Extra_2

Workshop

Green: (-11, 33) Red: (-27, 140)


Extra_3

Workshop

Green: (-18, 25) Red: (-37, 115)


Images with Brightness Issue

Scaling the channel with the most brightness difference

In emir.tif, the blue and green channels are darker than the red channel by a factor of 1.12. It is therefore natural to scale down the red channel so that its average brightness matches that of the other channels.

No Brightness Averaging

village

Green: (24, 49) Red: (-254, 95)

Red Channel Brightness Averaging

village

Green: (24, 49) Red: (-254, 95)

This modification does not change the image alignment. A better, cleverer idea is needed.
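One plausible reason the displacement stays the same, assuming the metric is a normalized correlation like the one sketched here, is that NCC divides out any uniform gain, so rescaling a whole channel leaves every score unchanged. The small example below illustrates this with synthetic data. The helper `match_mean_brightness` and the random test images are hypothetical; only the factor 1.12 comes from the text above.

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalized cross-correlation."""
    a = a - a.mean()
    b = b - b.mean()
    return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_mean_brightness(channel, others):
    """Scale `channel` so its mean equals the average mean of `others`."""
    target = np.mean([o.mean() for o in others])
    return channel * (target / channel.mean())

# Synthetic stand-ins for two channels; red is brighter by a factor of 1.12.
rng = np.random.default_rng(0)
blue = rng.random((32, 32))
red = 1.12 * rng.random((32, 32))

red_scaled = match_mean_brightness(red, [blue])
score_before = ncc(red, blue)
score_after = ncc(red_scaled, blue)  # equal to score_before up to rounding
```

Because the score for every candidate shift is unchanged, the best shift is unchanged too, which suggests the mismatch in emir.tif is not a simple global brightness factor.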


Bells and Whistles

Edge Detection

Edge detection is a powerful tool for extracting information from an image. It applies small filters, such as the 3 by 3 Sobel filter, to bring out the edges in a picture. The resulting edge map is largely insensitive to the brightness differences between channels and keeps only the relevant structural details of the image.
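As an illustration of the filtering step, the sketch below computes a Sobel gradient magnitude with plain NumPy. The function names `filter3x3` and `sobel_magnitude` and the edge-replicating padding are assumptions made for this example; the edge maps of two channels would then be compared with the same alignment search in place of the raw intensities.

```python
import numpy as np

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def filter3x3(img, kernel):
    """Correlate a 2-D image with a 3x3 kernel, replicating border pixels."""
    padded = np.pad(img.astype(float), 1, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w), dtype=float)
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out

def sobel_magnitude(img):
    """Gradient magnitude: large at edges, zero in flat regions."""
    gx = filter3x3(img, SOBEL_X)
    gy = filter3x3(img, SOBEL_Y)
    return np.hypot(gx, gy)
```

Adding a constant to the whole image leaves the output unchanged, since each kernel sums to zero, which is one reason edge maps are less affected by per-channel brightness differences than raw pixel values.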

Emir: Before

Emir

Green: (24, 49) Red: (-254, 95)

Emir: After

Emir

Green: (23, 49) Red: (40, 107)