
CS 194-26 Project 1 - Haoyuan Liu


Overview

In this project, we implemented a program that algorithmically aligns the three color channels of images in the Prokudin-Gorskii photo collection to produce color images.

See Generated Images


Implementation

To begin with, we implemented a naive function that takes in two color channels and exhaustively translates the first over a [-12, 12] pixel window along both the X and Y axes in an attempt to find the best alignment with the second.
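The exhaustive search described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `exhaustive_align` and `neg_ssd` are hypothetical, the score function is a simple stand-in passed as a parameter, and `np.roll` is assumed for the translation (it wraps pixels around the border, which a real implementation may want to avoid or crop away).

```python
import numpy as np

def neg_ssd(a, b):
    """Stand-in score (higher is better): negative sum of squared differences."""
    return -float(((a - b) ** 2).sum())

def exhaustive_align(moving, base, score=neg_ssd, radius=12):
    """Try every integer shift in [-radius, radius] on both axes.

    Returns the (dy, dx) shift of `moving` that maximizes `score`
    against `base`. Assumption: shifts are circular (np.roll).
    """
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            s = score(shifted, base)
            if s > best_score:
                best_score, best_shift = s, (dy, dx)
    return best_shift
```

With a radius of 12 this evaluates 25 x 25 = 625 candidate shifts, which is cheap on small images but grows quadratically with the radius.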

To numerically evaluate if the two channels are well matched, we cropped out the center part of the images (about 1/4 in area) and calculated the normalized cross-correlation of the two 2D arrays. The best offset is selected and returned based on this metric alone.
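As a rough sketch of this metric, the snippet below crops the central half of each axis (about 1/4 of the area) and computes a normalized cross-correlation. The helper names are hypothetical, and subtracting the mean before normalizing is an assumption; some definitions of NCC normalize the raw arrays without centering them.

```python
import numpy as np

def center_crop(img, frac=0.5):
    """Keep the central `frac` of each axis (frac=0.5 keeps ~1/4 of the area)."""
    h, w = img.shape
    ch, cw = int(h * frac), int(w * frac)
    top, left = (h - ch) // 2, (w - cw) // 2
    return img[top:top + ch, left:left + cw]

def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equally sized 2D arrays."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    # Guard against constant patches, where the norm is zero.
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def center_ncc(a, b, frac=0.5):
    """Score two channels using only their central regions."""
    return ncc(center_crop(a, frac), center_crop(b, frac))
```

Because the score is normalized, it is insensitive to a uniform gain or offset in brightness between the two channels, which is why it is preferable to a raw pixel difference here.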

The naive method worked well on the small JPEG inputs but didn’t work for the high-resolution TIF images. Instead of enlarging the search region, which would increase N in the O(N^2) runtime of the naive search, we used downsampling to build an image pyramid for speedups.

For the top of the pyramid, we chose a downsampling rate of 5x per edge to produce an image of 1/25 resolution. We then applied the naive search function to the downsampled image to get the offsets. We multiplied these offsets by 5, applied them to the original image to realign it, and repeated the process for downsampling rates of 4x through 1x.
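The coarse-to-fine loop can be sketched as below. This is an illustrative approximation under several assumptions: the function names are hypothetical, downsampling is done by plain strided subsampling (a real implementation would likely blur or use a proper resize first), shifts are circular via `np.roll`, and the score is computed over the whole image rather than a cropped region to keep the example short.

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalized cross-correlation (centering is an assumption)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def search(moving, base, radius):
    """Exhaustive search over integer shifts in [-radius, radius] on both axes."""
    best, best_shift = -np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            s = ncc(np.roll(moving, (dy, dx), axis=(0, 1)), base)
            if s > best:
                best, best_shift = s, (dy, dx)
    return best_shift

def pyramid_align(moving, base, rates=(5, 4, 3, 2, 1), radius=12):
    """Accumulate full-resolution offsets from coarse (5x) to fine (1x)."""
    total_dy, total_dx = 0, 0
    for rate in rates:
        # Apply the offset found so far to the full-resolution channel.
        shifted = np.roll(moving, (total_dy, total_dx), axis=(0, 1))
        # Search on subsampled copies, then scale the result back up.
        dy, dx = search(shifted[::rate, ::rate], base[::rate, ::rate], radius)
        total_dy += dy * rate
        total_dx += dx * rate
    return total_dy, total_dx
```

At the 5x level, a search radius of 12 covers offsets of up to 60 pixels at full resolution, so the later levels only need to correct small residual errors.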

This method worked well for most large images. We chose the red channel (2nd) as base channel since it produced the best results.

For the emir.tif input, evaluating on the center part of the image resulted in mismatches: the clothes are in nearly pure colors, so their brightness levels differ across channels. For this input alone, we changed the function to crop out the right border part of the image instead of the center, since the wooden door is more consistent in brightness across channels.


Extra credit

Automatic cropping

We added extra logic for automatic cropping to the program. The function works by creating a mask on the brightness levels of all color channels. If any one of the channels is very dark or very bright (<= 0.05 or >= 0.95) at a pixel, we set the mask to true for that pixel. Starting from each border, we then crop columns and rows in which more than 50% of the pixels are masked, and stop at the first column or row where less than 50% of the pixels are masked.
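A minimal sketch of this cropping rule is shown below. The function names and the H x W x 3 float layout in [0, 1] are assumptions, and the masked fraction of each row and column is computed once on the full image rather than recomputed after every removed line, which may differ from the actual program.

```python
import numpy as np

def _trim(bad):
    """Return [start, end) after dropping leading and trailing True entries."""
    start, end = 0, len(bad)
    while start < end and bad[start]:
        start += 1
    while end > start and bad[end - 1]:
        end -= 1
    return start, end

def auto_crop(img, lo=0.05, hi=0.95, max_frac=0.5):
    """Crop border rows/columns that are mostly very dark or very bright.

    img: H x W x 3 float array with values in [0, 1] (assumed layout).
    """
    # A pixel is masked if any channel is near black or near white.
    mask = ((img <= lo) | (img >= hi)).any(axis=2)
    # A row or column is "bad" if more than max_frac of its pixels are masked.
    top, bottom = _trim(mask.mean(axis=1) > max_frac)
    left, right = _trim(mask.mean(axis=0) > max_frac)
    return img[top:bottom, left:right]
```

Only lines connected to a border are removed, so a dark or bright band in the interior of the photo is left untouched.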

We applied cropping twice in our program: first on the raw input of three images stacked together, and then again after the final alignment through pyramid search. Note that cropping before the search also improves the alignment of some pictures, as seen in the examples below.

Without Cropping With Cropping