Project 1: Images of the Russian Empire

Cheng (Bob) Cao

Overview

This project is a program that takes three monochrome exposures captured through colored filters and aligns and combines them into a single color image.

Algorithm

The Basic Algorithm

```python
import numpy as np

def similarity_index(a, b):
    # Negative sum of squared differences (SSD): larger means more similar.
    return -np.sum((a - b) ** 2)

def align(image_a, image_b, search_range):
    # Exhaustively search offsets in [-search_range, search_range] on both
    # axes and return the displacement that best matches image_b.
    max_similarity = -np.inf
    dx, dy = 0, 0
    for x in range(-search_range, search_range + 1):
        for y in range(-search_range, search_range + 1):
            shifted = np.roll(image_a, (y, x), axis=(0, 1))
            similarity = similarity_index(shifted, image_b)
            if similarity > max_similarity:
                max_similarity = similarity
                dx, dy = x, y
    return dx, dy
```

The algorithm searches a specified region of [-range, range], offsetting image A and comparing it to image B. If the similarity is higher than the maximum recorded so far, the current offset and similarity become the new maximum. After the search, the best offset is returned.

The similarity function can be any function that computes the similarity between two images; image B is the target, and the function measures spatial similarity so that the images can be aligned properly. Here we use the Sum of Squared Differences (SSD) as the metric, and we align the red and blue images to the green image as the target.

The green image is chosen as the target because green is the brightest perceived color in natural images and human vision. The green channel is often regarded as a brightness channel, which should give us a better target with lower noise.

Image Pyramid

Because the images can get very large, a 10% offset could mean a shift of 300 pixels in either direction. Searching over such a radius would be extremely slow and painful. The solution is an image pyramid.

An image pyramid progressively downsamples the images by a factor of 2, forming a "pyramid" of images. In graphics terms these are known as mipmaps. We can perform the search on the smallest image and progressively refine the estimate as the images get more granular. By using an image pyramid, we can use a constant search range and still get much better performance on both large and small images.

In this implementation, a range of [-1, 1] is used. As the image pyramid halves the size of the images at each level, this range is sufficient because each level moves the image by the same amount as one pixel at the previous level. This is similar to how binary number representation works: we only need a very small adjustment per level to cover a large range smoothly. Because the search range per level is so small, this implementation is fast and robust; each image is processed in under 10 seconds.
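To make the coarse-to-fine search concrete, here is a minimal sketch built on the align() routine above. It downsamples by plain subsampling for brevity (a blur-then-subsample would be more faithful to a true pyramid), and the minimum-size cutoff is an illustrative assumption.

```python
import numpy as np

def pyramid_align(image_a, image_b):
    # Base case: the image is small enough to search directly.
    if min(image_a.shape) < 32:
        return align(image_a, image_b, search_range=1)
    # Recurse on a half-resolution copy (plain subsampling for brevity;
    # blurring before subsampling would reduce aliasing).
    dx, dy = pyramid_align(image_a[::2, ::2], image_b[::2, ::2])
    # One pixel at the coarse level is two pixels at this level.
    dx, dy = 2 * dx, 2 * dy
    # Refine with a [-1, 1] search around the upscaled estimate.
    shifted = np.roll(image_a, (dy, dx), axis=(0, 1))
    ddx, ddy = align(shifted, image_b, search_range=1)
    return dx + ddx, dy + ddy
```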

Results

Emir: R(+17, +58), B(-23, -53)

emir.tif

Harvesters: R(-3, +69), B(-15, -61)

harvesters.tif

Icon: R(+5, +53), B(-17, -45)

icon.tif

Lady: R(+1, +69), B(-5, -53)

lady.tif

Melons: R(+3, +101), B(-9, -85)

melons.tif

Onion Church: R(+10, +58), B(-26, -53)

onion_church.tif

Self Portrait: R(+7, +101), B(-27, -85)

self_portrait.tif

Three Generations: R(-3, +61), B(-13, -53)

three_generations.tif

Train: R(+26, +45), B(-6, -45)

train.tif

Village: R(+10, +73), B(+41, 0)

village.tif

Workshop: R(-11, +53), B(0, -53)

workshop.tif

Cathedral: R(+1, +7), B(-2, -3)

cathedral.jpg

Monastery: R(+1, +6), B(-2, +3)

monastery.jpg

Tobolsk: R(+1, +4), B(-3, -3)

tobolsk.jpg

Expedition to the Urals: R(-27, +69), B(+5, -41)

00747a.tif

Austro-Hungarian Prisoners of War: R(+10, +89), B(-9, -37)

00279a.tif

Improvements

Similarity metrics

We can observe from the previous results that SSD is not very reliable, due to the luminance differences between the channels we are trying to align. Therefore, we should introduce a more robust similarity index that tolerates luminance and contrast differences.

The first idea is to use the gradients, or edges, of the image instead of the raw pixels.

The Sobel operator was tested, but it did not yield successful results, as some images contain much stronger gradients in one color than the others (natural scenes contain mostly green gradients but few red ones). Therefore, another metric was needed.

The index of choice here is SSIM (Structural Similarity Index). (https://en.wikipedia.org/wiki/Structural_similarity)

After switching to SSIM, we are able to align all images with much better precision.
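One way to realize this is to drop SSIM in as the similarity function used by align(). The sketch below uses scikit-image's implementation; the report does not state which implementation was actually used, so treat this as an assumption.

```python
from skimage.metrics import structural_similarity

def similarity_index(a, b):
    # SSIM ranges over [-1, 1] and tolerates luminance and contrast
    # differences between channels, unlike raw SSD.
    return structural_similarity(a, b, data_range=1.0)
```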

SSD:

SSD

SSIM:

SSIM

Color Correction

Color Mapping

As the original images were captured through colored filters onto monochrome glass negatives, the channels do not map directly to R, G, and B. Color remapping is needed.

The method of color remapping used here is a 3×3 matrix called a Color Correction Matrix (CCM). We multiply this matrix with each pixel's color in the aligned output, thereby remapping the color.

The color correction matrix used in this solution is:

$$\begin{bmatrix} 1.0 & 0.3 & 0.05 \\ -0.1 & 0.8 & 0.0 \\ 0.2 & 0.0 & 1.0 \end{bmatrix}$$
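Applying the matrix is a per-pixel multiply. A minimal numpy sketch using the matrix values from this report (the function name is illustrative):

```python
import numpy as np

# Color correction matrix from the report: each row produces one output
# channel (R, G, B) as a weighted mix of the input channels.
CCM = np.array([
    [ 1.0, 0.3, 0.05],
    [-0.1, 0.8, 0.0 ],
    [ 0.2, 0.0, 1.0 ],
])

def apply_ccm(image):
    # image: (H, W, 3) float array; multiply every pixel by the CCM.
    return np.clip(image @ CCM.T, 0.0, 1.0)
```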

Auto White-Balance

The auto white balance algorithm used here is "white patch Retinex". This algorithm assumes that there is always a bright patch (spot) in an image, and that this spot is perceived as "white". We find the brightest spot in the image and correct it to white; this correction is then applied to the whole image.
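A minimal sketch of white patch scaling, assuming float images in [0, 1]. Using a high percentile instead of the strict per-channel maximum is an extra robustness assumption, not something stated above:

```python
import numpy as np

def white_patch(image, percentile=99.5):
    # Find the near-brightest value per channel and scale it to white.
    # A high percentile (rather than the strict maximum) resists single
    # hot pixels; this choice is an assumption, not from the report.
    bright = np.percentile(image, percentile, axis=(0, 1))
    return np.clip(image / bright, 0.0, 1.0)
```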

Auto Dynamic Range

After all this processing, and given the nature of film, the brightest and darkest spots are not necessarily white or black. In order to utilize the full range of digital images, the image is remapped to [0, 1]. This is achieved by subtracting the darkest channel value from the whole image and then dividing by the difference between the brightest and darkest values. For example, if the darkest color in the image is (0.1, 0.2, 0.05) and the brightest is (0.3, 0.5, 0.7), the whole image is shifted down by 0.05 and then divided by 0.7 - 0.05 = 0.65.
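In code, the remapping is a global shift and scale. A minimal sketch under the same assumptions:

```python
def auto_range(image):
    # Map the darkest value in the image to 0 and the brightest to 1,
    # using global extrema across all channels so hues are preserved.
    lo, hi = image.min(), image.max()
    return (image - lo) / (hi - lo)
```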

Auto-cropping

If we start by observing the original data:

cathedral.jpg

We can see that there are wide white borders around the image, so the first step is to crop out these white borders. The alignment process may also shift the image content too close to the edge, so before aligning, we add a white margin to the R, G, and B images; the overflowing areas will be cropped later, and the margin protects the image itself from being cropped.
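The margin step can be as simple as constant padding. A minimal sketch (the margin size is a free parameter):

```python
import numpy as np

def pad_white(channel, margin):
    # Surround the channel with a white margin before alignment so the
    # shift cannot push real content off the edge; the white area is
    # removed later by the border-cropping passes.
    return np.pad(channel, margin, mode='constant', constant_values=1.0)
```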

White edges are detected by averaging each channel along both the X and Y axes:

```python
import numpy as np

def crop_white_edge(channel, threshold=0.98):
    # Average brightness along each row and column; rows/columns whose
    # mean exceeds the threshold are treated as white border.
    col_mean = channel.mean(axis=0)   # average over y: one value per column
    row_mean = channel.mean(axis=1)   # average over x: one value per row
    cols = np.where(col_mean <= threshold)[0]
    rows = np.where(row_mean <= threshold)[0]
    # Keep only the span between the outermost non-border rows/columns.
    return channel[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
```

If the average brightness at a point along one axis is over 0.98, we choose that point as one side of the border. This process is repeated for each border of each channel until all the white edges are cropped out.

After this process, we get an image like this:

Cropped White Edges

We can observe that there are still colored edges. To detect and crop them, we use the cross-channel difference. We compute the squared sum of the pairwise absolute differences between all three channels, then subtract its average:

$$d = (|r - g| + |g - b| + |b - r|)^2$$

$$e = \max(0, d - \bar{d})$$

Then, if the summed error e along an edge is greater than 0.5, we decide that the point on that edge is part of the border.
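A sketch of the cross-channel test for one axis. How the per-row error is aggregated (sum vs. mean) and the exact threshold scale are assumptions here:

```python
import numpy as np

def colored_border_rows(r, g, b, threshold=0.5):
    # Cross-channel difference is large where the channels disagree,
    # e.g. on the colored borders left by imperfect alignment.
    d = (np.abs(r - g) + np.abs(g - b) + np.abs(b - r)) ** 2
    e = np.maximum(0.0, d - d.mean())
    # Flag rows whose aggregated error exceeds the threshold; averaging
    # per row (rather than summing) is an assumption in this sketch.
    return np.where(e.mean(axis=1) > threshold)[0]
```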

After cropping out the colored edges, we get this result:

Auto Crop

Final Results (after all the bells and whistles)