CS194 Project #1: Images Of The Russian Empire

By: Scott Shao

Background

Sergei Mikhailovich Prokudin-Gorskii (1863-1944) photographed the Russian Empire, recording three exposures of every scene onto a glass plate using a red, a green, and a blue filter. These glass plate negatives survived and were purchased, digitized, and made available online by the Library of Congress.

Overview

The goal of this project was to automatically produce a color image with as few visual artifacts as possible from the digitized Prokudin-Gorskii glass plate negatives using image processing techniques. To do so, I extract the three color channel images, place them on top of each other, and align them so that they form a single RGB color image.
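As a rough illustration of the extraction and stacking step, here is a minimal sketch. It assumes the digitized plate is a single grayscale array with the three exposures stacked vertically in blue, green, red order from top to bottom. The function names split_channels and stack_rgb are hypothetical and are not taken from my program.

```python
import numpy as np

def split_channels(plate):
    """Split a vertically stacked plate into (r, g, b) channels.

    Assumes the exposures are stacked top to bottom as blue, green, red.
    Any leftover rows (height not divisible by 3) are dropped.
    """
    h = plate.shape[0] // 3
    b = plate[:h]
    g = plate[h:2 * h]
    r = plate[2 * h:3 * h]
    return r, g, b

def stack_rgb(r, g, b):
    """Stack three equally sized 2-D channels into an H x W x 3 RGB image."""
    return np.dstack([r, g, b])
```

Without alignment the stacked result shows colored fringes, which is what the rest of this page addresses.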

Method

My program first automatically crops the white and black borders off each image by computing the mean of each row and column of the image matrix and thresholding those means to locate the borders. It then separates the cropped image into three color channels. The loss function used to measure the quality of an alignment was the sum of squared differences (SSD). During the alignment phase I compared only the central 60% of each channel to account for artifacts at the edges and for the pixels wrapped around by rolling. For small images such as the JPEGs, I searched exhaustively over a displacement range of 40 pixels to minimize the loss function. For larger images such as the TIFFs, such a search is prohibitively slow, so I used an image pyramid algorithm to find the optimal offset in a reasonable time.

There are some differences between my implementation and the starter code provided. I used the red channel as the base instead of the blue channel, and I applied automatic border cropping to every image before any other operation. The optimal offsets I calculated are therefore based on the cropped red channel instead of the raw channel as suggested in the starter code, so simply applying the offsets listed on this page to the raw images won't reproduce my exact results.

Naive Algorithm

For the naive algorithm, after border cropping and color channel separation, only the central 60% of each image was used to find the optimal offset. This keeps any remaining borders and the rolled-over pixels at the edges from affecting the loss function. The loss function of choice was SSD, although normalized cross-correlation (NCC), Canny edges, and tuning the gamma values of the images were also explored. The algorithm uses two nested for loops to try every pixel offset from the negative bound to the positive bound and keeps the offset that minimizes the loss function. This naive algorithm was used for the small JPEG images with a bound of 30.
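The exhaustive search described above can be sketched as follows. This is a minimal illustration rather than my exact code: the names crop_center, ssd, and naive_align are hypothetical, and shifting with np.roll is an assumption about how the displacement is applied.

```python
import numpy as np

def crop_center(img, keep=0.6):
    """Return the central `keep` fraction of a 2-D image."""
    h, w = img.shape
    dh = int(h * (1 - keep) / 2)
    dw = int(w * (1 - keep) / 2)
    return img[dh:h - dh, dw:w - dw]

def ssd(a, b):
    """Sum of squared differences between two equally sized arrays."""
    return float(np.sum((a - b) ** 2))

def naive_align(base, moving, bound=30, keep=0.6):
    """Try every (dy, dx) in [-bound, bound] and return the shift of
    `moving` that minimizes SSD against `base` on the central region."""
    base = base.astype(float)      # avoid integer overflow on uint8 input
    moving = moving.astype(float)
    base_c = crop_center(base, keep)
    best_offset, best_loss = (0, 0), np.inf
    for dy in range(-bound, bound + 1):
        for dx in range(-bound, bound + 1):
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            loss = ssd(base_c, crop_center(shifted, keep))
            if loss < best_loss:
                best_loss, best_offset = loss, (dy, dx)
    return best_offset
```

The cost grows with the square of the bound times the number of pixels, which is why this approach is only practical for the small images.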

The image on the left was produced using the naive algorithm.

Image Pyramid Algorithm

It is prohibitively expensive to use the naive algorithm on the larger images. One way to increase the speed is to use an image pyramid, where I rescale the image by different factors to produce a pyramid of the same image at different resolutions. I then run the naive search algorithm with a smaller displacement range on the coarsest image, apply the optimal offset to the next level of the pyramid, and run the naive algorithm again. This repeats until the original resolution is reached or the optimal offsets at some level are zero, at which point the algorithm stops to save computation time and returns the optimal offsets.

I used two functions, pyramid_helper and pyramid. The pyramid_helper function takes in the color channels, the scale factor, and the bound, rescales the color channels, and finds the optimal offsets at that coarse level. The pyramid function generates the pyramid of images at the different coarse levels, iteratively calls pyramid_helper, and applies the optimal offsets to the image before entering the next level. The pyramid function returns the final optimal offsets for the original color channels.
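A coarse-to-fine search in this spirit might look like the sketch below. It is not the actual pyramid_helper or pyramid code: the names search and pyramid_align are hypothetical, downscaling is approximated by simple subsampling (img[::s, ::s]) instead of a proper rescale, the scale is assumed to be a power of two, and the early stop on a zero offset is left out for brevity.

```python
import numpy as np

def search(base, moving, bound):
    """Brute-force SSD search over shifts in [-bound, bound],
    comparing only the central 60% of the two images."""
    h, w = base.shape
    dh, dw = h // 5, w // 5
    core = (slice(dh, h - dh), slice(dw, w - dw))
    best, best_loss = (0, 0), np.inf
    for dy in range(-bound, bound + 1):
        for dx in range(-bound, bound + 1):
            diff = base[core] - np.roll(moving, (dy, dx), axis=(0, 1))[core]
            loss = float(np.sum(diff ** 2))
            if loss < best_loss:
                best, best_loss = (dy, dx), loss
    return best

def pyramid_align(base, moving, scale=8, bound=4):
    """Estimate the shift of `moving` onto `base` from coarse to fine.

    At each level the offset found so far is applied at full resolution,
    both images are subsampled by `s`, and a small search refines it.
    """
    base = base.astype(float)
    moving = moving.astype(float)
    total = np.array([0, 0])
    s = scale
    while s >= 1:
        shift = tuple(int(v) for v in total)
        shifted = np.roll(moving, shift, axis=(0, 1))
        dy, dx = search(base[::s, ::s], shifted[::s, ::s], bound)
        total += np.array([dy, dx]) * s   # convert back to full-resolution pixels
        s //= 2
    return tuple(int(v) for v in total)
```

With a scale of 8 and a bound of 4, this sketch can reach displacements of up to 4 * (8 + 4 + 2 + 1) = 60 pixels while only ever searching a 9 by 9 window per level.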

Both the image pyramid algorithm and the naive algorithm are called through the test_pyramid function. It takes in the name of the file, the scale factor for the image pyramid, and the bound of the search in pixels, and can optionally show the color channels and filters or save the final image. The scale factor determines the levels of the pyramid; if the scale equals 1, the search reduces to the naive algorithm.

The image on the right was produced using the image pyramid algorithm with scale of 8 and bound of 4.

Results

cathedral
G: [5, -1]
B: [12, -3]
emir
G: [36, -20]
B: [80, -57]
harvesters
G: [40, 4]
B: [78, -16]
icon
G: [56, -8]
B: [114, -24]
lady
G: [22, -5]
B: [50, -13]
melons
G: [22, -2]
B: [57, -15]
monastery
G: [3, -1]
B: [15, -2]
onion_church
G: [38, -9]
B: [82, -38]
self_portrait
G: [41, -8]
B: [99, -37]
three_generations
G: [29, 2]
B: [68, -11]
tobolsk
G: [5, -1]
B: [11, -3]
train
G: [49, -25]
B: [96, -32]
village
G: [41, -9]
B: [93, -23]
workshop
G: [11, 10]
B: [24, 14]

Extras

irrigation_canal
G: [28, -5]
B: [63, -24]
lugano
G: [17, -24]
B: [49, -37]
mill
G: [32, -8]
B: [57, -32]
camel
G: [24, 16]
B: [57, 17]
prison
G: [25, 8]
B: [74, 20]

Extras

Improved Contrast And Canny Edge Detection

The idea behind improving the contrast and adding Canny edge detection was to add features to the image channels so that the alignment would be better and faster. After running the program on all the sample images, the "Emir" image shown below proved difficult to align perfectly, partially due to drastic brightness differences between channels, for example in the blue coat he was wearing. I found that if I cropped just the top left corner for alignment, I would get much better results than with the current image pyramid algorithm. However, a universal program without much human tweaking was desired, so I decided to optionally add enhanced contrast and Canny edges to the original image channels. These added features do help with the alignment, as shown below.
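The sketch below illustrates the general idea of adding such features to a channel before alignment. It is only an approximation under stated assumptions: the channel is a float array in [0, 1], a normalized gradient magnitude stands in for a real Canny edge map so that the example needs nothing beyond NumPy, and the way the two features are combined (a clipped sum) is a guess rather than my exact operation. The name add_alignment_features and the edge_weight parameter are hypothetical.

```python
import numpy as np

def add_alignment_features(channel, gamma=2.0, edge_weight=1.0):
    """Return a gamma-adjusted channel with an edge map added on top.

    `channel` is assumed to hold floats in [0, 1]. The edge map here is
    a plain gradient magnitude, used as a stand-in for Canny edges.
    """
    img = channel.astype(float)
    enhanced = img ** gamma               # gamma > 1 darkens mid-tones
    gy, gx = np.gradient(img)             # derivatives along rows, columns
    edges = np.hypot(gx, gy)
    if edges.max() > 0:
        edges = edges / edges.max()       # normalize to [0, 1]
    return np.clip(enhanced + edge_weight * edges, 0.0, 1.0)
```

The aligned offsets would then be computed on these feature images and applied to the original channels, so the final picture keeps its original tones.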

An output image without enhanced contrast and Canny edges
An output image with enhanced contrast and Canny edges
Sample image green channel
Sample image green channel with gamma = 2
Sample image green channel Canny edges with sigma = 3
Sample image green channel after operation

Border Cropping

Border cropping was the first operation applied to each image, prior to any color channel separation or alignment. This was based on the idea that each image has white and black borders of a different size, which affects the alignment because the algorithm has to work harder to compensate for the displacement introduced by the borders.

The algorithm is fairly simple: it computes the mean of each row and column of the image matrix and thresholds those values to find the white and black borders. This works surprisingly well, as shown below. Better edge detection algorithms were considered, but due to time constraints and complexity this naive algorithm was preferred.
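A minimal sketch of this kind of mean-and-threshold cropping is given below. The function name crop_borders and the threshold values low and high are illustrative assumptions, not the values used in my program, and the image is assumed to be a float array in [0, 1].

```python
import numpy as np

def crop_borders(img, low=0.2, high=0.8):
    """Crop away outer rows and columns whose mean is near black or white.

    A row or column is kept when its mean lies strictly between `low`
    and `high`; the crop spans from the first kept index to the last.
    """
    row_means = img.mean(axis=1)
    col_means = img.mean(axis=0)
    rows = np.where((row_means > low) & (row_means < high))[0]
    cols = np.where((col_means > low) & (col_means < high))[0]
    if rows.size == 0 or cols.size == 0:
        return img                        # nothing looks like content; leave as is
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
```

Because each row mean also includes the border pixels at its two ends, the thresholds need some slack, and very dark or very bright scenes could be cropped too aggressively with fixed values like these.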

I also considered automatically separating the color channels using a similar thresholding idea, to remove the borders completely from each color channel. However, that only works well on images with well-defined black gaps between the color channels, as shown below, so ultimately it was not used.

An uncropped image
An auto-cropped image

Conclusion

This was a really fun and engaging project that familiarized me with image processing in Python. The results generated by my program were good, and it takes less than a second to generate the output for the small images and around 20 seconds for the large images. The optional contrast enhancement and Canny edge features really help with the alignment in some cases. The naive automatic border detection and removal worked well.