CS194 Project 1

Images of the Russian Empire
by Vanessa Ng

Goal: Take the digitized Prokudin-Gorskii glass plate images and, using image processing techniques, automatically produce a color image with as few visual artifacts as possible.

Three separated color channels (R,G,B)

Image of combined color channels


This is done via extracting the three color channel images (above left image), placing them on top of each other, and aligning them so that they form a single RGB color image (above right image).

Part 1: Single-Scale Implementation

Approach

1. Exhaustive search: Using low-res images of ~300 by 300 pixels, an exhaustive search over a window of possible displacements (starting with [-15, 15]) was conducted.

2. Image matching metric: Each displacement was scored with the original blue color channel image using Sum of Squared Differences (SSD). The red/green color channel displaced image with the lowest score was taken as the best alignment match.

Results

Results of image alignment shift are displayed in the following format:

image_name.jpg
g: (row_shift, col_shift), r: (row_shift, col_shift)

Original

Aligned

cathedral.jpg
g: (1, -1), r: (7, -1)

Original

Aligned

settlers.jpg
g: (7, 0), r: (14, -1)

Original

Aligned

nativity.jpg
g: (3, 1), r: (7, 1)


Original

Aligned

empire.jpg
g: (2, 0), r: (7, 1)


These images only worked with 10% margin image cropping due to the border bleeds affecting the image matching metric.

Original (before cropping)

"Aligned" (before cropping)

Original (after cropping)

Aligned (after cropping)

.
monastery.jpg
g: (-3, 2), r: (3, 2)

Original (after cropping)

Aligned (after cropping)

belozersk.jpg
g: (4, 2), r: (10, 3)


Part 2: Multi-Scale Pyramid Implementation

Exhaustive search will become prohibitively expensive if the pixel displacement is too large, which is the case for the high-resolution glass plate scans (~3000x3000 pixels). As such, we need to implement a faster search procedure such as an image pyramid. An image pyramid represents the image at multiple scales and the processing is done sequentially starting from the coarsest scale (smallest image) and going down the pyramid, updating the estimate as it goes.


Approach

1. Downscale image: I scaled down the huge .tif images by a factor, f, using skimage.transform.downscale_local_mean(). I started with f = 20 so that the 3000px images would scale down to around 150-200px.

2. Modified exhaustive search: With the scaled down images, I did a search over possible displacements of radius, rad.

3. Image alignment metric: Similar to the naive implementation, each displacement was scored with the original blue color channel image using SSD. The red/green color channel displaced image with the lowest score was taken as the best alignment match.

4. Updating actual image: The radius, rad, was taken and multipled by f. This gives the actual row_shift and col_shift on the original 3000px image. The actual image was updated with shift of rad*f amount.

5. Updating factors: Next, f and rad are updated by dividing them by their respective factors: factor_f = 2 and factor_rad = 2. We half the variable, f, because we would like to increase the pixel density of the image we search over. Hence we downscale the image by a smaller magnitude. At the same time, with a larger image, we cannot afford to exhaustively search over such a large radius. Hence we half the rad.

6. Recurse: The above steps are repeated until f == 1, which is when we are manipulating the full 3000px image.

I started with f = 20 (downscaling images by a factor of 20). I computed the initial rad based on f and the image height so as to ensure I have a radius that accounts for the image size (and not larger than it).

Results

Each image took around 20-30 seconds to run. Overall, they worked pretty nicely for more most of the images.

Original

Aligned

lady.tif
g: (57, -6), r: (123, -17)

Original

Aligned

three_generations.tif
g: (51, 5), r: (108, 7)

Original

Aligned

icon.tif
g: (42, 16), r: (89, 22)


The following images only worked after cropping 10% margins off the original color channel images, followed by applying multi-scale pyramid algorithm.

Original

Aligned

.
self_portrait.tif
g: (78, 29), r: (176, 37)

Original

Aligned

turkmen.tif
g: (56, 21), r: (116, 28)

Original

"Aligned"

.
village.tif
g: (64, 12), r: (137, 22)

Original

"Aligned"

.
train.tif

g: (42, 5), r: (86, 32)

Original

"Aligned"

harvesters.tif
g: (59, 16), r: (123, 13)

Original

"Aligned"

shore.tif
g: (25, -16), r: (99, -32)

Original

"Aligned"

.
door.tif
g: (74, 32), r: (159, 50)

Problems

Emir.tif did not work despite applying the 10% image crop. This is because the images to be matched do not actually have the same brightness values (they are different color channels).


Original

"Aligned"

Edge Detection using Sobel Operator (Extra Credit)

Up to 2 pts: Better features. Instead of aligning based on RGB similarity, try using gradients or edges.

Background of Sobel Operator

The Sobel operator or Sobel filter, is used in image processing and computer vision, particularly within edge detection algorithms where it creates an image emphasising edges. It is a discrete differentiation operator, computing an approximation of the gradient of the image intensity function. At each point in the image, the result of the Sobel–Feldman operator is either the corresponding gradient vector or the norm of this vector. The Sobel–Feldman operator is based on convolving the image with a small, separable, and integer-valued filter in the horizontal and vertical directions and is therefore relatively inexpensive in terms of computations. On the other hand, the gradient approximation that it produces is relatively crude, in particular for high-frequency variations in the image.
[Taken from Wikipedia]

Approach

Before applying the multi-scale pyramid algorithm, we use sobel operator to pre-process the original image.

1. Sobel operator: Two 3x3 kernels were created shown below:



G_x and G_y are the resulting images when A (the source image) is convolved (denoted by * operation) with the kernel. Edges have high gradients as they experience a sharp change in pixel brightness. Non-edges would have low gradients. As such, we use the kernels account for the horizontal and vertical derrivative approximations.

2. Multi-scale pyramid: These processed color channel images are then passed through the multi-scale pyramid algorithm. Image matching metric is computed with the processed color channels.

3. Aligning the image: Taking the ideal row_shifts and col_shifts for green and red color channels, we np.roll() the original image (not the one processed with sobel) and display it.

Results

Emir.tif finally aligns after employing edge detection using Sobel operator!

Original

Aligned

.
emir.tif
g: (49, 24), r: (105, 41)

Automatic Contrasting (Extra Credit)

Up to 2pts: It is usually safe to rescale image intensities such that the darkest pixel is zero (on its darkest color channel) and the brightest pixel is 1 (on its brightest color channel). More drastic or non-linear mappings may improve perceived image quality.

Approach

Auto contrasting was applied after images were generated from mutitscale pyramid / naive algorithm.

1. Find the original median of the image's pixel intensities
2. Multiply the image matrix by 1.5x (or some factor)
3. Shift the matrix elements such that the new median stays in the same place as original median
4. The values which are less than -1 or greater than 1 get clipped to -1 and 1 respectively
This approach increases distance between pixel intensities, allowing the image to appear more contrasted.

Results

Automatic contrasting applied

Automatic contrasting applied