CS194-26 Project 1: Aligning Prokudin-Gorskii Glass Plates

Kevin Shi

Overview

Our goal is to align the Prokudin-Gorskii glass plate scans. Each scan is a single image containing three photographs of the same subject, taken through a blue, a green, and a red filter, and we try to stack them to create one BGR color image. We assume that a simple translation of the channels on top of each other is enough to produce the final image.

Method

I first divided the image by separating it into three equal parts stacked vertically. This gave me the B, G, and R matrices, from which I cropped 15% off each edge, leaving 70% of the original picture in each dimension. This cropping gets rid of the black and white border on the outside as well as the artifacts on the edge of the picture. For almost all the pictures, the subject was at or near the center of the image, so cropping left me with what was important to the human eye.

To align, I displaced the image over a range of 30 pixels in both the x and y directions and exhaustively calculated the loss at every such displacement, choosing the displacement with the best loss. For the larger images, exhaustive search proved much too expensive, so I implemented an image pyramid that downsizes the image by a factor of 2 at each level.

For the loss I tried both Sum of Squared Differences (SSD) and normalized cross-correlation (NCC). Most images turned out similar and NCC was the more expensive operation, but there were a select few for which NCC did much better, such as workshop.tif, so I chose to use NCC. In my implementation, the first shift (x) is defined as downwards and the second shift (y) is defined as to the right, to stay consistent with row-by-column matrix indexing.
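The splitting and cropping step can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names `split_plate` and `crop_border` are hypothetical, the plate is assumed to be a 2-D NumPy array with the blue exposure on top, and any leftover rows from the division by three are simply dropped.

```python
import numpy as np

def split_plate(plate):
    """Split a vertically stacked plate scan into its B, G, R parts (top to bottom)."""
    height = plate.shape[0] // 3          # leftover rows at the bottom are ignored
    b = plate[:height]
    g = plate[height:2 * height]
    r = plate[2 * height:3 * height]
    return b, g, r

def crop_border(channel, fraction=0.15):
    """Remove `fraction` of the height and width from each edge of a channel."""
    h, w = channel.shape
    dx, dy = int(h * fraction), int(w * fraction)   # rows, columns to drop per side
    return channel[dx:h - dx, dy:w - dy]

# Tiny synthetic "plate": 30 rows, so each channel gets 10 rows.
plate = np.arange(30 * 10, dtype=float).reshape(30, 10)
b, g, r = split_plate(plate)
cropped = crop_border(b)
print(b.shape, cropped.shape)
```

Cropping before computing the loss means the borders never influence the alignment score; the shifts found on the cropped channels can then be applied to the full channels.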

Naive Algorithm

The naive algorithm uses the methods described above, except that there is no image pyramid. It uses NCC and two nested for loops to search the 30x30 grid of displacements and find the one with the optimal loss, and thus the alignment.
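A minimal sketch of such a double-loop search is shown below. Several details are assumptions rather than the original implementation: the names `ncc` and `align_exhaustive` are hypothetical, the NCC is the zero-mean variant, the 30x30 grid is read as displacements from -15 to 15 in each direction, and `np.roll` is used for shifting, which wraps pixels around the edges.

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-shape arrays."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def align_exhaustive(moving, reference, radius=15):
    """Try every (dx, dy) in [-radius, radius]^2 and keep the highest NCC.

    dx shifts rows (positive = down), dy shifts columns (positive = right).
    """
    best_score, best_shift = -np.inf, (0, 0)
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            shifted = np.roll(moving, (dx, dy), axis=(0, 1))
            score = ncc(shifted, reference)
            if score > best_score:
                best_score, best_shift = score, (dx, dy)
    return best_shift

# Synthetic check: misalign a random "blue" channel and recover the shift.
rng = np.random.default_rng(0)
blue = rng.random((40, 40))
green = np.roll(blue, (-3, 2), axis=(0, 1))
print(align_exhaustive(green, blue, radius=5))
```

Since NCC is a similarity score, the "best loss" here is the maximum; with SSD the comparison would be flipped to keep the minimum.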

Image Pyramid

The naive exhaustive search became prohibitively expensive on the high-resolution glass plate scans, so I built an image pyramid in which each level downsizes the image by a factor of 2. I downsize until both images are smaller than 200 pixels in height, and at that coarsest level I search a 20x20 area. I then work back up the pyramid, at each level searching the 5x5 area around the previous fit to make sure that each fit is optimal.
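The coarse-to-fine idea can be sketched recursively. This is an illustration under stated assumptions: all function names are hypothetical, downsizing is done by 2x2 block averaging (the original resizing routine is not specified), the 20x20 and 5x5 areas are read as radii of 10 and 2 pixels, and shifting uses `np.roll`, which wraps around the edges.

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-shape arrays."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def search(moving, reference, center, radius):
    """Exhaustively search shifts within `radius` of `center`, maximizing NCC."""
    best_score, best = -np.inf, center
    for dx in range(center[0] - radius, center[0] + radius + 1):
        for dy in range(center[1] - radius, center[1] + radius + 1):
            score = ncc(np.roll(moving, (dx, dy), axis=(0, 1)), reference)
            if score > best_score:
                best_score, best = score, (dx, dy)
    return best

def downsample(img):
    """Halve each dimension by averaging 2x2 blocks (odd edges are trimmed)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def align_pyramid(moving, reference, min_height=200, coarse_radius=10, refine_radius=2):
    """Estimate the shift at the coarsest level, then refine it level by level."""
    if moving.shape[0] < min_height:
        return search(moving, reference, (0, 0), coarse_radius)
    coarse = align_pyramid(downsample(moving), downsample(reference),
                           min_height, coarse_radius, refine_radius)
    # One pixel at the coarser level corresponds to two pixels at this level.
    center = (2 * coarse[0], 2 * coarse[1])
    return search(moving, reference, center, refine_radius)

# Synthetic check on a small image, with a lowered height threshold.
rng = np.random.default_rng(1)
ref = rng.random((64, 64))
mov = np.roll(ref, (-8, 4), axis=(0, 1))
print(align_pyramid(mov, ref, min_height=20, coarse_radius=4))
```

The cost per level is a fixed 5x5 search, so the total work grows with the image area rather than with the size of the displacement being searched for.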

Problems

When I first started on the naive algorithm, I had a lot of problems. The image kept aligning to (0,0). Eventually I found out that the cathedral picture was so small that the 10-pixel border on the outside strongly affected the loss over the inside, so I cropped it out. The images also had thick black borders with varying distortions, so I cropped the outside border.

With SSD, some pictures, such as emir and workshop, were not aligned properly. I tried NCC, which fixed workshop at the cost of a more computationally expensive loss function, but this did not fix emir.
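For reference, the two metrics can be written in a few lines. The sketch below assumes the zero-mean form of NCC, and the function names are hypothetical. It illustrates one general difference between the metrics, namely that zero-mean NCC ignores a change in overall brightness and contrast while SSD does not. It is not a diagnosis of why workshop behaved differently.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences: lower is better, 0 means identical."""
    return float(((a - b) ** 2).sum())

def ncc(a, b):
    """Zero-mean normalized cross-correlation: higher is better, at most 1."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

rng = np.random.default_rng(0)
channel = rng.random((32, 32))
brighter = 1.5 * channel + 0.2     # same content, different gain and offset

print(ssd(channel, brighter))      # clearly nonzero despite identical content
print(ncc(channel, brighter))      # approximately 1
```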

Emir was difficult for my algorithm because the clothing he is wearing is bright blue. This led to his blue and green images being relatively similar, and thus matching up, but the red image having the opposite values, meaning the algorithm tried very hard not to match the clothing up. If I had to improve this with the help of outside libraries, I would match the image gradients instead of raw pixel values.

I tried a myriad of interesting metrics.

Extras

Bells and Whistles Algorithm

This new algorithm uses gradient NCC as the metric. Specifically, I used Sobel filters with a kernel size of 7 in both the x and y directions. We can see that this improves most pictures, especially the ones that had a lot of trouble before, such as lady and emir. This is because we no longer depend on the raw pixel values, which change between blue, green, and red. Instead we effectively depend on the edges and shapes in the image, which should stay the same or very similar between images of the same subject.
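The effect of switching to gradients can be illustrated with a small sketch. Note the simplifications: it uses a hand-written 3x3 Sobel operator instead of the size-7 kernels described above, it combines the two directions into a gradient magnitude (how the original combines them is not stated), and the function names are hypothetical. The inverted channel stands in for a region like Emir's clothing, where one channel is bright and another is dark.

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-shape arrays."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def sobel_magnitude(img):
    """Gradient magnitude from 3x3 Sobel filters, computed on interior pixels.

    The output is 2 pixels smaller than the input in each dimension.
    """
    img = img.astype(float)
    tl, tc, tr = img[:-2, :-2], img[:-2, 1:-1], img[:-2, 2:]
    ml, mr = img[1:-1, :-2], img[1:-1, 2:]
    bl, bc, br = img[2:, :-2], img[2:, 1:-1], img[2:, 2:]
    g_cols = (tr + 2 * mr + br) - (tl + 2 * ml + bl)   # change along columns
    g_rows = (bl + 2 * bc + br) - (tl + 2 * tc + tr)   # change along rows
    return np.hypot(g_rows, g_cols)

rng = np.random.default_rng(0)
blue = rng.random((32, 32))
red = 1.0 - blue                    # hypothetical channel with inverted intensities

print(ncc(blue, red))                                     # close to -1
print(ncc(sobel_magnitude(blue), sobel_magnitude(red)))   # close to +1
```

On raw pixels the correctly aligned position scores as the worst possible match, while on gradient magnitudes it scores as the best, because an edge is an edge regardless of which side is brighter.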

Simple Autocrop

My autocrop crops the image based on the shifts found during alignment. It crops so that it keeps the maximum information in the image but also gets rid of the colored borders. The algorithm takes in the shifts, keeps the area where all three channels exist, and discards the areas where one or two of the channels have been shifted off.
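One way to compute that common area from the shifts is sketched below. It assumes the blue channel is the unshifted reference and that each shift is given as (rows down, columns right) for the green and red channels; the function names and the 100x100 image size are hypothetical. The example shifts are the Sobel gradient NCC shifts reported for monastery.

```python
import numpy as np

def autocrop_bounds(shape, shifts):
    """Return (top, bottom, left, right) of the region covered by every channel.

    `shifts` lists the (down, right) shift of each moved channel; the
    reference channel is treated as an extra shift of (0, 0).
    """
    h, w = shape
    rows = [0] + [s[0] for s in shifts]
    cols = [0] + [s[1] for s in shifts]
    top, bottom = max(rows), h + min(rows)    # positive shifts eat into the top
    left, right = max(cols), w + min(cols)    # negative shifts eat into the far edge
    return top, bottom, left, right

def autocrop(image, shifts):
    """Crop a stacked color image to the region where all channels overlap."""
    top, bottom, left, right = autocrop_bounds(image.shape[:2], shifts)
    return image[top:bottom, left:right]

shifts = [(-3, 2), (3, 2)]                    # G and R shifts for monastery
print(autocrop_bounds((100, 100), shifts))
print(autocrop(np.zeros((100, 100, 3)), shifts).shape)
```

This only removes the bands created by shifting; it does not detect the original black and white plate borders.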

Besides, I think the images look better with all the colored borders: they tell the history of the image and give it a better atmosphere.

Results

Image              Raw Pixels NCC            Sobel Gradient NCC
cathedral          G(5, 1) R(12, 3)          G(5, 2) R(12, 3)
castle             G(34, 0) R(95, 3)         G(34, 3) R(95, 4)
emir               G(47, 23) R(130, -148)    G(50, 23) R(119, 50)
harvesters         G(56, 17) R(122, 14)      G(60, 16) R(124, 12)
icon               G(38, 16) R(89, 21)       G(40, 17) R(89, 23)
lady               G(45, 6) R(85, 8)         G(56, 9) R(119, 13)
melons             G(78, 8) R(172, 11)       G(80, 10) R(175, 13)
monastery          G(-4, 0) R(3, 1)          G(-3, 2) R(3, 2)
onion church       G(51, 24) R(109, 36)      G(51, 27) R(108, 36)
self portrait      G(76, 29) R(175, 37)      G(80, 30) R(175, 37)
three generations  G(48, 15) R(109, 12)      G(54, 13) R(113, 10)
tobolsk            G(1, 2) R(4, 3)           G(3, 2) R(7, 3)
train              G(41, 5) R(84, 30)        G(41, 8) R(85, 32)
workshop           G(53, -1) R(105, -13)     G(50, -2) R(96, -14)

Images of My Choice

Image              Raw Pixels NCC            Sobel Gradient NCC
windmills          G(53, 14) R(125, 22)      G(56, 15) R(125, 23)
coast              G(43, 18) R(100, 24)      G(56, 17) R(127, 28)
family             G(70, 34) R(145, 60)      G(71, 38) R(147, 62)

Border removal

[Figures: Original (Sobel NCC) and Autocropped, shown side by side for cathedral, castle, harvesters, and family.]