CS 194-26 Project 1

Abhijay Bhatnagar

Background

We are given individual glass panes from the Prokudin-Gorskii archives, and our task is to combine the three panels corresponding to blue, green, and red into one color image. The challenge of this assignment lies in the alignment and processing: the panels are not perfectly centered relative to one another, and they additionally differ in brightness and contrast.

Naive Implementation

For my naive approach, I went with a straightforward pyramid alignment. The alignment procedure compares one panel to a base panel; in our case, we used the blue panel as the base. For instance, when aligning G to B, I preselected a reasonable displacement window of +/- 30 pixels and scored the similarity between the two images at every shift in that window. I explored both cross-correlation and the sum of squared differences, and found the latter to give better results. For the smaller JPG images this exhaustive search was sufficient, but for the larger TIFs we had to employ image pyramids.
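To make the search concrete, here is a minimal sketch of that exhaustive single-scale search, assuming the channels are float arrays in [0, 1]; the helper name align_single_scale and the interior-only scoring are illustrative choices, not taken verbatim from my code.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two equally sized float images."""
    return np.sum((a - b) ** 2)

def align_single_scale(channel, base, window=30):
    """Exhaustively try every (dy, dx) shift within +/- `window` pixels and
    return the shift of `channel` that best matches `base` under SSD."""
    best_shift, best_score = (0, 0), np.inf
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            # Score only the interior so wrapped-around borders don't bias the result.
            score = ssd(shifted[window:-window, window:-window],
                        base[window:-window, window:-window])
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```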

Image pyramids are a straightforward concept, but a little tricky to implement. We downscale the image several times and process each level from the lowest resolution to the highest. This lets us avoid exhaustive searches at the larger resolutions, since each level only needs to recover the detail missed by the level below it, which is a much smaller search. We keep track of the displacement found at each resolution, scale it up as we move to the next, higher-resolution level of the pyramid, and continue the search from there.
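Below is a sketch of that coarse-to-fine recursion, reusing the align_single_scale helper from the previous snippet; the halving factor, recursion cutoff, and refinement window size are illustrative values, not the exact ones from my implementation.

```python
import numpy as np
from skimage.transform import rescale

def align_pyramid(channel, base, min_size=400):
    """Recursively estimate the shift at half resolution, scale it up,
    then refine with a small local search at the current resolution."""
    if min(channel.shape) <= min_size:
        return align_single_scale(channel, base, window=15)

    # Coarse estimate from the next level down in the pyramid.
    coarse = align_pyramid(rescale(channel, 0.5, anti_aliasing=True),
                           rescale(base, 0.5, anti_aliasing=True),
                           min_size)
    dy, dx = 2 * coarse[0], 2 * coarse[1]

    # Apply the scaled estimate, then search only a few pixels around it.
    rough = np.roll(channel, (dy, dx), axis=(0, 1))
    fine = align_single_scale(rough, base, window=2)
    return (dy + fine[0], dx + fine[1])
```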

An important detail is that for each image and at each resolution, we only score the alignment on the inner ~80% of the image in order to avoid edge artifacts.

For my naive results, I additionally cropped each image automatically to the inner 80% to avoid the artifacts on the edges. More sophisticated cropping algorithms are possible, but this was sufficient for a naive approach. Each image is also presented with the red and green shifts found by my naive approach.
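As a rough illustration of that final crop (the helper name crop_inner and the usage below are hypothetical):

```python
import numpy as np

def crop_inner(im, keep=0.8):
    """Keep only the central `keep` fraction of the image in each dimension."""
    h, w = im.shape[:2]
    dh, dw = int(h * (1 - keep) / 2), int(w * (1 - keep) / 2)
    return im[dh:h - dh, dw:w - dw]

# Hypothetical usage: stack the aligned channels, then trim the borders.
# color = np.dstack([r_aligned, g_aligned, b_base])
# result = crop_inner(color)
```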

Required Images

workshop.tif: R [53, 0], G [105, -12]
emir.tif: R [49, 24], G [96, -247]
monastery.jpg: R [-3, 2], G [3, 2]
three_generations.tif: R [53, 14], G [112, 11]
castle.tif: R [34, 3], G [98, 4]
melons.tif: R [81, 10], G [178, 13]
onion_church.tif: R [51, 26], G [108, 36]
train.tif: R [42, 5], G [87, 32]
tobolsk.jpg: R [3, 3], G [6, 3]
icon.tif: R [41, 17], G [89, 23]
cathedral.jpg: R [5, 2], G [12, 3]
self_portrait.tif: R [78, 29], G [176, 37]
harvesters.tif: R [59, 16], G [123, 13]
lady.tif: R [51, 9], G [112, 11]

Extra Images

00917v.jpg: R [2, 1], G [6, 1]
00891r.jpg: R [2, 0], G [5, 1]
00907v.jpg: R [2, 0], G [6, 0]
01756u.tif: R [58, -13], G [131, -38]
00952v.jpg: R [2, -2], G [7, -4]
00904v.jpg: R [2, 2], G [6, 3]
00893v.jpg: R [1, 2], G [3, 2]

Naive Analysis

Interestingly, the naive approach worked well for every image except the emir. To understand why that particular case failed, we can look at the individual channels of the image to get a better sense of what went wrong.

Here you can clearly see that the colors filtered by each glass pane produce very different intensities for the emir in each panel. Additionally, the brightness varies considerably between the plates. Properly matching this image will require aligning on an additional feature.
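For inspection, a quick matplotlib sketch of this kind of side-by-side view, assuming b, g, and r already hold the three separated plates as 2-D arrays:

```python
import matplotlib.pyplot as plt

# b, g, r are assumed to be the three separated plates as 2-D float arrays.
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, plate, name in zip(axes, (b, g, r), ("Blue", "Green", "Red")):
    ax.imshow(plate, cmap="gray")
    ax.set_title(name)
    ax.axis("off")
plt.show()
```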

Bells & Whistles Feature 1: Edge Detection

To rectify the color and brightness imbalance in the emir photo, I decided to incorporate edge detection as a feature to align on. I used the Canny edge detector from scikit-image, and I additionally swapped back to cross-correlation for my similarity metric, primarily because it produced a slightly better image.
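A sketch of that edge-based variant: the search loop mirrors the earlier SSD version, but it runs on Canny edge maps (skimage.feature.canny) and maximizes normalized cross-correlation instead of minimizing SSD. The default Canny parameters and the epsilon guard in ncc are illustrative assumptions.

```python
import numpy as np
from skimage.feature import canny

def ncc(a, b):
    """Normalized cross-correlation of two zero-mean, flattened images."""
    a = (a - a.mean()).ravel()
    b = (b - b.mean()).ravel()
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

def align_edges_single_scale(channel, base, window=30):
    """Exhaustive search scored on Canny edge maps with NCC."""
    ch_edges = canny(channel).astype(float)
    base_edges = canny(base).astype(float)
    best_shift, best_score = (0, 0), -np.inf
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            shifted = np.roll(ch_edges, (dy, dx), axis=(0, 1))
            score = ncc(shifted[window:-window, window:-window],
                        base_edges[window:-window, window:-window])
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```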

New Emir

emir.tif: R [49, 23], G [107, 40]