CS194-26 FA21 Project 1: Colorizing the Prokudin-Gorskii photo collection

By Austin Patel

Overview

The goal of this project is to reconstruct color images from black-and-white images taken through red, green, and blue filters. The three images can simply be stacked one exactly on top of another, but this typically produces a result in which the channels are slightly offset from each other. To correct this, I tried shifting each image channel within a certain x/y range to find the best-aligned position. Alignment quality was measured using the normalized dot product. This approach works well for small images, but runs slowly on large images because the shift range that must be searched, and the cost of each comparison, grow with the number of pixels. To address this, I used an image pyramid approach, which recursively applies the original alignment method to images that have been reduced in size. The idea is that we can get a rough estimate of the alignment from a lower-resolution image and then fine-tune it at the highest resolution.
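The single-scale search described above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the names `dot_score` and `align_single_scale` are hypothetical, and wrapping shifted pixels around with `np.roll` is an assumption about how shifts are applied.

```python
import numpy as np

def dot_score(a, b):
    """Normalized dot product of two images flattened to 1D (higher = more similar)."""
    a = a.ravel().astype(np.float64)
    b = b.ravel().astype(np.float64)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def align_single_scale(channel, reference, shift_range=5):
    """Try every (dx, dy) within +/- shift_range and return the best-scoring shift."""
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-shift_range, shift_range + 1):
        for dx in range(-shift_range, shift_range + 1):
            # Assumption: shifted pixels wrap around instead of being padded.
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = dot_score(shifted, reference)
            if score > best_score:
                best_score, best_shift = score, (dx, dy)
    return best_shift
```

The search evaluates (2 * shift_range + 1)^2 candidate shifts, each costing one pass over the image, which is why widening the range at full resolution quickly becomes slow.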

My approach

The image pyramid approach worked quite well with almost all the images. One challenge I faced was that if there was a dramatic offset (for example, the red y offset in the onion_church image was 108 pixels), my alignment algorithm was unable to properly align the image, since I was only searching a 5px shift range in each direction. I resolved this by increasing the recursive depth of my image pyramid algorithm to 6 layers. At the deepest layer, with the lowest-resolution image, a shift of 5 pixels corresponds to a 5*(2^6)=320 pixel offset at the full resolution scale. This means my algorithm will be able to align images up to a +/- 320 pixel offset for each of the color channels.
The following parameters were used to generate the images below:
- SHIFT_RANGE = 5 # range for how many pixels to shift in each direction (+/- x and +/- y) when aligning
- PYRAMID_SCALE = 1/2 # how much to scale each image's dimensions by for each recursive iteration of pyramid scaling
- PYRAMID_DEPTH = 6 # recursive depth for image pyramid. Each further recursive iteration has the image scaled by PYRAMID_SCALE
- DIFF_METHOD=dot_diff # flatten the two images we are comparing into 1D array and take the normalized dot product to compute the diff between images
- DIFF_BORDER=0.1 # proportion of image to cut off on each side when looking at diff (to ignore border; see bells and whistles for more details)
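A rough sketch of how these parameters could drive a coarse-to-fine search is shown below. It is an illustration under stated assumptions, not the code used for the results: the helper names (`score`, `search`, `halve`, `pyramid_align`) are hypothetical, downscaling is done by 2x2 block averaging, shifts wrap around via `np.roll`, the DIFF_BORDER crop is omitted for brevity, and the recursion stops early on very small images (a guard that is not described in the writeup).

```python
import numpy as np

SHIFT_RANGE = 5      # +/- pixels searched at each pyramid level
PYRAMID_DEPTH = 6    # number of times the image is halved

def score(a, b):
    """Normalized dot product of the flattened images."""
    a, b = a.ravel(), b.ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def search(channel, reference, center, shift_range):
    """Best (dx, dy) within +/- shift_range of the starting guess `center`."""
    cx, cy = center
    best_score, best_shift = -np.inf, center
    for dy in range(cy - shift_range, cy + shift_range + 1):
        for dx in range(cx - shift_range, cx + shift_range + 1):
            s = score(np.roll(channel, (dy, dx), axis=(0, 1)), reference)
            if s > best_score:
                best_score, best_shift = s, (dx, dy)
    return best_shift

def halve(img):
    """Downscale by 1/2 using 2x2 block averaging (drops an odd trailing row/column)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def pyramid_align(channel, reference, depth=PYRAMID_DEPTH, shift_range=SHIFT_RANGE):
    """Estimate (dx, dy) coarse-to-fine: solve at half size, then refine locally."""
    # Assumed guard: stop recursing once the image is too small to search sensibly.
    if depth == 0 or min(channel.shape) < 4 * shift_range:
        return search(channel, reference, (0, 0), shift_range)
    cx, cy = pyramid_align(halve(channel), halve(reference), depth - 1, shift_range)
    # A shift found at half resolution is worth twice as many full-resolution pixels.
    return search(channel, reference, (2 * cx, 2 * cy), shift_range)
```

Each level only searches a small window around the doubled estimate from the level below, so the per-level cost stays fixed while the reachable offset doubles with every extra layer.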

Bells and Whistles

Automatic cropping: My goal was to remove parts of the borders on the images after the alignment process. To accomplish this, I first averaged the three channels of the aligned image so that the image became a 2D matrix with values between 0 and 1. Then I selected a single column or row (depending on which side I wanted to remove the border from) from the center of the matrix. Next, I convolved an edge detector ([1,0,-1]) with the column/row to detect edges in this segment. Then I looked from the edge of the photo inward to see if there was any edge above a certain threshold. If there was one within a reasonable section of the image (the border will not end in the center of the image, for example), I cropped the image at the edge location. The reasoning behind using edges to detect the border is that the border is usually close to fully black or fully white, and there is a sharp edge between it and the actual contents of the image.
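The cropping procedure can be sketched roughly as below. The threshold of 0.3, the 15% search limit, and the function names `first_edge` and `auto_crop` are assumptions chosen for illustration; the writeup does not state the actual values used.

```python
import numpy as np

def first_edge(profile, threshold=0.3, max_fraction=0.15):
    """Scan a 1D brightness profile from its start inward.

    Returns the index of the first sample past a strong edge, or 0 if no
    edge above `threshold` is found within the outer `max_fraction` of it.
    """
    edges = np.abs(np.convolve(profile, [1, 0, -1], mode="same"))
    limit = min(int(len(profile) * max_fraction), len(profile) - 1)
    # Skip index 0: zero padding in the convolution creates a fake edge there.
    for k in range(1, limit):
        if edges[k] > threshold:
            return k + 1
    return 0

def auto_crop(image, threshold=0.3, max_fraction=0.15):
    """Crop borders from an HxWx3 image with values in [0, 1]."""
    gray = image.mean(axis=2)            # average the three channels
    h, w = gray.shape
    row = gray[h // 2, :]                # center row: finds left/right borders
    col = gray[:, w // 2]                # center column: finds top/bottom borders
    left = first_edge(row, threshold, max_fraction)
    right = first_edge(row[::-1], threshold, max_fraction)
    top = first_edge(col, threshold, max_fraction)
    bottom = first_edge(col[::-1], threshold, max_fraction)
    return image[top:h - bottom, left:w - right]
```

Because only one row and one column are inspected, the sketch is cheap, but it can be fooled if the center row or column happens to contain a strong edge in the image content near the border.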
Time measurement: I included time deltas in seconds in the results below for how long it took to fully process and save each image on my local computer (2018 15" MacBook Pro).
Image diff calc from center of image: I noticed some of the images were still slightly offset even after using the image pyramid and tuning the parameters. I hypothesized that comparing only the center portion of each image (with 10% cut off from each edge) would help alignment, since that would likely be enough to exclude the borders. I implemented this feature so that only the cropped center region is used when computing the image diff for each x/y shift. Unfortunately, this improvement is present in both the before/after images below, so its benefit is not easy to see.
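As an illustration of the center-only comparison, the sketch below crops a fixed fraction from every side before taking the normalized dot product. The helper name `crop_interior` and the exact rounding of the crop size are assumptions; `dot_diff` and `DIFF_BORDER` mirror the parameter names listed earlier but this is not the project's actual implementation.

```python
import numpy as np

DIFF_BORDER = 0.1  # proportion of the image cut off on each side before comparing

def crop_interior(img, border=DIFF_BORDER):
    """Return the image with `border` * size removed from each of the four sides."""
    h, w = img.shape[:2]
    dy, dx = int(h * border), int(w * border)
    return img[dy:h - dy, dx:w - dx]

def dot_diff(a, b, border=DIFF_BORDER):
    """Normalized dot product computed on the interior regions only."""
    a = crop_interior(a, border).ravel().astype(np.float64)
    b = crop_interior(b, border).ravel().astype(np.float64)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0
```

With this in place, two channels that agree in the interior score as well matched even when their borders differ completely.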

Results

The offsets in the results section indicate the (x, y) offset of the green (g) and red (r) channels relative to the blue channel.

File	Offsets & Run Notes	Extra Notes
onion_church	g: (27, 51) r: (36, 108) process time (seconds): 35.407472133636475
icon	g: (17, 41) r: (23, 89) process time (seconds): 37.71184492111206
train	g: (6, 43) r: (32, 87) process time (seconds): 36.435320138931274
melons	g: (11, 82) r: (13, 178) process time (seconds): 35.711519956588745
monastery	g: (2, -3) r: (2, 3) process time (seconds): 0.4854283332824707
landscape	g: (-2, 42) r: (9, 97) process time (seconds): 36.93493580818176	Additional image pulled from collection
flowers	g: (21, 75) r: (30, 156) process time (seconds): 36.06981706619263	Additional image pulled from collection
shell	g: (5, 25) r: (7, 111) process time (seconds): 36.273571729660034	Additional image pulled from collection
three_generations	g: (14, 53) r: (11, 112) process time (seconds): 35.569983959198
church	g: (4, 25) r: (-4, 58) process time (seconds): 34.13655710220337
workshop	g: (0, 53) r: (-12, 105) process time (seconds): 35.514161109924316
tobolsk	g: (3, 3) r: (3, 6) process time (seconds): 0.55177903175354
emir	g: (24, 49) r: (-209, 269) process time (seconds): 35.1381938457489	This one did not align well with my alignment algorithm. I believe this happened because the three color channels in the original image varied greatly in overall brightness. Because of this, even if the channels were properly aligned, there would still be a large difference between the images. The red channel offset that the search converged on must have had a larger dot product with the blue channel (the reference channel) at this strange offset than at the correctly aligned location. Note: the order of brightness from low to high was blue, green, then red.
lady	g: (8, 55) r: (11, 117) process time (seconds): 37.02691984176636
cathedral	g: (2, 5) r: (3, 12) process time (seconds): 0.49702978134155273
harvesters	g: (17, 60) r: (13, 124) process time (seconds): 34.55814719200134
self_portrait	g: (29, 79) r: (37, 176) process time (seconds): 35.488484144210815