Alex Jiang

CS 194-26: Project 1

Images of the Russian Empire

Approach

        The most natural way to merge the three color channels is through matrix manipulation with numpy. In the single-scale version of colorize, I initialized as per the skeleton (separating the color channels, etc.) and then aligned the images by repeatedly rolling the green and red matrices against blue, scoring each shift with SSD (sum of squared differences), which I chose for its simplicity and the success I had with it early on. The pyramid version, for larger images, is a large iterative loop (essentially recursion) that runs the single-scale procedure on smaller versions of the image several times before finalizing the full-size one. For convenience, TIF inputs are saved directly as compressed JPGs, since output files were reaching triple digits in MB; saving in the original format still works and is easy to switch back to.
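
        As a rough illustration of the roll-and-score search and the coarse-to-fine loop described above, here is a minimal sketch. The function names (ssd, align_single, align_pyramid), the window sizes, the number of levels, and the stride-based downsampling are all assumptions made for the example, not the values or methods used in my actual code.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two equally sized arrays."""
    return np.sum((a - b) ** 2)

def align_single(channel, reference, window=15, center=(0, 0)):
    """Try every (dy, dx) within `window` of `center`; return the lowest-SSD shift."""
    best_shift, best_score = center, np.inf
    for dy in range(center[0] - window, center[0] + window + 1):
        for dx in range(center[1] - window, center[1] + window + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ssd(shifted, reference)
            if score < best_score:
                best_shift, best_score = (dy, dx), score
    return best_shift

def align_pyramid(channel, reference, levels=3, coarse_window=15, fine_window=2):
    """Coarse-to-fine search: wide window on the smallest copy, narrow refinements after."""
    shift = (0, 0)
    for level in range(levels, -1, -1):
        step = 2 ** level
        # Crude downsampling by striding (assumption; a real version might blur or rescale).
        small_channel = channel[::step, ::step]
        small_reference = reference[::step, ::step]
        window = coarse_window if level == levels else fine_window
        # A shift found at half resolution doubles at the next, finer level.
        center = (2 * shift[0], 2 * shift[1])
        shift = align_single(small_channel, small_reference, window, center)
    return shift
```

        The returned (dy, dx) is the roll to apply to the green or red channel, e.g. np.roll(green, align_pyramid(green, blue), axis=(0, 1)), before stacking the channels.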

Difficulties

        Many of the difficulties I had were eventually solved via Bells and Whistles, especially some severe alignment issues with a couple of TIFs like emir.tif (the improvement can be seen below). The main persistent problem, which was only partially solved, was my algorithm’s handling of images whose colors don’t vary much; monastery.jpg is a severe outlier in how hard it was to process, because so much of its content shares a similar light blue/white. My alignment and tweaking methods rely heavily on being able to differentiate color magnitudes, and it was hard to get monastery right without degrading other images. Cropping was also significantly more difficult on cloud-heavy images, as the threshold I once thought perfect cut off 70% of church.tif. Overall, however, I am very satisfied with the final products and am proud to say I was able to overcome most of the trials along the way.

Bells and Whistles

        Three additional features were added, all labeled with [BELLS AND WHISTLES] in the code comments. The first two were chosen because they were intuitive to implement (and were listed first in the suggested improvements), while the Sobel filter was chosen after noticing the effect it had on many of my peers’ results. The Canny filter was not tested, as the Sobel edge detection was satisfactory.

Auto-Cropping

        Once dstack has produced the colored image, the matrix is scanned for any rows or columns deemed “empty,” defined as having a standard deviation of values below 0.1 across that row/column. This easily eliminates whitespace and can clear some remnants of the color filters (e.g. at the top of cathedral.jpg and emir.tif, seen below). What it most notably fails to do is remove black bars that are also polluted with some color (seen on the sides of those same images), as those still have enough color variance to avoid detection. Because of the chosen formula, loosening the requirement for being “empty” usually ends up deleting parts of the actual image before it reaches the black bars. The method is best at removing white and solid standalone colors, like remnants of cyan and magenta. Manual cropping (10% image margins) was also used at the beginning of the pyramid method because the extra image space was causing problems in the alignment calculations, but auto-cropping is still performed again at the end to the same effect.
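
        The row/column test can be sketched in a few lines of numpy. This is only an illustration: the function name auto_crop is made up for the example, and it assumes an H x W x 3 float image with values in [0, 1], which is what makes a 0.1 threshold meaningful.

```python
import numpy as np

def auto_crop(image, threshold=0.1):
    """Drop every row and column whose values have a std below `threshold`."""
    # One std per row (over width and channels) and one per column (over height and channels).
    row_std = image.std(axis=(1, 2))
    col_std = image.std(axis=(0, 2))
    keep_rows = row_std >= threshold
    keep_cols = col_std >= threshold
    # Boolean indexing removes flagged rows/columns wherever they occur, not only at the borders.
    return image[keep_rows][:, keep_cols]
```

        A border of pure white has a standard deviation of 0 and is removed, while a dark bar tinted with stray color can easily exceed the threshold and survive, which matches the failure case described above.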

        A clear potential downside of this method appears with images dominated almost entirely by a single color (e.g. a picture of a piece of paper on a solid-colored table), but for real photos of 3D space, color typically varies enough to avoid cropping picture content. With higher STD thresholds, my algorithm started to cut clouds out of some examples. To my knowledge, with my selected threshold, this issue does not occur with any images from the Prokudin-Gorskii photo collection.

Auto-Contrasting

        Due to the photography method and possible human error on my part, some of the pictures come out a bit washed out and lacking energy. To make them more vivid, I set up a basic auto-contrasting scheme that uses np.percentile to isolate the adequately vivid pixels. The contrast of the entire image is boosted first, and then the pixels that were already good enough are moved back towards their initial color. Arguably, the added contrast makes some of the photos look less real (the blue in emir.tif is almost too blue), but personally I think it makes the colors more appealing and the photos look more alive.
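
        One possible reading of that two-step scheme is sketched below. Nearly everything specific here is an assumption for illustration: the function name auto_contrast, the gain, the percentile cutoff, the blend factor, and especially the use of per-pixel channel spread (max minus min) as the measure of “vivid.” It assumes a float image in [0, 1].

```python
import numpy as np

def auto_contrast(image, gain=1.3, vivid_percentile=75, blend=0.5):
    """Boost global contrast, then pull already-vivid pixels back toward the original."""
    # Step 1: stretch every value away from the image mean, clipped to the valid range.
    mean = image.mean()
    boosted = np.clip(mean + gain * (image - mean), 0.0, 1.0)
    # Step 2: flag pixels whose channel spread is in the top percentiles (assumed vividness measure).
    vividness = image.max(axis=2) - image.min(axis=2)
    cutoff = np.percentile(vividness, vivid_percentile)
    mask = vividness >= cutoff
    # Step 3: blend flagged pixels partway back to their initial colors.
    result = boosted.copy()
    result[mask] = blend * image[mask] + (1.0 - blend) * boosted[mask]
    return result
```

        Raising blend keeps the vivid pixels closer to their original colors, which would be one way to tame cases like the overly blue emir.tif.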

Sobel Edge Detection

        Some of the images, like emir.tif, had abnormally poor results that were honestly a little baffling to me. I implemented edge detection to give these a push in the right direction, which in some cases completely salvaged the image quality. I followed Wikipedia’s basic outline of the Sobel operator and its 3x3 convolution matrix.

convolve2d was used heavily to perform this part of the algorithm.

Unlike my other bells and whistles, edge detection is applied up front, before alignment, to make alignment easier later on.
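
        For reference, a minimal sketch of Sobel edge detection with convolve2d follows. The two kernels are the standard ones given in Wikipedia’s Sobel operator article; the function name sobel_edges, the use of scipy.signal’s convolve2d, and the symmetric boundary handling are assumptions for the example and may differ from my code.

```python
import numpy as np
from scipy.signal import convolve2d

# Standard 3x3 Sobel kernels for horizontal and vertical intensity changes.
SOBEL_X = np.array([[1, 0, -1],
                    [2, 0, -2],
                    [1, 0, -1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def sobel_edges(channel):
    """Return the gradient magnitude of a single 2D channel."""
    # mode="same" keeps the output the same size as the input;
    # boundary="symm" mirrors the border so flat edges produce no false gradient.
    gx = convolve2d(channel, SOBEL_X, mode="same", boundary="symm")
    gy = convolve2d(channel, SOBEL_Y, mode="same", boundary="symm")
    return np.hypot(gx, gy)
```

        Running the alignment on sobel_edges(channel) instead of the raw channel compares edge structure rather than raw brightness, which helps when the same object has very different intensities in the three color plates.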

Examples

        All of the pictures I used are displayed below, labeled by file name. Most were the required images, while some are ones I added from the collection myself. Green and red offsets are listed below each file name (e.g. “Green” = Green Offset). “Before” denotes the barebones results without any cropping, contrasting, or Sobel filtering, while “After” includes all of the above.

[Images: Before | After]

cathedral.jpg

Green: (1,-1)

Red: (9,1)

monastery.jpg

Green: (-6,0)

Red: (9,1)

nativity.jpg

Green: (3,1)

Red: (7,1)

house.jpg

Green: (2,0)

Red: (7,1)

river.jpg

Green: (1,1)

Red: (7,1)

settlers.jpg

Green: (7,0)

Red: (14,-1)

church.tif

Green: (0,9)

Red: (26,8)

emir.tif

Green: (49,24)

Red: (105,41)

harvesters.tif

Green: (56,11)

Red: (119,10)

icon.tif

Green: (39,16)

Red: (89,23)

lady.tif

Green: (57,9)

Red: (121,13)

self_portrait.tif

Green: (74,25)

Red: (175,37)

shack.tif

Green: (8,23)

Red: (37,43)

three_generations.tif

Green: (59,15)

Red: (115,11)

train.tif

Green: (40,-2)

Red: (85,28)

turkmen.tif

Green: (65,11)

Red: (138,22)

village.tif

Green: (65,11)

Red: (138,22)