CS194-26 Project 1 - Yin Deng

Part 0: Overview

Sergei Mikhailovich Prokudin-Gorskii (1863-1944) was a chemist and photographer of the Russian Empire. He travelled around the country and photographed a wide variety of scenes. To capture color, he recorded three exposures of every scene onto a glass plate using red, green, and blue filters. Although he was limited by the technology of his time, in this project we generate color images from his plates by aligning and stacking the three exposures. We assume that a simple (x, y) translation model is sufficient to align the red, green, and blue color channels.

Part 1: Exhaustive Search

This approach takes less than 5 seconds for low-resolution jpeg images.

The details of this approach are below:

  1. Load the input image, and divide it into three equal parts b, g, r.
  2. Crop the borders of b, g, r to get cropped_b, cropped_g, and cropped_r.
  3. Using L2 distance as the cost function and [-15, 15] as the search range in both the horizontal and vertical directions, exhaustively search for the translation of cropped_g that minimizes the cost against cropped_b (base).
  4. With the same cost function and search range, exhaustively search for the translation of cropped_r that minimizes the cost against cropped_b (base).
  5. Apply the optimal translations to the original g and r.
  6. Stack the three color channels, and output the image.
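
The six steps above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the code used for this project: the function names are hypothetical, the three exposures are assumed to be stacked top to bottom in the order b, g, r, translation is done with np.roll (which wraps pixels around the edge), and the crop fraction defaults to the 10% per side mentioned under "Problems encountered".

```python
import numpy as np

def crop_border(channel, frac=0.10):
    """Remove `frac` of the height/width from each side of a 2-D channel."""
    h, w = channel.shape
    dy, dx = int(h * frac), int(w * frac)
    return channel[dy:h - dy, dx:w - dx]

def exhaustive_align(base, moving, radius=15):
    """Return the (tx, ty) in [-radius, radius]^2 with the lowest L2 cost."""
    best_cost, best_shift = float("inf"), (0, 0)
    for ty in range(-radius, radius + 1):
        for tx in range(-radius, radius + 1):
            # Assumption: circular shift; wrapped pixels are scored too.
            shifted = np.roll(moving, (ty, tx), axis=(0, 1))
            cost = float(np.sum((base - shifted) ** 2))
            if cost < best_cost:
                best_cost, best_shift = cost, (tx, ty)
    return best_shift

def colorize(plate, radius=15):
    """Split a stacked plate into b, g, r, align g and r to b, return RGB."""
    plate = plate.astype(np.float64)  # avoid integer overflow in the cost
    h = plate.shape[0] // 3
    b, g, r = plate[:h], plate[h:2 * h], plate[2 * h:3 * h]
    cb, cg, cr = crop_border(b), crop_border(g), crop_border(r)
    gx, gy = exhaustive_align(cb, cg, radius)
    rx, ry = exhaustive_align(cb, cr, radius)
    # The shifts found on the cropped channels are applied to the originals.
    g_aligned = np.roll(g, (gy, gx), axis=(0, 1))
    r_aligned = np.roll(r, (ry, rx), axis=(0, 1))
    return np.dstack([r_aligned, g_aligned, b]), (gx, gy), (rx, ry)
```

Calling colorize(plate) on a 2-D grayscale array returns the stacked color image together with the (tx, ty) shifts found for g and r. The search evaluates 31 x 31 = 961 candidate shifts per channel, which is why it is only practical for the low-resolution images.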

Problems encountered:

  1. Low-resolution jpeg images were not well aligned. After some investigation, I noticed that my initial implementation did not crop enough of the borders, so the plate edges still contributed to the L2 cost. I increased the crop from 10px to 10% of the width/height on each side.

Color images:

Part 2: Image Pyramid

This approach takes around 40 seconds for high-resolution tif images.

The details of this approach are below:

  1. Load the input image, and divide it into three equal parts b, g, r.
  2. Crop the borders of b, g, r to get cropped_b, cropped_g, and cropped_r.
  3. Align cropped_g to cropped_b (base) with a recursive image pyramid, using L2 distance as the cost function and a scale factor of 4. Base case: if the height and the width of the image are both less than 500px, apply exhaustive search with range [-10, 10] in both the horizontal and vertical directions, and return the result. Otherwise: downscale the images by 4 and recurse to get downsized_tx and downsized_ty; translate cropped_g by 4 * downsized_tx horizontally and 4 * downsized_ty vertically to get trans_g; apply exhaustive search at the current level to cropped_b and trans_g to get the residual translations tx and ty; and return (4 * downsized_tx + tx, 4 * downsized_ty + ty).
  4. Align cropped_r to cropped_b (base) with the same recursive image pyramid, cost function, and scale factor. Base case: if the height and the width of the image are both less than 500px, apply exhaustive search with range [-10, 10] in both the horizontal and vertical directions, and return the result. Otherwise: downscale the images by 4 and recurse to get downsized_tx and downsized_ty; translate cropped_r by 4 * downsized_tx horizontally and 4 * downsized_ty vertically to get trans_r; apply exhaustive search at the current level to cropped_b and trans_r to get the residual translations tx and ty; and return (4 * downsized_tx + tx, 4 * downsized_ty + ty).
  5. Apply the optimal translations to the original g and r.
  6. Stack the three color channels, and output the image.
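
The recursion in steps 3 and 4 can be sketched as below. This is an illustrative sketch rather than the project's actual code: the function names are hypothetical, downscaling is done by block averaging (the resampling method is not stated above, so this is an assumption), and translation uses np.roll, which wraps around the edges.

```python
import numpy as np

def search(base, moving, radius):
    """Exhaustive L2 search; returns the best (tx, ty) in [-radius, radius]^2."""
    best_cost, best_shift = float("inf"), (0, 0)
    for ty in range(-radius, radius + 1):
        for tx in range(-radius, radius + 1):
            shifted = np.roll(moving, (ty, tx), axis=(0, 1))
            cost = float(np.sum((base - shifted) ** 2))
            if cost < best_cost:
                best_cost, best_shift = cost, (tx, ty)
    return best_shift

def downscale(img, factor):
    """Shrink by an integer factor using block averaging (an assumption)."""
    h, w = img.shape
    h, w = h - h % factor, w - w % factor  # trim so blocks divide evenly
    blocks = img[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

def pyramid_align(base, moving, scale=4, radius=10, min_size=500):
    """Coarse-to-fine alignment; returns the total (tx, ty) at this level."""
    h, w = base.shape
    if h < min_size and w < min_size:
        # Base case: the image is small enough to search directly.
        return search(base, moving, radius)
    # Estimate the shift on the downscaled images first.
    coarse_tx, coarse_ty = pyramid_align(
        downscale(base, scale), downscale(moving, scale),
        scale, radius, min_size)
    # Apply the scaled-up coarse shift, then refine at this resolution.
    pre_shifted = np.roll(
        moving, (scale * coarse_ty, scale * coarse_tx), axis=(0, 1))
    tx, ty = search(base, pre_shifted, radius)
    return scale * coarse_tx + tx, scale * coarse_ty + ty
```

For the green channel this would be called as pyramid_align(cropped_b, cropped_g), and likewise with cropped_r for the red channel. Because each level only corrects the residual left by the coarser level, a [-10, 10] window at every level can recover shifts much larger than 10px at full resolution.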

Problems encountered:

  1. My initial implementation of the image pyramid used scale = 2 and range = [-15, 15]. However, it took around 90 seconds to process a single tif image. Therefore, I made the hyperparameters coarser by setting scale = 4 and range = [-10, 10]. Although the current implementation is coarser, in practice it produces output very similar to that of the initial implementation.
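
A rough way to see where the time goes is to count pixel comparisons: each pyramid level evaluates (2 * range + 1)^2 candidate shifts, and each candidate touches every pixel at that level. The sketch below does this count. It is only an estimate under assumptions: the function name is hypothetical, the 3000 x 3500 channel size is a made-up example (the actual tif dimensions are not given above), and the 500px base-case threshold is assumed for both settings.

```python
def pyramid_work(height, width, scale, radius, min_size=500):
    """Estimate pixel comparisons summed over all pyramid levels."""
    window = (2 * radius + 1) ** 2  # candidate shifts tried per level
    work = 0
    while True:
        work += window * height * width
        if height < min_size and width < min_size:
            return work  # base case reached, no coarser level
        height, width = height // scale, width // scale

# Hypothetical 3000 x 3500 channel, compared under both settings.
initial = pyramid_work(3000, 3500, scale=2, radius=15)
current = pyramid_work(3000, 3500, scale=4, radius=10)
print(f"estimated speedup: {initial / current:.2f}x")
```

Under these assumptions the estimate drops by a factor of roughly 2.7, which is the same order as the measured change from about 90 to about 40 seconds. Most of the work is at full resolution, so the smaller search window (441 instead of 961 candidates) accounts for most of the saving, while the larger scale factor mainly removes intermediate levels.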

Color images:

Part 3: Newly Downloaded Images

Color images: