CS194-26 Project 1

Part 1: Exhaustive Search

This approach takes less than 5 seconds for low-resolution jpeg images.

The details of this approach are below:

Load the input image, and divide it into three equal parts b, g, r.
Crop the borders of b, g, r to get cropped_b, cropped_g, and cropped_r.
Using L2 distance as the cost function and [-15, 15] as the search range in both the horizontal and vertical directions, exhaustively search for the translation that minimizes the cost between cropped_b (base) and cropped_g.
Repeat the same search, with the same cost function and range, for cropped_b (base) and cropped_r.
Apply the optimal translations to the original g and r.
Stack the three color channels, and output the image.
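
The two exhaustive-search steps above can be sketched roughly as follows. This is a minimal sketch, not my actual code: it assumes NumPy, 2-D grayscale arrays of equal size that have already been cropped, and translation by np.roll (which wraps pixels around the edges). The names l2_cost and exhaustive_align are hypothetical.

```python
import numpy as np

def l2_cost(a, b):
    """Sum of squared differences between two equally sized arrays."""
    return float(np.sum((a - b) ** 2))

def exhaustive_align(base, moving, radius=15):
    """Return the (tx, ty) shift of `moving` that minimizes the L2 cost
    against `base`, trying every shift in [-radius, radius] on both axes."""
    # Work in floating point so uint8 inputs do not wrap on subtraction.
    base = base.astype(np.float64)
    moving = moving.astype(np.float64)
    best_cost, best_shift = np.inf, (0, 0)
    for ty in range(-radius, radius + 1):
        for tx in range(-radius, radius + 1):
            # Assumption: np.roll stands in for translation. It wraps pixels
            # around the edges, which is tolerable on border-cropped inputs.
            shifted = np.roll(moving, shift=(ty, tx), axis=(0, 1))
            cost = l2_cost(base, shifted)
            if cost < best_cost:
                best_cost, best_shift = cost, (tx, ty)
    return best_shift
```

The returned pair is ordered (horizontal, vertical), matching the translations reported below. Applying it to the uncropped channel would then be np.roll(g, shift=(ty, tx), axis=(0, 1)) under the same wrap-around assumption.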

Problems encountered:

Low-resolution jpeg images were not well aligned. After some investigation, I noticed that my initial implementation did not crop enough of the borders, so they still influenced the cost. I increased the crop from a fixed 10px to 10% of the width/height on each side.
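
A proportional crop like the one described above might look like this. It is only a sketch assuming NumPy and a 2-D channel array; the function name crop_borders and its fraction parameter are hypothetical.

```python
import numpy as np

def crop_borders(channel, fraction=0.10):
    """Trim `fraction` of the height from the top and bottom, and
    `fraction` of the width from the left and right, of a 2-D array."""
    h, w = channel.shape
    dy, dx = int(h * fraction), int(w * fraction)
    return channel[dy:h - dy, dx:w - dx]
```

Unlike a fixed 10px margin, the trimmed amount grows with the image size, so the same setting works for both small and large inputs.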

Color images:

Blue translation: (0, 0); Green translation: (2, 5); Red translation: (3, 12)

Blue translation: (0, 0); Green translation: (2, -3); Red translation: (2, 3)

Blue translation: (0, 0); Green translation: (3, 3); Red translation: (3, 6)

Part 2: Image Pyramid

This approach takes around 40 seconds for high-resolution tif images.

The details of this approach are below:

Load the input image, and divide it into three equal parts b, g, r.
Crop the borders of b, g, r to get cropped_b, cropped_g, and cropped_r.
Using L2 distance as the cost function, align cropped_g to cropped_b (base) with a recursive image pyramid with a scale factor of 4. In the base case, where the height and the width of the image are both less than 500px, apply exhaustive search with range [-10, 10] in both the horizontal and vertical directions, and return the result. Otherwise, downscale the images by 4 and recurse to get downsized_tx and downsized_ty; translate cropped_g by 4 * downsized_tx horizontally and 4 * downsized_ty vertically to get trans_g; apply exhaustive search at the current level to cropped_b and trans_g to get the residual translations tx and ty; and return (4 * downsized_tx + tx, 4 * downsized_ty + ty).
Repeat the same image pyramid procedure, with the same cost function, scale, base case, and range, for cropped_b (base) and cropped_r, using trans_r in place of trans_g.
Apply the optimal translations to the original g and r.
Stack the three color channels, and output the image.
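
The recursion described above can be sketched as follows. This is a rough sketch under several assumptions that are not stated in the steps: NumPy arrays, translation by np.roll (which wraps around the edges), and downscaling by block averaging. The names exhaustive_align, downscale, and pyramid_align are hypothetical.

```python
import numpy as np

def exhaustive_align(base, moving, radius):
    """Brute-force (tx, ty) in [-radius, radius]^2 minimizing the L2 cost."""
    base = base.astype(np.float64)
    moving = moving.astype(np.float64)
    best_cost, best_shift = np.inf, (0, 0)
    for ty in range(-radius, radius + 1):
        for tx in range(-radius, radius + 1):
            shifted = np.roll(moving, shift=(ty, tx), axis=(0, 1))
            cost = float(np.sum((base - shifted) ** 2))
            if cost < best_cost:
                best_cost, best_shift = cost, (tx, ty)
    return best_shift

def downscale(img, factor):
    """Assumed method: average each factor x factor block, trimming any
    leftover rows and columns so the dimensions divide evenly."""
    h, w = img.shape
    h, w = h - h % factor, w - w % factor
    blocks = img[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

def pyramid_align(base, moving, scale=4, radius=10, min_size=500):
    """Coarse-to-fine alignment; returns the total (tx, ty) for `moving`."""
    h, w = base.shape
    if h < min_size and w < min_size:
        # Base case: the image is small enough to search directly.
        return exhaustive_align(base, moving, radius)
    coarse_tx, coarse_ty = pyramid_align(
        downscale(base, scale), downscale(moving, scale),
        scale, radius, min_size)
    # Scale the coarse estimate up and pre-shift before refining.
    tx0, ty0 = scale * coarse_tx, scale * coarse_ty
    pre_shifted = np.roll(moving, shift=(ty0, tx0), axis=(0, 1))
    tx, ty = exhaustive_align(base, pre_shifted, radius)
    return tx0 + tx, ty0 + ty
```

Because each level only has to correct the residual left over by the coarser level, a small search range per level can still recover a large total translation.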

Problems encountered:

My initial implementation of the image pyramid used scale = 2 and range = [-15, 15]. However, that took around 90s to process a single tif image. Therefore, I coarsened the hyperparameters to scale = 4 and range = [-10, 10]. Although the current settings are coarser, in practice they produce output very similar to that of the initial implementation.
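
A back-of-the-envelope estimate helps explain the speedup. The sketch below counts candidate translations times pixels at every pyramid level; it assumes the same 500px base case for both settings and a hypothetical 3000 x 3500 channel (not a measured size from my inputs). The function name search_work is hypothetical.

```python
def search_work(height, width, scale, radius, min_size=500):
    """Rough count of pixel comparisons over all pyramid levels:
    (2 * radius + 1)^2 candidate shifts times the pixels at each level."""
    candidates = (2 * radius + 1) ** 2
    total = 0
    while True:
        total += candidates * height * width
        if height < min_size and width < min_size:
            return total
        height, width = height // scale, width // scale

# Hypothetical channel size, chosen only for illustration.
initial = search_work(3000, 3500, scale=2, radius=15)
current = search_work(3000, 3500, scale=4, radius=10)
ratio = initial / current
```

Under these assumptions the ratio comes out at roughly 2.7, which is in the same ballpark as the observed drop from about 90s to about 40s. Most of the work sits at the full-resolution level, so shrinking the range from 31 x 31 to 21 x 21 candidates matters more than the change in scale.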

Color images:

Blue translation: (0, 0); Green translation: (4, 25); Red translation: (-4, 58)

Blue translation: (0, 0); Green translation: (24, 49); Red translation: (-205, 99)

My algorithm failed to align emir.jpeg because the three color channels have different brightness levels, so the L2 distance on raw pixel values is not a reliable cost. I tried normalizing each color channel before running the image pyramid, but it still yielded the same result.
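
For reference, one common form of per-channel normalization is shown below. The exact normalization I tried is not specified above, so this is only an illustrative assumption (zero mean, unit standard deviation), and the name normalize_channel is hypothetical.

```python
import numpy as np

def normalize_channel(channel):
    """Shift a channel to zero mean and scale it to unit standard deviation."""
    channel = channel.astype(np.float64)
    centered = channel - channel.mean()
    std = channel.std()
    # A constant channel has zero spread; return it centered, unscaled.
    return centered if std == 0 else centered / std
```

This removes global brightness and contrast differences between channels, but it cannot help when different regions of the image are bright in different channels, which may be why it did not change the result here.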

Blue translation: (0, 0); Green translation: (16, 59); Red translation: (13, 124)

Blue translation: (0, 0); Green translation: (17, 41); Red translation: (23, 89)

Blue translation: (0, 0); Green translation: (9, 51); Red translation: (11, 112)

Blue translation: (0, 0); Green translation: (10, 81); Red translation: (13, 178)

Blue translation: (0, 0); Green translation: (26, 51); Red translation: (36, 108)

Blue translation: (0, 0); Green translation: (29, 78); Red translation: (37, 176)

Blue translation: (0, 0); Green translation: (14, 53); Red translation: (11, 112)

Blue translation: (0, 0); Green translation: (5, 42); Red translation: (32, 87)

Blue translation: (0, 0); Green translation: (0, 53); Red translation: (-12, 105)

CS194-26 Project 1 - Yin Deng
