Xinyang Geng
Sergei Mikhailovich Prokudin-Gorskii was a Russian photographer ahead of his time. From 1907 to 1915, he traveled around the Russian Empire, taking color images of everything before color photography was invented. He did so by taking three pictures of the same scene with red, green, and blue filters. Later on, these images were purchased by the Library of Congress and made available online.
The fundamental idea behind the image alignment algorithm is to exhaustively shift one of the images and find the best match. To quantify how good a match is, we use a metric called normalized cross correlation. For two vectors $u$ and $v$, the normalized cross correlation is defined as the inner product of the normalized vectors: $\mathrm{NCC}(u, v) = \left\langle \frac{u}{\|u\|}, \frac{v}{\|v\|} \right\rangle$.
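A minimal sketch of this metric, assuming the vectors are flattened grayscale image patches:

```python
import numpy as np

def ncc(u, v):
    """Normalized cross correlation: inner product of the L2-normalized
    (flattened) vectors, so a perfect match scores exactly 1."""
    u = np.asarray(u, dtype=float).ravel()
    v = np.asarray(v, dtype=float).ravel()
    return np.dot(u / np.linalg.norm(u), v / np.linalg.norm(v))
```

Higher is better: identical inputs (up to a positive scale factor) score 1, and any mismatch scores less.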
Our naive algorithm is:
Note that the images all contain black edges that do not include any alignment information, so we crop out 14% of the images before we compute the normalized cross correlation.
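The exact code is not reproduced here, but a minimal sketch of the exhaustive search, assuming a ±15 pixel window, circular shifts via `np.roll`, and the 14% crop described above:

```python
import numpy as np

def crop_interior(im, frac=0.14):
    """Drop frac of the image on every side to ignore the black edges."""
    h, w = im.shape
    dh, dw = int(h * frac), int(w * frac)
    return im[dh:h - dh, dw:w - dw]

def align_naive(im, ref, radius=15):
    """Try every (dy, dx) in [-radius, radius]^2 and return the shift
    of `im` that maximizes NCC against `ref`."""
    best, best_score = (0, 0), -np.inf
    b = crop_interior(ref).ravel()
    b = b / np.linalg.norm(b)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            a = crop_interior(np.roll(np.roll(im, dy, axis=0), dx, axis=1)).ravel()
            score = np.dot(a / np.linalg.norm(a), b)
            if score > best_score:
                best_score, best = score, (dy, dx)
    return best
```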
Let's run our algorithm on the small images.
The main disadvantage of our naive algorithm is that the exhaustive search is slow, especially for large images. One way to improve performance is to find an approximate best offset by searching on smaller-scale versions of the images first. Here is an illustration from Wikipedia:
At each level we reduce the image size by a factor of 2, and we stop rescaling when we hit a 300 pixel limit. At the coarsest level, we still search offsets from -15 to 15. For all the levels above, we only search a narrow window (from -6 to 6) around the coarse estimate.
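A sketch of this coarse-to-fine search. For simplicity it downsamples by averaging 2x2 blocks rather than whatever resampling the original code used, and reuses the cropped-NCC score from before:

```python
import numpy as np

def _score(a, b, frac=0.14):
    """NCC on the cropped interiors of two same-shape images."""
    h, w = a.shape
    dh, dw = int(h * frac), int(w * frac)
    a = a[dh:h - dh, dw:w - dw].ravel()
    b = b[dh:h - dh, dw:w - dw].ravel()
    return np.dot(a / np.linalg.norm(a), b / np.linalg.norm(b))

def _search(im, ref, center, radius):
    """Exhaustive NCC search in a window of `radius` around `center`."""
    (cy, cx), best, best_score = center, center, -np.inf
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            s = _score(np.roll(np.roll(im, dy, axis=0), dx, axis=1), ref)
            if s > best_score:
                best_score, best = s, (dy, dx)
    return best

def _halve(im):
    """Downsample by a factor of 2 via 2x2 block averaging."""
    h, w = im.shape[0] // 2 * 2, im.shape[1] // 2 * 2
    im = im[:h, :w]
    return (im[0::2, 0::2] + im[1::2, 0::2] + im[0::2, 1::2] + im[1::2, 1::2]) / 4

def align_pyramid(im, ref, limit=300):
    """Recurse on half-size images until under `limit` pixels, search
    -15..15 at the coarsest level, then refine the doubled offset with
    a -6..6 window at each finer level."""
    if max(im.shape) <= limit:
        return _search(im, ref, (0, 0), 15)
    cy, cx = align_pyramid(_halve(im), _halve(ref), limit)
    return _search(im, ref, (2 * cy, 2 * cx), 6)
```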
Let's run the algorithm on all images.
Sometimes it is not ideal to compute the alignment directly on the raw pixels. Hence I've also tried aligning images with edge features. Here we use the Canny edge detector provided in scikit-image. Let's visualize the edge features.
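For reference, `skimage.feature.canny` takes a 2-D grayscale image and returns a boolean edge map; the `sigma` value below is an assumed setting, not necessarily the one used here:

```python
import numpy as np
from skimage.feature import canny

# A bright square on a dark background; canny should fire on its outline.
img = np.zeros((64, 64))
img[16:48, 16:48] = 1.0
edges = canny(img, sigma=1.0)  # boolean array, True on edge pixels
```

The `sigma` parameter controls the Gaussian smoothing applied before gradients are computed, trading noise suppression for edge localization.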
Now let's run our algorithm on all the images.
We can see that all aligned images contain invalid edges. Here we propose an algorithm to automatically detect these edges and crop them out. We notice that the edges correspond to large changes of pixel value along an axis, and this fact points in the direction of oriented gradients. The gradient of an image along a given axis can be computed as the convolution of a finite difference filter with the image.
Let's take a look at the gradient along the horizontal axis.
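As a concrete example (using scipy's `convolve2d`, an assumption on my part; any 2-D convolution routine would do), convolving a left-to-right ramp with the finite difference filter `[1, -1]` recovers its constant slope:

```python
import numpy as np
from scipy.signal import convolve2d

dx_filter = np.array([[1.0, -1.0]])               # finite difference along x
img = np.tile(np.linspace(0.0, 1.0, 8), (8, 1))   # ramp with slope 1/7
grad_x = convolve2d(img, dx_filter, mode='same')
```

The first column of the output is a boundary artifact of the zero padding; everywhere else the response equals the ramp's slope.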
As we can see, the output is noisy. We can smooth it by applying a Gaussian filter. Let $*$ be the convolution operator, $D$ the finite difference filter, $G$ the Gaussian filter, and $I$ the image. By associativity of convolution, we know that $D * (G * I) = (D * G) * I$.
Hence, we can first take the derivative of the Gaussian filter by convolving it with the finite difference filter, and then convolve the resulting filter with the original image. This results in the derivative-of-Gaussian filter.
Let's visualize it.
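A sketch of building this filter; the kernel size and sigma below are assumed values, not necessarily the ones used in this project:

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.signal import convolve2d

size, sigma = 9, 1.5
impulse = np.zeros((size, size))
impulse[size // 2, size // 2] = 1.0
gauss = gaussian_filter(impulse, sigma)  # discrete Gaussian kernel
# Convolve the Gaussian with the finite difference filter once; the
# result replaces the two-step smooth-then-differentiate on every image.
dog_x = convolve2d(gauss, np.array([[1.0, -1.0]]), mode='same')
```

Like any derivative filter, its coefficients sum to approximately zero, so it responds to intensity changes but not to flat regions.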
Now let's plot the mean edge response along the y axis. The edge signal looks very clear! We can threshold it to detect the edges. We take the center 60% of the image and use the maximum edge response there as the baseline. Then we add 0.5 standard deviations of the edge response to get our threshold.
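A sketch of this rule; the write-up does not show how the crop boundary is chosen once the threshold is set, so this version assumes we cut at the outermost above-threshold positions on each side:

```python
import numpy as np

def find_borders(edge_response):
    """Given a 1-D mean edge response along one axis, return the
    (first, last) index range to keep. Threshold = max of the center
    60% plus 0.5 standard deviations of the whole response."""
    n = len(edge_response)
    center = edge_response[int(0.2 * n):int(0.8 * n)]
    threshold = center.max() + 0.5 * edge_response.std()
    above = np.where(edge_response > threshold)[0]
    left = above[above < n // 2]
    right = above[above >= n // 2]
    lo = left.max() + 1 if left.size else 0
    hi = right.min() if right.size else n
    return lo, hi
```

Running it on both axes yields the rectangle to keep.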
Now we run our auto-cropping algorithm on all the colored images. We can see that in the last image, the algorithm over-estimated the edge.
Let's run our algorithm on some extra images.
We notice some of the images are blurry, so let's apply unsharp masking to sharpen them. Let $*$ be the convolution operator and $G$ be a Gaussian filter. Let $M$ be the input image; unsharp masking computes the output as $M + (M - G * M)$.
The term $M - G * M$ gives us the high-frequency signal, which we add back to the original image.
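A minimal sketch, using scipy's `gaussian_filter` for $G$ (an assumption) and a weight `alpha` on the residual; `alpha = 1` adds the high-frequency signal back once, as described above:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def unsharp_mask(m, sigma=2.0, alpha=1.0):
    """out = M + alpha * (M - G * M): boost the high-frequency residual."""
    blurred = gaussian_filter(m, sigma)
    return m + alpha * (m - blurred)
```

Note the characteristic overshoot next to strong edges: that halo is exactly what makes the result look sharper.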
Finally, we recolor the image by transferring the color histogram from another image. Here we transfer the RGB distribution of one image to another.