The colorization program is written in Python 3 and requires a recent version of
main.py [--crop] [--contrast] filenames
The goal of plate alignment is to find the correct (x, y) displacement vectors to overlay the green and red channels over the blue channel. We compute the alignments for each color separately.
Alignment works by first pre-processing the image to remove edges from consideration. The outer 10% of the image is trimmed at the margins so that borders and artifacts do not influence the match.
Then, we search within a specified region for the best possible displacement vector. Each candidate is scored with normalized cross-correlation: the inner product of the two images being compared (green and blue, or red and blue), each flattened into a vector and normalized. The higher the cross-correlation, the better the alignment.
Once the alignments are found, we shift the green and red channels by their displacement vectors and return a new image that combines them with the blue channel as the final product.
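The search described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than our exact code: the function names (`crop_margins`, `ncc`, `align`), the default `radius`, and the use of `np.roll` to apply a shift are assumptions made for the example.

```python
import numpy as np

def crop_margins(img, frac=0.10):
    """Drop `frac` of the height and width from every side of the image."""
    h, w = img.shape
    dy, dx = int(h * frac), int(w * frac)
    return img[dy:h - dy, dx:w - dx]

def ncc(a, b):
    """Inner product of the two images after scaling each to unit length."""
    a = a.ravel() / np.linalg.norm(a)
    b = b.ravel() / np.linalg.norm(b)
    return float(a @ b)

def align(channel, reference, radius=15):
    """Return the (dx, dy) shift of `channel` that best matches `reference`."""
    ref = crop_margins(reference)
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # np.roll wraps pixels around; the margin crop hides the wrapped
            # strip as long as the shift is smaller than the cropped margin.
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ncc(crop_margins(shifted), ref)
            if score > best_score:
                best_score, best_shift = score, (dx, dy)
    return best_shift
```

Calling `align(green, blue)` and `align(red, blue)` would give the two displacement vectors, which can then be applied with the same `np.roll` call before stacking the channels.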
Standard alignment using only normalized cross-correlation on pixel intensities is fairly successful on most of the input images. Some images, like emir, however, exhibit alignment issues because the light captured through each filter differs depending on the color of the original subject.
The difference in brightness between emir's blue and red filters creates a false positive for the normalized cross-correlation.
On small images, the simple alignment algorithm is fully capable of searching across a large range of displacement vectors very quickly. However, with larger images, even a small search range can be prohibitively expensive, so we implemented an image pyramid to reduce the search range and improve runtime.
Image pyramids refer to a very general strategy for how we can rescale images and work with smaller copies of the same image to speed up execution. Let's put it into perspective. In a 400x400 image thumbnail, it takes only a few seconds to find the best alignment when scanning across the range of possible (x, y) displacements in (-15, +15).
But computing the same (-15, +15) square of possible displacements over the original 4000x4000 pixel scans takes over 4 minutes on a fast machine! Even then the result is poor, because a displacement of 15 pixels represents a shift of less than half a percent of the image width, which is not enough to correct the image alignment.
Our image pyramid algorithm first recurses down to a thumbnail-sized image, solves the optimal alignment problem for the thumbnail copy, and then returns the displacement vector to the caller, which uses it as the starting point for aligning the next larger image. This allows the larger image to pick up where the processing for the smaller image stopped and take advantage of the incremental progress made on the smaller image.
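The recursion can be sketched as follows. This is an illustrative version under stated assumptions, not our exact implementation: the names (`score`, `search`, `pyramid_align`), the thresholds `min_size`, `coarse_radius`, and `fine_radius`, and the plain 2x subsampling by slicing are all choices made for the example. A production version would typically blur before downscaling.

```python
import numpy as np

def score(a, b):
    """Normalized cross-correlation of two equally sized images."""
    return float(a.ravel() @ b.ravel() / (np.linalg.norm(a) * np.linalg.norm(b)))

def search(channel, reference, center, radius):
    """Exhaustive search in a square window of (dx, dy) shifts around `center`."""
    best_score, best = -np.inf, center
    for dy in range(center[1] - radius, center[1] + radius + 1):
        for dx in range(center[0] - radius, center[0] + radius + 1):
            s = score(np.roll(channel, (dy, dx), axis=(0, 1)), reference)
            if s > best_score:
                best_score, best = s, (dx, dy)
    return best

def pyramid_align(channel, reference, min_size=64, coarse_radius=8, fine_radius=2):
    """Coarse-to-fine estimate of the (dx, dy) displacement."""
    if min(channel.shape) <= min_size:
        # Base case: the image is thumbnail-sized, so search a wide window.
        return search(channel, reference, (0, 0), coarse_radius)
    # Solve the half-resolution problem first (assumed: simple subsampling).
    dx, dy = pyramid_align(channel[::2, ::2], reference[::2, ::2],
                           min_size, coarse_radius, fine_radius)
    # One coarse pixel covers two fine pixels, so double the estimate and
    # refine it in a small window at the current resolution.
    return search(channel, reference, (2 * dx, 2 * dy), fine_radius)
```

Only the thumbnail level pays for a wide search; every larger level evaluates just a small window around the doubled estimate, which is where the runtime savings come from.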
We found that, on the emir of Bukhara, the large differences in image intensity caused issues for the normalized cross-correlation.
To remedy the situation, we added an additional step to the alignment pre-processing stage. In addition to edge removal, we also added a Sobel filter to identify the gradients and edges in the image. The Sobel filter returns a black-and-white image containing the most relevant information for alignment: the edges.
We chose the Sobel filter over the Canny edge detection filter as it provided better performance without sacrificing on results.
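To show why edges help, here is a small NumPy sketch of a Sobel gradient-magnitude filter. It is a hand-written stand-in for whichever library filter is used in practice; the function name `sobel_magnitude` and the edge-replicated border handling are assumptions made for the example.

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude from the 3x3 Sobel kernels (edge-replicated borders)."""
    p = np.pad(img.astype(float), 1, mode="edge")
    # Horizontal derivative: right column minus left column, weighted 1-2-1.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2])
    # Vertical derivative: bottom row minus top row, weighted 1-2-1.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:])
    return np.hypot(gx, gy)
```

Because the filter responds to differences between neighboring pixels, adding a constant brightness offset to a channel leaves its edge map unchanged. Running the alignment search on `sobel_magnitude(channel)` instead of the raw intensities is what makes the score less sensitive to the brightness mismatch between filters.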
To enhance the visual appeal of the images, we added a slight contrast adjustment which boosted the contrast of the images by clipping black and white to the 2nd and 98th percentiles. This provided a slight but noticeable increase in contrast without introducing artifacts or blowing out the image.
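A percentile-based stretch of this kind can be sketched as below. The function name `stretch_contrast` and the assumption that images are floating-point arrays are ours for the example; the 2nd and 98th percentiles come from the description above.

```python
import numpy as np

def stretch_contrast(img, low_pct=2, high_pct=98):
    """Map the low/high percentiles to 0 and 1, clipping values outside them."""
    lo, hi = np.percentile(img, (low_pct, high_pct))
    if hi <= lo:
        # Flat image: there is no range to stretch, so return it unchanged.
        return np.array(img, dtype=float)
    return np.clip((img - lo) / (hi - lo), 0.0, 1.0)
```

Using percentiles rather than the absolute minimum and maximum keeps a handful of outlier pixels (dust, scratches) from dictating the stretch.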
The most creative component of our implementation is in the automatic cropping. Inspired by the edge detection filter strategy, we researched additional methods for understanding and interpreting edges in an image before stumbling upon the Hough Transform.
The Hough transform in its simplest form is a method to detect straight lines.
We used the Probabilistic Hough Transform to identify the cropping boundaries. A detected line counts as a crop boundary if it is roughly vertical or roughly horizontal, as determined by its start and end coordinates, and lies within 10% of the image margin.
This yielded great results on most images. But a few images were over-cropped due to the line detector picking up features of the scene and treating them as borders.
Large images were scaled down prior to applying the transform to improve performance.
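The border rule can be illustrated with a short sketch that takes line segments and returns a crop box. It assumes the segments have already been produced by a probabilistic Hough line detector as pairs of endpoints; the function name `crop_box` and the slope tolerance `tol` are hypothetical choices for the example, and line detection itself is not shown.

```python
def crop_box(segments, width, height, margin=0.10, tol=0.1):
    """Return (left, top, right, bottom) from border-like line segments.

    `segments` is a list of ((x0, y0), (x1, y1)) endpoint pairs, assumed to
    come from a probabilistic Hough line detector run on an edge map.
    """
    left, top, right, bottom = 0, 0, width, height
    mx, my = margin * width, margin * height
    for (x0, y0), (x1, y1) in segments:
        dx, dy = abs(x1 - x0), abs(y1 - y0)
        if dx == 0 and dy == 0:
            continue                      # degenerate segment, no direction
        if dx <= tol * dy:                # roughly vertical
            x = (x0 + x1) / 2
            if x <= mx:
                left = max(left, x)       # innermost line near the left edge
            elif x >= width - mx:
                right = min(right, x)     # innermost line near the right edge
        elif dy <= tol * dx:              # roughly horizontal
            y = (y0 + y1) / 2
            if y <= my:
                top = max(top, y)
            elif y >= height - my:
                bottom = min(bottom, y)
    return int(left), int(top), int(right), int(bottom)
```

Lines in the interior of the image and diagonal lines are ignored, which limits over-cropping but does not eliminate it: a straight scene feature that happens to fall inside the margin band is still treated as a border.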
In 2000, the Library of Congress commissioned WalterStudio to professionally retouch a representative set of 123 plates from the Prokudin-Gorskii collection. We’ve selectively rendered and reproduced a few of the photos to offer a comparison between our automated alignment program and the professionally retouched photos.