CS194-26 Project 1
Michael Weymouth
(cs194-26-adc)
Overview
In this project, I was tasked with aligning the 3 color channels (R, G, and B) of several images from the Prokudin-Gorskii collection, then reassembling the channels into the proper color representation of the original scene.
This process was completed in the following way:
1. The image file is first split into 3 equal horizontal strips, stacked top to bottom, representing the three color channels B, G, and R, respectively.
2. Both the R and G channels are aligned with the B channel as follows.
3. First, the algorithm demeans both channels by subtracting the average pixel value of each color channel from every pixel in that channel. Then, it takes the absolute value of every pixel in both channels. Instead of relying on raw pixel values, the algorithm now aligns on a measure of “difference” from the mean value of the channel. Since the different color channels will have different values for different objects, this allows the algorithm to use an object-based representation for alignment, which proved to be a better metric overall.
4. It then passes both images to a recursive pyramid search function, which calls itself on a copy of the image downsampled by a factor of 2 until it hits the base case: the channel to align is less than 400 pixels wide. It then runs a displacement search over a range of [-20, 20], a parameter I determined experimentally.
5. This displacement search translates the channel being aligned in both X and Y directions by a varying amount in the determined range, feeds the two channels being compared into a ranking metric, then returns the translation with the minimum value of the metric.
6. The best metric I found in my experimentation works as follows: first, it crops out the outer 80% of the image to get rid of the borders, which negatively impacted the alignment metric. Then, it flattens both channels to 1-dimensional arrays and normalizes them, padding with zeros if there is a dimension mismatch. Finally, the metric returns the negative of the dot product of the two normalized vectors.
7. The optimal parameters found are then passed up the stack and rescaled to represent the shift necessary for the higher-resolution image, and the image is displaced by those translation parameters.
8. A displacement search is then run over the range [-1, 1], since we now know that the optimal shift must lie in that range from the already-shifted image. The results of this displacement search, added to the previous translation calculated, are then passed up the stack to the next recursive call.
9. This recursive backtracking continues until the algorithm reaches the original image size, at which point it has found the optimal offsets. The image is then translated one final time and returned, along with the displacements used, to the alignment function which called the pyramid search function.
10. After both the R and G channels have been aligned to B, they are stacked with the B channel to form the final color image.
11. I also implemented the automatic contrasting functionality, which (if enabled) takes the minimum and maximum of the image and expands the range of all of the channels in the image such that the minimum pixel is at 0.0 and the maximum pixel is at a value of 1.0.
12. The image is then saved to an output file, and the offsets used, along with the time it took to align the channels, are returned.
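The steps above can be sketched in NumPy roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: all function names are hypothetical, translation is done with `np.roll` (which wraps around at the edges), downsampling is a plain 2x2 average, and the `keep` crop fraction is an assumed parameter. It also omits the zero padding for mismatched sizes and returns only the offsets rather than the translated image.

```python
import numpy as np

def split_channels(plate):
    """Split a stacked plate into three equal-height strips (B, G, R)."""
    h = plate.shape[0] // 3
    return plate[:h], plate[h:2 * h], plate[2 * h:3 * h]

def score(a, b, keep=0.8):
    """Negative normalized dot product on the central region (lower is better).

    `keep` is an assumed fraction of each dimension retained after cropping.
    """
    h, w = a.shape
    dy, dx = int(h * (1 - keep) / 2), int(w * (1 - keep) / 2)
    a = a[dy:h - dy, dx:w - dx].ravel()
    b = b[dy:h - dy, dx:w - dx].ravel()
    a = a / (np.linalg.norm(a) + 1e-12)
    b = b / (np.linalg.norm(b) + 1e-12)
    return -float(a @ b)

def search(moving, fixed, radius):
    """Try every (dy, dx) in [-radius, radius] and keep the lowest score."""
    best, best_score = (0, 0), float("inf")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            s = score(np.roll(moving, (dy, dx), axis=(0, 1)), fixed)
            if s < best_score:
                best, best_score = (dy, dx), s
    return best

def downsample(img):
    """Halve the resolution by averaging 2x2 blocks (an assumed choice)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def pyramid_align(moving, fixed, base_width=400, base_radius=20):
    """Return the (dy, dx) shift that aligns `moving` to `fixed`."""
    if moving.shape[1] < base_width:
        # Base case: full displacement search at the coarsest level.
        return search(moving, fixed, base_radius)
    cy, cx = pyramid_align(downsample(moving), downsample(fixed),
                           base_width, base_radius)
    cy, cx = 2 * cy, 2 * cx  # rescale the coarse shift to this resolution
    shifted = np.roll(moving, (cy, cx), axis=(0, 1))
    ry, rx = search(shifted, fixed, 1)  # refine within [-1, 1]
    return cy + ry, cx + rx
```

With a real plate, one would call `split_channels` once and then `pyramid_align(g, b)` and `pyramid_align(r, b)` on the preprocessed channels before stacking the shifted results.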
When I was first implementing pyramid search, I ran into quite a bit of trouble with a number of images, which was quickly corrected by adding the cropping step.
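To illustrate why cropping matters, here is a small synthetic sketch (not taken from the project code). It assumes two channels that share a bright, fixed border while their interior content is offset by a few pixels; the names `neg_cosine` and `best_shift` and the `margin` parameter are hypothetical. Without cropping, the border dominates the metric and the search settles on zero shift; with the border cropped away, the true offset is recovered.

```python
import numpy as np

def neg_cosine(a, b, margin=0):
    """Negative normalized dot product, ignoring `margin` pixels per side."""
    if margin:
        a = a[margin:-margin, margin:-margin]
        b = b[margin:-margin, margin:-margin]
    a, b = a.ravel(), b.ravel()
    return -float(a @ b) / float(np.linalg.norm(a) * np.linalg.norm(b))

def best_shift(moving, fixed, radius, margin):
    """Exhaustive search for the (dy, dx) roll that minimizes the metric."""
    shifts = [(dy, dx)
              for dy in range(-radius, radius + 1)
              for dx in range(-radius, radius + 1)]
    return min(shifts, key=lambda s: neg_cosine(
        np.roll(moving, s, axis=(0, 1)), fixed, margin))

# Synthetic data: dim random content, offset by (3, 2) in the moving channel.
rng = np.random.default_rng(0)
content = 0.1 * rng.random((40, 40))
fixed = content.copy()
moving = np.roll(content, (3, 2), axis=(0, 1))

# Both channels get the same bright 4-pixel border, which does not move.
for channel in (fixed, moving):
    channel[:4, :] = channel[-4:, :] = 1.0
    channel[:, :4] = channel[:, -4:] = 1.0

print(best_shift(moving, fixed, radius=4, margin=0))   # (0, 0): border wins
print(best_shift(moving, fixed, radius=4, margin=10))  # (-3, -2): true offset
```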
I also ran into some trouble aligning images with intricate patterns of differing colors, such as those found on the robe in emir.tif. This is when I added the demean and absolute value step, which corrected for this by reducing the reliance of the ranking metric on the actual pixel values of the channel.
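A minimal sketch of the demean and absolute value step, assuming channels are NumPy float arrays (the function name `preprocess` is hypothetical). The tiny example shows the property this step relies on: a pattern that is bright in one channel and dark in another ends up with the same representation after preprocessing.

```python
import numpy as np

def preprocess(channel):
    """Subtract the channel mean, then take the absolute value per pixel."""
    channel = np.asarray(channel, dtype=np.float64)
    return np.abs(channel - channel.mean())

# The same pattern with opposite brightness in two channels.
pattern = np.array([[0.9, 0.2],
                    [0.5, 0.4]])
inverted = 1.0 - pattern

# After preprocessing, both describe the same "distance from the mean".
print(np.allclose(preprocess(pattern), preprocess(inverted)))  # True
```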
Finally, there were a number of artifacts in some images, which I originally interpreted to be from my alignment procedure. However, upon closer inspection, it would seem that these issues were caused by imperfections in the source material. Perhaps the best example of this is the colorized version of the three_generations.tif image, in which the coat of the man on the left has a haloing color effect on the bottom left. These imperfections are a subject for further study, and would likely be best corrected in a manual restoration.
Example Results
emir_colorized.png
G channel displacement: [49, 23]
R channel displacement: [106, 41]
Alignment time: 9.352622747421265 sec
monastery_colorized.png
G channel displacement: [-3, 2]
R channel displacement: [3, 2]
Alignment time: 5.712937831878662 sec
three_generations_colorized.png
G channel displacement: [54, 15]
R channel displacement: [113, 11]
Alignment time: 9.309688091278076 sec
settlers_colorized.png
G channel displacement: [7, 0]
R channel displacement: [14, -1]
Alignment time: 6.227230072021484 sec
train_colorized.png
G channel displacement: [42, 5]
R channel displacement: [86, 31]
Alignment time: 7.970168828964233 sec
icon_colorized.png
G channel displacement: [40, 17]
R channel displacement: [89, 23]
Alignment time: 9.173699855804443 sec
nativity_colorized.png
G channel displacement: [3, 1]
R channel displacement: [7, 0]
Alignment time: 5.850849151611328 sec
cathedral_colorized.png
G channel displacement: [5, 2]
R channel displacement: [12, 3]
Alignment time: 4.322472095489502 sec
village_colorized.png
G channel displacement: [65, 12]
R channel displacement: [138, 22]
Alignment time: 9.257261753082275 sec
self_portrait_colorized.png
G channel displacement: [79, 30]
R channel displacement: [177, 38]
Alignment time: 8.883623838424683 sec
harvesters_colorized.png
G channel displacement: [60, 17]
R channel displacement: [125, 14]
Alignment time: 8.090704202651978 sec
lady_colorized.png
G channel displacement: [54, 8]
R channel displacement: [118, 12]
Alignment time: 8.588798999786377 sec
turkmen_colorized.png
G channel displacement: [55, 21]
R channel displacement: [115, 29]
Alignment time: 7.547335863113403 sec
Additional Results
00033u_colorized.png
G channel displacement: [53, 10]
R channel displacement: [103, 15]
Alignment time: 8.395071029663086 sec
00859u_colorized.png
G channel displacement: [56, 14]
R channel displacement: [125, 23]
Alignment time: 8.83419919013977 sec
00245u_colorized.png
G channel displacement: [28, -7]
R channel displacement: [108, -19]
Alignment time: 9.065867900848389 sec
00344u_colorized.png
G channel displacement: [34, 0]
R channel displacement: [119, -2]
Alignment time: 8.46563982963562 sec
00016u_colorized.png
G channel displacement: [45, 12]
R channel displacement: [96, 15]
Alignment time: 8.455512046813965 sec
01251u_colorized.png
G channel displacement: [51, 27]
R channel displacement: [108, 36]
Alignment time: 8.783711910247803 sec
Bells & Whistles
I implemented the automatic contrasting feature, which automatically expands the contrast range of the image. All of the above images were produced with automatic contrasting enabled, so below I present a few before-and-after images.
Emir Before
Emir After
Icon Before
Icon After
Village Before
Village After
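The automatic contrasting described in this section can be sketched as a linear rescale using the global minimum and maximum of the image. This is an assumed form, not the project's code: the function name `auto_contrast` and the `out_min`/`out_max` parameters (with their default values) are hypothetical.

```python
import numpy as np

def auto_contrast(image, out_min=0.0, out_max=1.0):
    """Linearly stretch pixel values so they span [out_min, out_max].

    The minimum and maximum are taken over the whole image, so all color
    channels are rescaled with the same mapping.
    """
    image = np.asarray(image, dtype=np.float64)
    lo, hi = image.min(), image.max()
    if hi == lo:
        # A constant image has no range to stretch.
        return np.full_like(image, out_min)
    return out_min + (image - lo) * (out_max - out_min) / (hi - lo)

# A low-contrast 2x2 RGB image whose values only span [0.3, 0.7].
dull = np.linspace(0.3, 0.7, 12).reshape(2, 2, 3)
stretched = auto_contrast(dull)
print(stretched.min(), stretched.max())
```

Using one global mapping, rather than stretching each channel separately, keeps the relative balance between the color channels unchanged.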