CS194-26 Project 1
Vincent Zhu 3032108340
Overview:
In the early 20th century, Sergei Mikhailovich Prokudin-Gorskii took thousands of color photographs of the Russian Empire, recorded as RGB glass-plate negatives. These photographs have since been digitized and published online. The purpose of this project is to take the three color channel images from each plate and align them to produce a full-color photograph.
My Approach: I assumed a simple [x,y] translation was sufficient to align the images properly. I exhaustively searched over a [-15, 15] pixel displacement window and chose the displacement that yielded the best (lowest) score under the sum of squared differences (SSD) metric. For photos that were too large for an exhaustive search at full resolution, I used an image pyramid: starting at the coarsest scale and working toward the finest, I updated the best displacement estimate at every level.
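The single-scale search described above can be sketched as follows. This is a minimal illustration, not the code from main.py: the function names, the use of np.roll (which wraps pixels around the image edge), and the (row, column) ordering of the returned shift are all assumptions.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences; lower means more similar."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sum(diff * diff))

def align_exhaustive(channel, reference, radius=15):
    """Try every shift in [-radius, radius]^2 and return the (dy, dx)
    that minimizes SSD between the shifted channel and the reference.
    Hypothetical helper; np.roll wraps around the borders."""
    best_score = np.inf
    best_shift = (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ssd(shifted, reference)
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

With radius=15 this evaluates 31 x 31 = 961 candidate shifts, which is cheap for the small images but becomes slow at full plate resolution, which is what motivates the pyramid.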
No problems required me to change my implementation design; the only problems I encountered were in implementing that design, which was probably a good thing. For example, one very persistent bug that took a long time to find was in line 71 of main.py. At one point, I wasn't summing the two displacement values; I was returning just one of them. This meant my estimate was not updating properly from level to level.
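To make the summation step concrete, here is a hedged sketch of a recursive pyramid alignment. It assumes the "two displacement values" are the upscaled coarse estimate and the local refinement found at the current level; that reading, along with every name and parameter below (downsample, search, align_pyramid, min_size, radius, coarse_radius), is an assumption for illustration and not taken from main.py.

```python
import numpy as np

def downsample(img):
    """Halve resolution by averaging 2x2 blocks (an odd trailing row or
    column is dropped)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w].astype(np.float64)
    return (img[0::2, 0::2] + img[1::2, 0::2]
            + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def search(channel, reference, radius):
    """Exhaustive SSD search over shifts in [-radius, radius]^2."""
    best, best_shift = np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            diff = np.roll(channel, (dy, dx), axis=(0, 1)) - reference
            score = np.sum(diff * diff)
            if score < best:
                best, best_shift = score, (dy, dx)
    return best_shift

def align_pyramid(channel, reference, min_size=64, radius=2, coarse_radius=15):
    """Coarse-to-fine alignment; returns the total (dy, dx) shift."""
    channel = channel.astype(np.float64)
    reference = reference.astype(np.float64)
    # Base case: the image is small enough for a full-window search.
    if min(channel.shape) <= min_size:
        return search(channel, reference, coarse_radius)
    # Estimate at half resolution, then double it for this level.
    cy, cx = align_pyramid(downsample(channel), downsample(reference),
                           min_size, radius, coarse_radius)
    cy, cx = 2 * cy, 2 * cx
    # Refine around the coarse estimate with a small window.
    ry, rx = search(np.roll(channel, (cy, cx), axis=(0, 1)), reference, radius)
    # The step the bug skipped: return the SUM of both displacements.
    return (cy + ry, cx + rx)
```

Returning only (ry, rx) or only (cy, cx) on the last line would discard half of the estimate at every level, which matches the symptom described above.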
Calculated Offsets:
Cathedral: G[5, 2] R[12, 3]
Emir: G[48, 24] R[-55, 16]
Harvesters: G[59, 18] R[124, 15]
Icon: G[41, 18] R[90, 23]
Melons: G[84, 10] R[180, 14]
Monastery: G[-3, 2] R[3, 2]
Onion Church: G[50, 26] R[108, 37]
Lady: G[58, 5] R[118, 10]
Tobolsk: G[3, 3] R[7, 3]
Self Portrait: G[77, 29] R[175, 37]
Three Generations: G[54, 16] R[106, 14]
Train: G[42, 7] R[85, 32]
Workshop: G[160, 133] R[288, 60]
Calculated Offsets:
Guardhouse: G[10, 18] R[27, 28]
Por Porog Waterfall: G[27, 0] R[106, 0]
Lake Sterzh: G[64, 24] R[77, 30]
Casting: G[61, 23] R[142, 21]
Of all the images I tried, Emir and Village aligned very poorly, and Lake Sterzh aligned passably but noticeably worse than the others. Village and Lake Sterzh both had long expanses of similar pixel values that made it hard to find the correct displacement values, while Emir had a lot of noise in its data. Implementing the bells and whistles would probably have helped significantly with these images, especially cropping the borders and trying slight rotations.
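As a rough illustration of the border-cropping idea, one option is to compute the score on the interior of each image only, so the dark plate borders do not dominate the SSD. This was not implemented in the project; the helper names and the default 10% margin are assumptions.

```python
import numpy as np

def crop_interior(img, frac=0.1):
    """Drop a margin of `frac` of the height/width from every side."""
    h, w = img.shape[:2]
    dy, dx = int(h * frac), int(w * frac)
    return img[dy:h - dy, dx:w - dx]

def ssd_interior(a, b, frac=0.1):
    """SSD computed on the cropped interiors only (hypothetical metric)."""
    diff = (crop_interior(a, frac).astype(np.float64)
            - crop_interior(b, frac).astype(np.float64))
    return float(np.sum(diff * diff))
```

Whether this would actually fix Emir, Village, or Lake Sterzh is untested here.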