Danji Liu

Overview of this project

I enjoyed writing the code for this project a lot. At first, aligning the channels seemed complicated. After going back to the lectures, I realized that each pixel is just a number, so all we have to do is manipulate these numbers. After that, I felt much more comfortable getting my hands on the project.

Challenges

The first challenge was understanding the dimensions of the image array. I didn't know it was a 2D matrix, so I spent a lot of time wondering why a single np.sum() call would error out.

The second challenge was cropping the image. I wasn't careful enough when reading the project spec and missed the part saying we should crop the image. I kept getting a weird, fuzzy output no matter what I tried. Later I realized that the borders are very noisy and were affecting the result.

Later on, I struggled with implementing the pyramid. I discovered that I could use a built-in function to shrink the image, which saved my life. Meanwhile, I didn't know when to stop the recursion, so I went to office hours and got some help with that. Finally, I had a hard time figuring out what to do with the offsets returned by the previous recursive call. That bugged me for a long time until I realized we need to multiply them by 2, because each level is twice the size of the one below it.

Overall, it was a fun project. There were stressful times when I didn't know what to do. I'm glad that I re-read the project spec and went to office hours to seek help.


Naive Low-Res Algorithm

I split the tricolor image into its three channels and then aligned the red and green channels to the blue channel. I cropped the channels so that only the middle 40% is used to compute offsets. I used a simple for loop to search for the best offset within a 15x15 window. I scored each offset with the SSD (sum of squared differences) method, which is the L2 loss, and the best offset is the one that minimizes it. As I mentioned before, I didn't crop the image originally, so the outputs were messy.
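The search described above can be sketched roughly as follows. This is a minimal illustration, not my exact code: the function names are made up for the example, the "middle 40%" is assumed to mean 40% along each axis, the 15x15 window is assumed to mean shifts from -7 to 7 in each direction, shifting is done with a cyclic np.roll, and the channels are assumed to be float arrays.

```python
import numpy as np

def crop_center(img, keep=0.4):
    """Keep the central `keep` fraction of a 2D image along each axis."""
    h, w = img.shape
    dh = int(h * (1 - keep) / 2)
    dw = int(w * (1 - keep) / 2)
    return img[dh:h - dh, dw:w - dw]

def ssd(a, b):
    """Sum of squared differences (squared L2 distance) between two arrays."""
    return np.sum((a - b) ** 2)

def align_single_scale(channel, reference, radius=7):
    """Return the (dx, dy) shift of `channel` that minimizes SSD against `reference`.

    radius=7 tries shifts -7..7 on each axis, i.e. a 15x15 window (assumed).
    """
    ref = crop_center(reference)
    best_score, best_offset = np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Shift rows by dy and columns by dx (cyclic shift).
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            # Score only the cropped interior so the noisy borders are ignored.
            score = ssd(crop_center(shifted), ref)
            if score < best_score:
                best_score, best_offset = score, (dx, dy)
    return best_offset
```

Calling align_single_scale(green, blue) and align_single_scale(red, blue) would give the two offsets in X, Y order.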

High-Res Algorithm with Pyramid Optimization

In order to run the program on larger images, whose displacements can be very big, I implemented a pyramid speedup. I called the single-layer SSD function recursively (usually 5 times for big images and 2 times for smaller ones). At each level, the image is rescaled by half and I look for the offset with the lowest SSD over a 15x15 window. On the next level (with an image twice as big as the previous level), I scale the estimate by two and search a 15x15 window around it on the new image. The process repeats until we are back at the original image size, and the final estimate is returned.
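The recursion can be sketched like this. Again, this is only an illustration under assumptions: the names are hypothetical, the 2x2 block averaging in halve() is a stand-in for the built-in rescale function I actually used, the stopping rule (a level count plus a minimum image size) is a guess at a reasonable choice, and the inputs are assumed to be float arrays of the same shape.

```python
import numpy as np

def best_shift(channel, reference, center=(0, 0), radius=7, keep=0.4):
    """Search shifts (dx, dy) within `radius` of `center`; return the lowest-SSD one."""
    h, w = reference.shape
    dh, dw = int(h * (1 - keep) / 2), int(w * (1 - keep) / 2)
    ref = reference[dh:h - dh, dw:w - dw]
    cx, cy = center
    best_score, best = np.inf, center
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            # Compare only the cropped interior of the two images.
            score = np.sum((shifted[dh:h - dh, dw:w - dw] - ref) ** 2)
            if score < best_score:
                best_score, best = score, (dx, dy)
    return best

def halve(img):
    """Downscale by 2 using 2x2 block averaging (stand-in for a library rescale)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def align_pyramid(channel, reference, levels=5, radius=7):
    """Coarse-to-fine alignment; returns (dx, dy) at the resolution of the inputs."""
    # Assumed stopping rule: out of levels, or the image is too small to search.
    if levels <= 1 or min(reference.shape) <= 4 * radius:
        return best_shift(channel, reference, radius=radius)
    # Solve the half-size problem first.
    cx, cy = align_pyramid(halve(channel), halve(reference), levels - 1, radius)
    # One pixel at the coarser level is two pixels here, so double the estimate
    # and refine it with another 15x15 search around that point.
    return best_shift(channel, reference, center=(2 * cx, 2 * cy), radius=radius)
```

With levels=5 and a 15x15 window at each level, the search can reach displacements of a couple hundred pixels at full resolution while only ever testing 225 shifts per level.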


My Output

These are the outputs for the required images.

The offsets are given in X, Y order (horizontal, vertical).
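To make the X, Y convention concrete, here is a small sketch of how a listed offset could be applied to build the color image. The helper names are hypothetical, and it assumes that positive X moves the channel right, positive Y moves it down, and that the shift is a cyclic np.roll; the sign convention is an assumption for this example.

```python
import numpy as np

def apply_offset(channel, offset):
    """Shift a 2D channel by offset = (x, y): x along columns, y along rows."""
    dx, dy = offset
    return np.roll(channel, (dy, dx), axis=(0, 1))

def stack_rgb(r, g, b, g_offset, r_offset):
    """Shift G and R onto B and stack them into an H x W x 3 RGB image."""
    return np.dstack([apply_offset(r, r_offset), apply_offset(g, g_offset), b])
```

For example, "Channel G: 2, 5" would correspond to apply_offset(g, (2, 5)), which moves the green channel 2 pixels horizontally and 5 pixels vertically.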

Castle

Channel G: 2, 34

Channel R: 5, 98

Cathedral

Channel G: 2, 5

Channel R: 3, 12

Emir

Channel G: 24, 48

Channel R: -197, 235

This one didn't work very well because the exposures of the three channels are different. The whitest white and the darkest black are not consistent across the channels.

Harvesters

Channel G: 17, 59

Channel R: 14, 123

Icon

Channel G: 18, 41

Channel R: 23, 90

Lady

Channel G: 8, 51

Channel R: 11, 111

melons

Channel G: 9, 81

Channel R: 12, 179

monastery

Channel G: 2, -3

Channel R: 2, 3

onion_church

Channel G: 27, 50

Channel R: 37, 108

self_portrait.jpg

Channel G: 29, 78

Channel R: 37, 175

three_generations

Channel G: 14, 50

Channel R: 12, 110

tobolsk.jpg

Channel G: 3, 3

Channel R: 3, 7

train.jpg

Channel G: 6, 42

Channel R: 32, 85

workshop.jpg

Channel G: -1, 53

Channel R: -12, 105

Images of my own choice

The three images I picked were all big tif images (around 30MB) before alignment. To save space, the outputs are all jpg.

flowers.jpg

Channel G: -5, 49

Channel R: -23, 96

lake.jpg

Channel G: 10, -22

Channel R: 12, -33

vase.jpg

Channel G: -2, 24

Channel R: -2, 113

What didn't work well

Emir.jpg didn't work well because the channels have different brightness, so the SSD-based displacement estimate is badly skewed.