Project 1

Introduction

Welcome to my Project 1 webpage. Here I go over my thought process and journey through the first project of the semester, which is about image colorization. It was definitely not an easy project, but it taught me a lot about image processing and about how much we take for granted when we edit photos with existing software.

Brainstorming

Because I was really excited, I downloaded last year's version of the project over the summer. I started playing around with the pictures and noticed that the problem was very open-ended given the skeleton code provided. One thing I noticed immediately was that we had to align pictures across the different color channels. My initial thought was to write an alignment function from scratch. This function technically worked, but it was not as efficient as the NumPy approach based on np.roll. Although I spent a significant amount of time on it, I did manage to get some alignment results, which was really exciting!

Naive Implementation/Alignment

Unfortunately, the displacement function that I initially wrote from scratch wasn't very successful, so I pivoted to an np.roll implementation. Intuitively, if you search over a window of x and y displacements that goes from -15 to 15, you would use a double for loop, which is what I ended up writing for my naive implementation. Interestingly, the speedup from np.roll was significant while the results stayed the same. I have shared some pictures below. What stands out is that the naive algorithm together with SSD has a lot of difficulty with the periphery of the last two images. The center looks perfectly aligned, but the farther from the center you go, the blurrier it gets. I suspect this is because there are more shapes in the middle and fewer at the edges, which makes it harder to maximize the similarity between the color channels there.
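The exhaustive search described above can be sketched roughly as follows. This is a minimal illustration and not the exact project code: the names ssd and align_naive and the radius parameter are hypothetical, and the channels are assumed to be 2D float arrays of the same shape.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences; lower means more similar."""
    return float(np.sum((a - b) ** 2))

def align_naive(channel, reference, radius=15):
    """Try every (dy, dx) shift in [-radius, radius] and keep the best one.

    Hypothetical helper: returns the shift that, applied to `channel`
    with np.roll, minimizes the SSD against `reference`.
    """
    best_score, best_shift = np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # np.roll shifts circularly: pixels pushed off one edge
            # wrap around to the opposite edge.
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ssd(shifted, reference)
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

Because np.roll wraps pixels around the border, the score also includes those wrapped pixels. Scoring only a central crop of each channel is one common way to keep the borders from dominating the metric.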

Pyramids

Then I started working on pyramids. Pyramids were a really interesting experience, because I spent most of the time being confused about what was meant by a pyramid. It became clear that with a pyramid we don't have to do an exhaustive search over thousands of different displacements in an already exceptionally large picture. Initially, I was thinking of changing the search window by playing with the step size. However, soon after, I realized that a Gaussian image pyramid is a much better fit for this situation.

In an image pyramid you start from the coarsest picture. I arbitrarily took a picture at 1/32 of the original resolution. You align it with the alignment methods we used before and obtain a displacement. When we go one level down the pyramid, to the picture at 1/16 resolution, this displacement is multiplied by two, because one pixel at the coarser level covers two pixels at the finer level. We recursively go all the way down until we reach the resolution of the original picture. The hard part is that a lot of things happen at the same time, so there is a lot of debugging.

Some improvements that I made pertain to the time and space efficiency of this algorithm. Specifically, I tried skipping all the even displacements, which reduced the number of candidates by a factor of two. However, I was not satisfied with the results, because a displacement error of 1 pixel can sometimes be very apparent. Another way to improve the time efficiency is to gradually shrink the search window at the lower levels, since that is the point of the pyramid: it is a process of elimination that gradually narrows in on the right displacement. The results (for the TIF pictures) are posted below. I discuss the results, and why some pictures failed while others succeeded, further below.
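The coarse-to-fine procedure above can be sketched as follows. This is a simplified illustration under stated assumptions, not the project code: the helper names (search, downsample, align_pyramid) and their parameters are hypothetical, and plain 2x2 block averaging stands in for the Gaussian blur and subsample step of a true Gaussian pyramid.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences; lower means more similar."""
    return float(np.sum((a - b) ** 2))

def search(channel, reference, center=(0, 0), radius=2):
    """Exhaustive search over shifts within `radius` of `center`."""
    best_score, best_shift = np.inf, center
    cy, cx = center
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ssd(shifted, reference)
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

def downsample(img):
    """Halve the resolution by averaging 2x2 blocks.

    Assumption: a box filter used as a simple stand-in for the
    Gaussian blur + subsample of a real Gaussian pyramid.
    """
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2]
                   + img[0::2, 1::2] + img[1::2, 1::2])

def align_pyramid(channel, reference, levels=5,
                  coarse_radius=15, refine_radius=2):
    """Recursive coarse-to-fine alignment.

    levels=5 corresponds to starting at 1/32 resolution.
    """
    # Base case: coarsest level, or the image is too small to halve
    # again and still fit the coarse search window.
    if levels == 0 or min(channel.shape) < 4 * coarse_radius + 3:
        return search(channel, reference, radius=coarse_radius)
    dy, dx = align_pyramid(downsample(channel), downsample(reference),
                           levels - 1, coarse_radius, refine_radius)
    # One coarse pixel covers two fine pixels, so double the estimate
    # and only refine within a small window around it.
    return search(channel, reference,
                  center=(2 * dy, 2 * dx), radius=refine_radius)
```

The small refine_radius at the finer levels is where the time saving comes from: the wide search happens only on the smallest image, and every larger level checks just a handful of candidates around the doubled estimate.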

Metrics

Most of the time I used SSD (Sum of Squared Differences), because it is a quick and relatively reliable metric for alignment. I also attempted to use NCC (Normalized Cross-Correlation), but couldn't get it to work well. For some reason, even after I cropped to the center of the pictures, NCC did worse than SSD on many of them, so I decided to stick with SSD. I also tried to play around with gradients; however, the difficulty with gradients is that they are a slow way of measuring the similarity between two matrices. In theory, though, NCC should give better results for images whose color channels are not uniform in brightness. I go into more depth on that in the discussion of the difficulties with aligning Emir's color channels.
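For reference, the two metrics can be written as below. This is a minimal sketch assuming 2D float arrays of equal shape; the function names are illustrative. It also shows the theoretical advantage of NCC: a change in brightness and contrast leaves NCC at 1 while SSD reports a large difference.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences; lower means more similar."""
    return float(np.sum((a - b) ** 2))

def ncc(a, b):
    """Normalized cross-correlation in [-1, 1]; higher means more similar.

    Note: undefined (division by zero) if either input is constant.
    """
    a = a - a.mean()
    b = b - b.mean()
    return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
a = rng.random((32, 32))
b = 2.0 * a + 0.5          # same structure, different brightness/contrast

print(ssd(a, b) > 0)        # True: SSD sees the two as different
print(round(ncc(a, b), 6))  # 1.0: NCC ignores the gain and offset
```

When using these inside a search loop, remember that SSD is minimized while NCC is maximized.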

Discussion and Observations

Looking back, many of the pictures got a pretty good alignment using relatively simple techniques. The lady, the icon, harvesters, castle and melon turned out pretty well if you look at the edges. The limitations show up, however, in Emir, train and self-portrait.

Emir

The challenge with Emir is that the red and blue channels have very different values for the dress he is wearing. Compare these two. The traditional metrics, SSD and NCC, failed because they look for similarity in absolute intensity values; they just look for an association between the raw pixel values. I attempted to normalize the cross-correlation coefficient, but that misaligned the whole picture, giving the even worse result shown below. In the future, I think a possible way to solve Emir's problem would be to work with gradients. We know that the dress he is wearing contrasts starkly with the background in all of the photos, so that might be the right direction to look into.
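As a rough illustration of why gradients might help here (a sketch of the idea only, not code that produced any of the results on this page): the hypothetical helper gradient_magnitude below uses np.gradient to turn a channel into an edge-strength map. A region that is bright in one channel and dark in another still has its edges in the same place, so the edge maps agree even when the raw intensities do not.

```python
import numpy as np

def gradient_magnitude(img):
    """Edge strength of a 2D image via finite differences.

    Hypothetical helper: np.gradient returns the derivative along
    axis 0 (rows) and axis 1 (columns); we combine them per pixel.
    """
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy)

rng = np.random.default_rng(0)
bright = rng.random((32, 32))
dark = 1.0 - bright  # same structure with inverted intensities

# Raw intensities disagree, but the edge maps match.
print(np.allclose(bright, dark))                                          # False
print(np.allclose(gradient_magnitude(bright), gradient_magnitude(dark)))  # True
```

Under this assumption, the edge maps could be passed to the same alignment search in place of the raw channels, at the cost of the extra gradient computation.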

Another approach I attempted was flipping the colors, but that wouldn't necessarily fix it. Another attempt that failed was aligning on the door and the cabinets behind him; the lack of colors and contours there held those attempts back.

Train and Self-Portrait

I think the difficulties with the train and the self-portrait came from the green features in the background. All those green features throw off the metrics, since they contain little red or blue. This makes alignment harder, in much the same way as Emir's dress being extremely blue.