CS 194-26 Intro to Computer Vision and Computational Photography, Fall 2021

Project 4A: Image Warping and Mosaicing!

Name: Sarthak Arora

Part A

Part 1: Shoot the Pictures

In this part, I chose two images per scene from a total of 3 scenes to join. I made sure that the two images in each scene overlapped by around 50%. In each scene, one image was taken from a straight-on angle and the other from a more tilted angle to the right. The images were of the vending machine in my lift lobby, my roof, and my room, and are shown below. They were all taken with my iPhone.

Lift Lobby One
Lift Lobby Two
Roof One
Roof Two
Room One
Room Two

I also collected images for the later rectification part of the project. For this, I took slanted photos of a textbook and of a tissue-paper box resting on a notebook. I chose these because the pictures were taken at an angle with clearly perceivable depth, so a correct rectification would be easy to verify: the surfaces should come out straightened and flattened. These images are shown below. They were all taken with my iPhone.

Textbook
Tissue Paper on Notebook

Part 2: Recover Homographies

In this part, I knew that since a homography has 8 degrees of freedom, only 4 corresponding points per image are needed to define the transformation matrix. However, as we saw, overdetermined systems (> 4 points) recover the homography more reliably. I used 8 corresponding points per image for all 3 of my scenes. Since the system was overdetermined, I solved for the matrix with least squares, set up so that H*p is as close as possible to p', where p are the points in the first image and p' the corresponding points in the second. We solve for the 8 unknown parameters of the 3x3 matrix, with the bottom-right entry fixed to 1. Below are the corresponding points chosen for one pair of images. For the remaining scenes, you can reference the notebook to see the points chosen.

Points on Ground 1
Points on Ground 2
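The least-squares setup described above can be sketched as follows. This is a minimal illustration, not the project code: the function name is my own, and points are assumed to be (N, 2) arrays of (x, y) coordinates.

```python
import numpy as np

def compute_homography(p, p_prime):
    """Solve for the 3x3 homography H mapping p -> p_prime via least squares.

    p, p_prime: (N, 2) arrays of corresponding (x, y) points, N >= 4.
    H has 8 unknowns; the bottom-right entry is fixed to 1.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(p, p_prime):
        # Two equations per correspondence, from H @ [x, y, 1] ~ [xp, yp, 1]
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    h, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)
```

With 8 correspondences this gives 16 equations for 8 unknowns, and least squares finds the H minimising the total reprojection residual.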

Part 3: Warp the Images

In this part, I created a warp function using the polygon (from skimage.draw) and matmul functions. Since I performed an inverse warp, I took the inverse of the H matrix and multiplied it by the coordinates of each pixel inside the polygon formed by the transformed corners of the original image (found as H * original corners). The polygon gave me all the target coordinates that needed to be inverse-warped, as they are contained within the transformed corners. I warped the 'side-on' (second) image of all three of my scenes. The results are shown below.
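A simplified sketch of this inverse warp is below. It differs from the project code in two ways I should flag: it rasterises the full bounding box of the projected corners (rather than skimage.draw.polygon) and masks out-of-range pixels, and it uses nearest-neighbour sampling instead of interpolation. The function name is my own.

```python
import numpy as np

def warp_image(im, H):
    """Inverse-warp image `im` (H x W, optionally x C) with homography H.

    H maps source coordinates to target coordinates; we invert it so every
    target pixel pulls its value from the source image.
    """
    h, w = im.shape[:2]
    corners = np.array([[0, 0, 1], [w - 1, 0, 1],
                        [w - 1, h - 1, 1], [0, h - 1, 1]]).T
    proj = H @ corners
    proj = proj[:2] / proj[2]                      # projected corner (x, y)
    x0, y0 = np.floor(proj.min(axis=1)).astype(int)
    x1, y1 = np.ceil(proj.max(axis=1)).astype(int)
    xs, ys = np.meshgrid(np.arange(x0, x1 + 1), np.arange(y0, y1 + 1))
    # Inverse-map every target pixel back into the source image
    tgt = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = np.linalg.inv(H) @ tgt
    sx = np.round(src[0] / src[2]).astype(int)     # nearest-neighbour sample
    sy = np.round(src[1] / src[2]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros((y1 - y0 + 1, x1 - x0 + 1) + im.shape[2:], im.dtype)
    out.reshape(-1, *im.shape[2:])[valid] = im[sy[valid], sx[valid]]
    return out, (x0, y0)                           # offset of the warp origin
```

The returned offset records where the warped image sits relative to the original coordinate frame, which matters later when placing both images on a shared mosaic canvas.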

Ground 2 Warped
Roof 2 Warped
Room 2 Warped

Part 4: Image Rectification

In this part, I combined the previous parts, homography recovery and image warping, to rectify an image. More specifically, I found the homography mapping the 4 corners of my textbook to the 4 corners of a rectangle, then inverse-warped the photo using that homography so that the result was rectified and plane-aligned. The results turned out well and can be seen below: both images are flattened, straightened, and plane-aligned. The original images can be found in Part 1.
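The rectification setup can be sketched like this. The corner coordinates and rectangle size here are hypothetical stand-ins for the clicked points, and with exactly 4 correspondences the 8x8 system can be solved exactly rather than by least squares:

```python
import numpy as np

def homography_4pt(src, dst):
    """Exact homography from 4 point pairs (x, y): 8 equations, 8 unknowns."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

# Hypothetical clicked corners of the slanted textbook, in (x, y) order:
# top-left, top-right, bottom-right, bottom-left.
book = np.array([[210, 130], [640, 180], [600, 560], [150, 480]], float)
# Target rectangle with roughly the book's aspect ratio.
rect = np.array([[0, 0], [300, 0], [300, 400], [0, 400]], float)
# Inverse-warping the photo with this H (as in Part 3) rectifies the plane.
H = homography_4pt(book, rect)
```

Choosing the target rectangle's aspect ratio close to the real object's keeps the rectified result from looking stretched.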

Textbook Rectified
Notebook Rectified

Part 5: Blend the images into a mosaic

In this part, I blended the main straight-on image with the 'side' image warped onto it. To do this blending, I created a mask for each image marking the pixels whose value was not 0, then obtained a combined mask by taking the logical AND of the two; the masks were the same size as the warped image. I then overlaid both images onto my result canvas so that they overlapped, and used weighted averaging to blend them together. The results for each of my three scenes are shown below. As we can see, objects such as the box on top of the vending machine align well enough; however, there seems to be a blending issue with the colours. Perhaps this is because of slightly different lighting and camera settings during capture, which could be avoided with a more professional camera. Otherwise, the images turned out decently well.
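The masking-and-averaging step can be sketched as follows, assuming both images have already been placed on a shared canvas of the same shape (H x W x 3). A fixed 50/50 average is the simplest weighting; a distance-based feather is a natural upgrade. The function name is my own.

```python
import numpy as np

def blend(canvas_a, canvas_b):
    """Blend two colour images already placed on a shared canvas.

    Non-zero pixels mark each image's footprint; the logical AND of the two
    masks gives the overlap, where a 50/50 weighted average is used.
    """
    mask_a = np.any(canvas_a != 0, axis=-1)
    mask_b = np.any(canvas_b != 0, axis=-1)
    overlap = mask_a & mask_b                   # logical AND of the masks
    out = np.zeros_like(canvas_a, dtype=float)
    out[mask_a] = canvas_a[mask_a]              # copy each image where it
    out[mask_b] = canvas_b[mask_b]              # is the only contributor
    out[overlap] = 0.5 * canvas_a[overlap] + 0.5 * canvas_b[overlap]
    return out
```

Note one caveat of non-zero masks: genuinely black pixels inside an image are treated as empty, which is usually harmless for photos but worth knowing.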

Ground Blended
Roof Blended
Room Blended

Conclusion Part A

I learnt a lot about how homographies can be found and how extremely sensitive they are to defining good correspondences without mismatches or outliers. The process was definitely time-consuming and tedious, which can hopefully be solved using automatic correspondence alignment. Furthermore, I once again learnt about the power of linear algebra in solving for transformations and in warping.

Part B

Part 1: Harris Interest Point Detector

Harris point detection finds corners: points where shifting a small window in any direction produces a large change in intensity. Using image derivatives to build the structure tensor at each pixel and scoring it with the ratio of the tensor's determinant to its trace, we get our Harris points. The Harris points are shown below for one pair of images. There are tons of points, which makes sense, as the vending machine and the items inside it have many corners.
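A minimal version of this response map is sketched below: gradients build the structure tensor, which is summed over a small window, and each pixel is scored with det/trace (the harmonic-mean variant mentioned above). The window-summing helper and function names are my own illustration.

```python
import numpy as np

def harris_response(im, k_window=2):
    """Harris corner response on a grayscale float image (minimal sketch)."""
    Iy, Ix = np.gradient(im)                       # image derivatives
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy

    def box_sum(a):
        # Sum over a (2k+1)x(2k+1) window via padded cumulative sums.
        p = np.pad(a, k_window)
        c = p.cumsum(0).cumsum(1)
        c = np.pad(c, ((1, 0), (1, 0)))
        n = 2 * k_window + 1
        return c[n:, n:] - c[:-n, n:] - c[n:, :-n] + c[:-n, :-n]

    # Structure tensor M = [[Sxx, Sxy], [Sxy, Syy]] summed per window
    Sxx, Syy, Sxy = box_sum(Ixx), box_sum(Iyy), box_sum(Ixy)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det / (trace + 1e-8)                    # det/trace corner score
```

Flat regions score near zero (both eigenvalues small), edges score near zero (det vanishes when only one gradient direction is present), and only corners, where both eigenvalues are large, score high.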

Harris Points on Ground 1
Harris Points on Ground 2

Part 2: Adaptive Non-Maximal Suppression

We only need a few points and correspondences, so we reduce the density of the interest points. We want to choose fewer points (e.g. 1000) such that they are spaced out and still have high 'interest', as described in the paper's ANMS section. Each point gets a suppression radius r: its distance to the nearest point of sufficiently higher corner strength, so r incorporates both spatial distance and 'interest'. The points with the 1000 largest radii are kept.
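A sketch of this selection, following the MOPS paper's formulation with a robustness constant c (a point only suppresses another if its strength is sufficiently higher). The O(N^2) distance matrix is fine for a few thousand candidates; the function name is my own.

```python
import numpy as np

def anms(coords, strengths, n_keep=1000, c_robust=0.9):
    """Adaptive non-maximal suppression (sketch of the MOPS-paper idea).

    coords: (N, 2) point coordinates; strengths: (N,) corner responses.
    Keeps the n_keep points with the largest suppression radii, so the
    survivors are both strong and well spread out.
    """
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    # Point j suppresses i only if strengths[i] < c_robust * strengths[j]
    suppresses = strengths[None, :] * c_robust > strengths[:, None]
    d2 = np.where(suppresses, d2, np.inf)
    radii = d2.min(axis=1)            # inf for the globally strongest point
    keep = np.argsort(-radii)[:n_keep]
    return coords[keep]
```

The global maximum gets an infinite radius and is always kept; dense clusters of strong corners collapse to a few representatives because nearby strong points suppress each other.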

ANMS Harris Points on Ground 1
ANMS Harris Points on Ground 2

Part 3: Feature Descriptor Extraction

The next step is to match correspondences. The first step is computing a descriptor summarising each of the 1000 corners in both images, robust to noise, intensity (bias/gain) shifts, and small misalignments. To do this, we take a 40*40 window around each point, downsample it to an 8*8 patch, and then normalise it to zero mean and unit variance so the above invariances hold.
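A sketch of this axis-aligned, MOPS-style descriptor. Note the proper pipeline blurs before subsampling to avoid aliasing; this minimal version just takes every 5th pixel and assumes the point lies at least 20 pixels from the border. The function name is my own.

```python
import numpy as np

def extract_descriptor(im, x, y):
    """8x8 descriptor from the 40x40 window around (x, y) in a grayscale image.

    Subsample the window to 8x8, then bias/gain normalise (zero mean,
    unit std) so additive and multiplicative intensity shifts cancel out.
    """
    window = im[y - 20:y + 20, x - 20:x + 20]
    patch = window[::5, ::5].astype(float)        # 40x40 -> 8x8 subsample
    patch -= patch.mean()                         # remove bias
    patch /= patch.std() + 1e-8                   # remove gain
    return patch.ravel()                          # 64-dim feature vector
```

Because of the normalisation, the same corner photographed under brighter or dimmer lighting produces (nearly) the same 64-dimensional vector, which is exactly what the matching step needs.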

Part 4: Feature Matching

We brute-force search for matching features while also handling outlier rejection. To make sure our nearest-neighbour matching is correct, we only accept a feature if the ratio of the distances to the first and second closest neighbours is less than 0.3. The selected features for one pair of images are shown below. The algorithm does a good job: many points on the sign above the vending machine are chosen in both images. However, there is some confusion; for example, the fire alarm is matched to the lift floor indicator, as they look similar and have similar surroundings.
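The ratio test above (Lowe's trick) can be sketched as follows, assuming each image's descriptors are stacked into an (N, D) array; the function name is my own.

```python
import numpy as np

def match_features(desc1, desc2, ratio=0.3):
    """Lowe-style ratio test over a brute-force distance search.

    desc1: (N, D), desc2: (M, D), M >= 2. A match (i, j) is kept only when
    the best neighbour is much closer than the second best:
    dist(1-NN) < ratio * dist(2-NN).
    """
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)  # distance to every feature
        nn = np.argsort(dists)[:2]                 # two closest neighbours
        if dists[nn[0]] < ratio * dists[nn[1]]:
            matches.append((i, nn[0]))
    return matches
```

The intuition: a correct match is distinctive (nothing else comes close), while an ambiguous feature, like the fire alarm versus the floor indicator, has two similar neighbours and gets discarded.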

Matching Points on Ground 1
Matching Points on Ground 2

Part 5: 4-point RANSAC

To compute our homography, we use the idea of 'consensus'. We repeatedly sample 4 random correspondences, fit an exact homography to each sample, and see which sample gains the majority vote: points whose reprojection error falls below a threshold count as inliers, and the rest are rejected as outliers. The homography with the largest consensus set wins and can be refit on all of its inliers. This way we get robust homographies by combining randomness and consensus.
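The loop just described can be sketched as below. This is an illustration rather than the project code: function names, iteration count, and the 2-pixel threshold are my own choices.

```python
import numpy as np

def ransac_homography(p, p_prime, n_iters=1000, thresh=2.0, rng=None):
    """4-point RANSAC sketch: fit H to random samples of 4 correspondences,
    count inliers under a reprojection-error threshold, keep the H with the
    largest consensus set, then refit on all of its inliers."""
    rng = np.random.default_rng(rng)

    def fit(src, dst):
        A, b = [], []
        for (x, y), (xp, yp) in zip(src, dst):
            A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
            A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
            b.extend([xp, yp])
        h, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float),
                                rcond=None)
        return np.append(h, 1.0).reshape(3, 3)

    p_h = np.hstack([p, np.ones((len(p), 1))])
    best_inliers = np.zeros(len(p), bool)
    for _ in range(n_iters):
        sample = rng.choice(len(p), 4, replace=False)
        H = fit(p[sample], p_prime[sample])
        proj = p_h @ H.T
        err = np.linalg.norm(proj[:, :2] / proj[:, 2:3] - p_prime, axis=1)
        inliers = err < thresh                 # consensus vote for this H
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on the full consensus set for the final, robust homography.
    return fit(p[best_inliers], p_prime[best_inliers]), best_inliers
```

A sample contaminated by even one outlier produces a homography that few other points agree with, so only all-inlier samples win the vote, which is what makes the estimate robust to bad matches.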

Part 6: Producing Mosaics and Comparing Results

With our homography, we could proceed as before: warp the side image and blend the two together. Here are the results for manual and automatic stitching on 3 examples. In 2 cases, the automatic version falls short. This is because many bad correspondences are chosen: our pictures contain lots of similar-looking objects in completely different places, which confuses the algorithm into choosing bad matches. This leads to bad homographies and thus bad stitching; the good features are too few in number and too close together.

Ground Autostitched
Ground Manual Blended
Roof Autostitched
Roof Manual Blended
Room Autostitched
Room Manual Blended

Conclusion Part B

I was amazed by how cool it was that corresponding points could be found automatically using a bit of linear algebra and intuition. The algorithm was really cool, and I got invested in optimizing my hyperparameters to get the best features possible. I also learnt a lot about making these automatic methods robust, and I definitely think I have come a long way since Part A.