Project 5

Danji Liu

Shooting the images

The images I shot for stitching

Marking key points

I used ginput to select corresponding key points in the images. I marked 6 key points to make the fit more robust to noise.


I used np.linalg.lstsq to calculate the transformation. First, I created a coefficient matrix where each row corresponds to predicting a single x or y value and each column corresponds to one unknown entry of the transformation matrix. The coefficient matrix has 8 columns and 12 rows (two rows for each of the 6 points I used to reduce noise). Then I computed the transformation matrix by minimizing the sum of squared residuals. The resulting transformation matrix is:

array([[ 1.95807923e+00, -7.89919677e-03, -3.31251896e+02],
       [ 3.68086626e-01,  1.61761135e+00, -1.04129632e+02],
       [ 2.29387507e-03, -1.19801278e-04,  1.00000000e+00]])
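The least-squares setup above can be sketched as follows. This is a hedged reconstruction, not the exact project code; compute_h and the variable names are my own, and it assumes (x, y) correspondences with the bottom-right entry of H fixed to 1:

```python
import numpy as np

def compute_h(src, dst):
    """Estimate a 3x3 homography from n >= 4 point pairs via least squares.

    src, dst: (n, 2) arrays of corresponding (x, y) points.
    Builds 2 rows per correspondence (one predicting x', one predicting y'),
    giving a (2n, 8) coefficient matrix for the 8 unknowns (h22 is fixed at 1).
    """
    rows, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        # Row predicting x': h00*x + h01*y + h02 - h20*x*xp - h21*y*xp = xp
        rows.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        b.append(xp)
        # Row predicting y': h10*x + h11*y + h12 - h20*x*yp - h21*y*yp = yp
        rows.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.append(yp)
    h, *_ = np.linalg.lstsq(np.array(rows, float), np.array(b, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)
```

With 6 points the system is overdetermined (12 equations, 8 unknowns), and lstsq returns the matrix minimizing the squared residuals.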


I struggled a lot with warpImage at first because I didn't know how to map the pixels. I settled on inverse warping: for every output pixel, look up the corresponding pixel value in the original image. As Apollo pointed out on Piazza, cv2.remap() can handle the interpolation. To construct the input for remap, first compute the inverse of the homography matrix. This tripped me up for a while because I hadn't realized that, in an image array, the first dimension is height and the second is width; after swapping the first and second rows of the coordinate matrix, the results worked out fine. The coordinate matrix has one column [x, y, 1].T per output pixel. Multiplying the inverse homography by this matrix gives a new matrix that says where each pixel resides in the source image. Then remap is called on the source image along with those x and y positions, and it takes care of the pixel lookup with interpolation.
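The inverse-warping idea can be sketched like this. It is a simplified stand-in, not the project code: warp_image is my own name, it assumes a grayscale image, and it uses nearest-neighbor lookup where the real pipeline uses cv2.remap for smoother interpolation:

```python
import numpy as np

def warp_image(src, h_matrix, out_shape):
    """Inverse-warp src into an output canvas of shape out_shape = (H, W).

    For every output pixel, apply the inverse homography to find where it
    came from in the source, then sample there (nearest neighbor here;
    cv2.remap would interpolate instead).
    """
    h_out, w_out = out_shape
    h_inv = np.linalg.inv(h_matrix)
    # Build homogeneous output coordinates: one column [x, y, 1].T per pixel
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src_coords = h_inv @ coords
    src_coords /= src_coords[2]                       # normalize so w = 1
    sx = np.rint(src_coords[0]).astype(int)
    sy = np.rint(src_coords[1]).astype(int)
    # Only sample locations that actually fall inside the source image
    valid = (sx >= 0) & (sx < src.shape[1]) & (sy >= 0) & (sy < src.shape[0])
    out = np.zeros(out_shape, dtype=src.dtype)
    flat = out.reshape(-1)
    flat[valid] = src[sy[valid], sx[valid]]
    return out
```

Note the (x, y) vs. (row, col) swap happens exactly where the writeup describes: the coordinate grid is built width-first for the matrix multiply, then used height-first to index the source array.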


I tested warpImage by rectifying the following image. After picking the four corners of my laptop with ginput, I computed the projective matrix from source to destination. I then passed the homography matrix and the image to warpImage, which gives the following results.

Warp the image

Here's the result of warping the image to match the front. It's important to extend the range of the warped image; otherwise, the image gets cut off because the canvas keeps the original dimensions.


I experimented with several ways to blend the two images. Because the edges are not a uniform shape, I gave up on the idea of creating a mask. I also tried averaging the pixels, but the results were poor. I then wrote a for loop to ignore the dark background when averaging the two pixel values at the same (x, y) position; the result still shows a black line running across the picture, which is apparently the edge of image 2.

In the end, I found that taking the maximum of the two pixel values produced the best results. See below.
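The max-blend step can be sketched as below. This is my own illustrative helper (blend_max is not a name from the project, and it assumes grayscale images whose unfilled regions are zero):

```python
import numpy as np

def blend_max(a, b):
    """Blend two mosaics by taking the per-pixel maximum.

    Both images are first placed on a shared canvas large enough for either.
    Because unfilled regions are black (zeros), the maximum keeps whichever
    image has content at each pixel, avoiding dark seams at the edges.
    """
    h = max(a.shape[0], b.shape[0])
    w = max(a.shape[1], b.shape[1])
    canvas_a = np.zeros((h, w))
    canvas_a[:a.shape[0], :a.shape[1]] = a
    canvas_b = np.zeros((h, w))
    canvas_b[:b.shape[0], :b.shape[1]] = b
    return np.maximum(canvas_a, canvas_b)
```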

What I learned

The best way to stitch these images is to take the per-pixel maximum of the two.

When applying the homography matrix, it's important to divide the x and y values by w (the third value). The computed w for each vector [wx, wy, w].T is different, so we recover the true x and y values by scaling each vector so that w = 1.
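This normalization step can be written in a few lines. A small sketch under my own naming (apply_homography is not from the project code), assuming points are (x, y) rows:

```python
import numpy as np

def apply_homography(h_matrix, pts):
    """Map (n, 2) points through H, dividing by w to recover true (x, y)."""
    homog = np.hstack([pts, np.ones((len(pts), 1))])   # rows are [x, y, 1]
    out = homog @ h_matrix.T                           # rows are [wx, wy, w]
    return out[:, :2] / out[:, 2:3]                    # scale so that w = 1
```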

The trickiest part for me was swapping x and y for the homography matrix. When an image is stored in IPython, the first dimension is height and the second is width; the matrix assumes otherwise. Without swapping the rows in H, my resulting images came out rotated 90 degrees.

Project 5b

Harris Points

After running the sample code, I got more than 1000 Harris points for my image. Following a Piazza suggestion, I switched from peak_local_max to corner_peaks and set the parameter min_distance to 20. The result, illustrated below, has a manageable number of points; however, it leaves too few points for the later computation.

So I went back to the original implementation of the Harris points function using peak_local_max, this time with min_distance set to 3, which gives an ideal number of points to filter in the next step.

Suppress the points

First, I called dist2 to compute the pairwise distances between all interest points.

Second, for every interest point, I found the minimum distance to any point with a higher corner score. As the paper describes, the stronger point's score is scaled by a coefficient of 0.9 to enforce a stricter screening process.

Once I had this minimum distance (the suppression radius) for every point, I ranked the points by radius from high to low. The goal of this section is to keep a fixed number of interest points that are not too close to each other; I set that number to 150.

Finally, I took the first 150 of the sorted points. This gives a list of points, each of which suppresses its lower-scoring neighbors.
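The suppression steps above can be sketched roughly as follows. This is my own condensed version (anms and c_robust are my names, and the pairwise-distance line stands in for the course's dist2 helper), not the exact project code:

```python
import numpy as np

def anms(points, scores, n_keep=150, c_robust=0.9):
    """Adaptive non-maximal suppression sketch.

    points: (n, 2) interest point coordinates; scores: (n,) corner strengths.
    Each point's radius is the (squared) distance to the nearest point whose
    score, scaled by c_robust, is still higher. Keep the n_keep largest radii.
    Returns indices of the kept points, strongest suppression radius first.
    """
    pts = np.asarray(points, float)
    scores = np.asarray(scores, float)
    # Pairwise squared distances (what dist2 computes in the course code)
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    # stronger[i, j] is True when point j dominates point i
    stronger = scores[None, :] * c_robust > scores[:, None]
    radii = np.where(stronger, d2, np.inf).min(axis=1)
    order = np.argsort(-radii)            # largest suppression radius first
    return order[:n_keep]
```

A point with no stronger neighbor anywhere gets an infinite radius, so the globally strongest corners always survive.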

Extract Feature Descriptors

To create a feature that characterizes each interest point, I first took a 40x40 window around it. So that the descriptor reflects all of the pixels within the window, I applied a Gaussian filter (kernel size = 2) to the image. Then, for every 5x5 square within the 40x40 window, I picked the pixel value at the center of the square. This gives an 8x8 feature map. I flattened the feature map and then normalized the values to mean 0 and standard deviation 1. Normalization reduces differences in exposure and contrast, making it easier to match interest points across images.
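A sketch of that descriptor extraction, under my own naming and assumptions (extract_descriptor is not the project's function name; I assume "kernel size = 2" maps to sigma=2 in scipy's gaussian_filter, and that the point sits at least 20 pixels from the border):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def extract_descriptor(image, row, col):
    """8x8 MOPS-style descriptor for one interest point.

    Blur the image, take the 40x40 patch around (row, col), sample the
    center of every 5x5 cell to get an 8x8 map, then normalize the
    flattened vector to zero mean and unit standard deviation.
    """
    blurred = gaussian_filter(image.astype(float), sigma=2)
    patch = blurred[row - 20:row + 20, col - 20:col + 20]
    feat = patch[2::5, 2::5].ravel()      # one sample per 5x5 cell -> 64 values
    return (feat - feat.mean()) / feat.std()
```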

Reject Outliers and Match the Points

For every interest point in the source image, I calculate the SSD between its descriptor and every descriptor in the destination image, then find the lowest and second-lowest scores. If the ratio first_best / second_best is higher than a certain threshold, I reject the interest point. A high ratio means the best match barely beats the runner-up, i.e., the point looks similar to many other points in the destination image, which makes it an unlikely candidate for a good match.

After experimenting with threshold values, I set the threshold to 0.2. Too high a threshold admits many irrelevant points outside the overlapping area; too low a threshold leaves too few points. Setting it at 0.2 gives me 7 points, and they're all good matches.
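The ratio test can be sketched like this; match_features and the 0.2 default are my own framing of the procedure described above, not the exact project code:

```python
import numpy as np

def match_features(desc1, desc2, thresh=0.2):
    """Ratio test on SSD distances between descriptor sets.

    desc1: (n1, d) source descriptors; desc2: (n2, d) destination descriptors.
    Keep a match only when best-SSD / second-best-SSD falls below thresh,
    i.e., the best match is much closer than any alternative.
    """
    matches = []
    for i, d in enumerate(desc1):
        ssd = ((desc2 - d) ** 2).sum(axis=1)
        j, k = np.argsort(ssd)[:2]          # best and runner-up indices
        if ssd[j] / ssd[k] < thresh:
            matches.append((i, j))
    return matches
```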


Then I ran the RANSAC algorithm: pick 4 points at random, compute the homography matrix, and apply the transformation to the remaining interest points. With the error tolerance set to 1 pixel, I counted the points whose predicted location was less than 1 pixel from the true coordinate. After 10 iterations, I kept the homography matrix with the highest count.
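The RANSAC loop can be sketched as below. This is a self-contained reconstruction with my own names (fit_h, project, ransac_h) and a fixed seed for reproducibility; it follows the description above but is not the project's actual code:

```python
import numpy as np

def fit_h(src, dst):
    """Least-squares homography from (n, 2) point pairs, with h22 fixed at 1."""
    rows, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp]); b.append(xp)
        rows.append([0, 0, 0, x, y, 1, -x * yp, -y * yp]); b.append(yp)
    h, *_ = np.linalg.lstsq(np.array(rows, float), np.array(b, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def project(h_matrix, pts):
    """Map (n, 2) points through H and divide by w."""
    out = np.hstack([pts, np.ones((len(pts), 1))]) @ h_matrix.T
    return out[:, :2] / out[:, 2:3]

def ransac_h(src, dst, n_iters=10, tol=1.0, seed=0):
    """Fit H to 4 random correspondences per iteration; keep the H with the
    most inliers, where an inlier reprojects within tol pixels of its match."""
    rng = np.random.default_rng(seed)
    best_h, best_count = None, -1
    for _ in range(n_iters):
        idx = rng.choice(len(src), 4, replace=False)
        h = fit_h(src[idx], dst[idx])
        err = np.linalg.norm(project(h, src) - dst, axis=1)
        count = int((err < tol).sum())
        if count > best_count:
            best_h, best_count = h, count
    return best_h, best_count
```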

I noticed that, unlike in part A, the points output by the Harris function are in (width, height) order, while the warpImage function I wrote for part A assumed points in (height, width) order when taking in the homography matrix. To fix this, I modified the homography matrix in part B so it's compatible with points in (height, width) order, then passed the modified matrix to the same warpImage function. The results are shown below.


For blending the image, I first picked the maximum pixel values over the overlapping area. However, this doesn't give a good result here: the exposure of the two pictures differs slightly because the camera adjusted to different lighting conditions, which leaves a sharp edge on the left side of the second image.

To fix the blending issue, I overlaid the two images with complementary weights. As the weight of the first image falls from 1 to 0 (left to right), the first image fades out while the second image becomes more prominent. This not only removes the sharp edge but also ensures a smooth transition (especially in the sky) between the first and second images. The result looks pretty good.
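The weighted overlay can be sketched as a horizontal alpha ramp. A simplified version under my own naming (linear_blend), assuming two aligned same-size images with the ramp spanning the full width rather than just the overlap:

```python
import numpy as np

def linear_blend(left, right):
    """Blend two aligned, same-size grayscale images with an alpha ramp.

    alpha rises 0 -> 1 across the width, so the left image's weight
    (1 - alpha) falls from 1 to 0 as the right image's weight rises,
    replacing any hard seam with a gradual transition.
    """
    w = left.shape[1]
    alpha = np.linspace(0.0, 1.0, w)[None, :]   # one weight per column
    return (1 - alpha) * left + alpha * right
```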

Other Mosaics

I ran the same code on the two images from part A. The goal is to see the differences between automatic stitching and manual stitching.

I set the threshold value at 0.15 and the algorithm gives me 6 points.

I also applied smoothing to the mosaic. The blending isn't fully successful: a sharp edge is still visible under the top door frame.

Here's the result from part A, where I manually picked the points for this image. Note that I downsized the image much more there, because calling ginput on a large image would crash my computer. The automated result looks similar to the manually stitched image.

left: manual stitching; right: automatic stitching

Another Example

I set the outlier threshold at 0.3.

It gives me 5 points.

The final result (after smoothing, as with the other examples):

What I learned

For part B, I learned that the threshold for rejecting interest points differs for each image: for the three pairs of images I used 0.2, 0.15, and 0.3 respectively, depending on the contrast and sharpness of the images.

I also realized that taking the per-pixel max over two images doesn't always give the best result. It only worked in part A because the lighting conditions in both images were similar.

Again, the tricky part of image processing is understanding that coordinates and image pixels are represented in different ways. Some coordinates (for example, the ones from the Harris function) are (width, height), while image arrays are indexed as (height, width).