Programming Project #4-B - Atsunobu Kotani (akotani@berkeley.edu)

First, the following images are used for this programming assignment.

We focus on the camera example here. The Harris features are shown as:
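For reference, a minimal sketch of the Harris response computation (my own reconstruction, assuming a grayscale float image and the det/trace corner measure; the function name and sigma value are placeholders):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(im, sigma=1.0):
    """Harris corner response for a grayscale float image."""
    # Gaussian derivative filters give smoothed image gradients.
    Ix = gaussian_filter(im, sigma, order=(0, 1))
    Iy = gaussian_filter(im, sigma, order=(1, 0))
    # Entries of the structure tensor, smoothed again.
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    # Corner measure: det(M) / trace(M) (harmonic-mean variant).
    return (Sxx * Syy - Sxy ** 2) / (Sxx + Syy + 1e-8)
```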

And the top 500 (and 250 in the 2nd row) characteristic features are shown as:

In this part, the maximum suppression radii were all 9 pixels in my experiments when extracting the top 500 features.
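This selection follows adaptive non-maximal suppression; a rough sketch of how it can be implemented (coords is an (N, 2) array of corner locations, strengths their Harris responses, and c_robust an illustrative robustness constant):

```python
import numpy as np

def anms(coords, strengths, n_keep=500, c_robust=0.9):
    """Keep the n_keep points with the largest suppression radii."""
    order = np.argsort(-strengths)          # strongest first
    coords, strengths = coords[order], strengths[order]
    radii = np.full(len(coords), np.inf)    # strongest point keeps inf
    for i in range(1, len(coords)):
        # Points that sufficiently dominate point i in strength.
        stronger = strengths[:i] * c_robust > strengths[i]
        if stronger.any():
            # Radius = distance to the nearest dominating point.
            d = np.linalg.norm(coords[:i][stronger] - coords[i], axis=1)
            radii[i] = d.min()
    keep = np.argsort(-radii)[:n_keep]
    return coords[keep], radii[keep]
```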

When we match the feature points without thresholding the difference values, we get the following.

And by setting different threshold values for the NN-1 / NN-2 ratio, we have:

From left to right, the threshold values are 0.1, 0.3, 0.5, respectively.
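A minimal sketch of this ratio test (assuming each feature is described by a flattened patch descriptor, so desc1 and desc2 are (N, D) arrays; the function name and the brute-force distance computation are illustrative):

```python
import numpy as np

def match_features(desc1, desc2, thresh=0.5):
    """NN-1 / NN-2 ratio test over two sets of descriptors."""
    # All pairwise squared distances (fine for a few hundred features).
    d = ((desc1[:, None, :] - desc2[None, :, :]) ** 2).sum(axis=2)
    nn = np.argsort(d, axis=1)[:, :2]                 # two nearest neighbours
    nn1 = d[np.arange(len(desc1)), nn[:, 0]]
    nn2 = d[np.arange(len(desc1)), nn[:, 1]]
    # Keep a match only when the best distance is much smaller
    # than the second-best distance.
    return [(i, nn[i, 0]) for i in range(len(desc1)) if nn1[i] < thresh * nn2[i]]
```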

As we can see, there is still some noise in the matching. Thus, I ran RANSAC, and the result is:

Here the left image is the thresholded result for 0.5, and the right one is the result after running RANSAC. We can easily see that some of the outliers have been successfully removed.
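A minimal sketch of this RANSAC step, fitting 4-point homographies by DLT (the iteration count and pixel tolerance eps are illustrative defaults, and at least a handful of correct matches are assumed):

```python
import numpy as np

def fit_homography(src, dst):
    """Least-squares homography from >= 4 point pairs (DLT)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # Solution: right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return Vt[-1].reshape(3, 3)

def ransac_homography(src, dst, n_iter=1000, eps=3.0, seed=0):
    """Largest-inlier-set RANSAC over random 4-point homographies."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(src), size=4, replace=False)
        H = fit_homography(src[idx], dst[idx])
        # Reproject every src point and measure the error against dst.
        p = np.hstack([src, np.ones((len(src), 1))]) @ H.T
        proj = p[:, :2] / p[:, 2:3]
        inliers = np.linalg.norm(proj - dst, axis=1) < eps
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on all inliers of the best model.
    return fit_homography(src[best_inliers], dst[best_inliers]), best_inliers
```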

Accordingly, two transitions (Left -> Middle & Right -> Middle) are visualized below.

Finally, the resulting image is shown below. The left is the result from Part A, manually matched, and the right is this new result.

In Part A, to merge the two images I simply pasted one over the other, but here I use a different merge method that takes the per-pixel maximum of the three warped images (Left, Middle, Right). That is why, in the right image, we see small blur effects in locations where the alignment was not successful. However, when we focus on the alignment of the Left and Middle images, the new approach is much smoother, especially around the shutter button of the camera and the pattern of the table.
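Both merge methods reduce to a few lines; a sketch (assuming the images have already been warped and resampled into the common mosaic frame, with the function names my own):

```python
import numpy as np

def paste_merge(base, overlay, mask):
    """Part A style merge: overlay is pasted wherever mask is True."""
    out = base.copy()
    out[mask] = overlay[mask]
    return out

def max_merge(warped):
    """Part B style merge: per-pixel maximum over a list of
    warped images, all in the common mosaic frame."""
    return np.maximum.reduce(warped)
```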

Further, I implemented an image pyramid approach to better select the feature matching pairs. The idea is to first select features at each level of the image pyramid (with Harris + RANSAC), then construct a larger feature candidate pool from the predicted features of all levels, and finally run another RANSAC pass to select better matching pairs.

The thresholded alignments with threshold 0.5 are shown below.

And after RANSAC,

One simple pyramid approach is as follows: predicted matching pairs at 1/4 scale are multiplied by 4, and those at 1/2 scale are multiplied by 2, before being combined with the full-scale predicted pixel pairs. This can be seen as populating the feature candidate set (interest points) with more reasonable candidates by utilizing the low-resolution predictions. The result is shown below.

Left is the result without image pyramid, and Right is with this SIMPLE image pyramid.
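A sketch of this pooling step (the dictionary layout and function name are my own; the pairs at each level are assumed to have already passed the per-level RANSAC):

```python
import numpy as np

def pool_pyramid_matches(matches_by_scale):
    """Combine matched pairs from each pyramid level into one
    candidate pool at full resolution.
    matches_by_scale: {downsample_factor: (src_pts, dst_pts)} with
    factors such as 1, 2, 4 and point arrays of shape (N, 2)."""
    src_all, dst_all = [], []
    for factor, (src, dst) in matches_by_scale.items():
        # Coordinates found at 1/factor resolution scale back by * factor.
        src_all.append(np.asarray(src) * factor)
        dst_all.append(np.asarray(dst) * factor)
    return np.vstack(src_all), np.vstack(dst_all)

# A second RANSAC pass over the pooled candidates (e.g. with
# ransac_homography above) then selects the final matching pairs.
```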

Another way of utilizing the image pyramid is to first predict the pixel alignment in a low-resolution image (e.g. 1/4 scale) and then locally refine it with a small window search as we progress to higher resolutions. For instance, if two points P1 and P2 are matched in the 1/4-scale image, we first multiply them by 2 and search the 5x5 windows around P1*2 and P2*2 in the higher-resolution 1/2-scale image. We repeat this process until the pairs are found in the full-resolution image. The result is shown below.

Left is the result without image pyramid, and Right is with this SEQUENTIAL image pyramid.
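A sketch of one refinement step (a simplification that holds P1*2 fixed and searches only around P2*2; patch_at and the SSD score are illustrative choices, and points are assumed to stay away from the image border):

```python
import numpy as np

def patch_at(im, p, r=4):
    """(2r+1) x (2r+1) patch centred at point p = (row, col).
    Assumes p lies at least r + win pixels inside the border."""
    y, x = int(round(p[0])), int(round(p[1]))
    return im[y - r:y + r + 1, x - r:x + r + 1]

def refine_pair(im1, im2, p1, p2, win=2, r=4):
    """One coarse-to-fine step: p1 and p2 have been scaled up by 2;
    search a (2*win+1)^2 window around p2 for the best SSD patch match."""
    ref = patch_at(im1, p1, r)
    best_q, best_err = np.asarray(p2), np.inf
    for dy in range(-win, win + 1):
        for dx in range(-win, win + 1):
            q = np.asarray(p2) + (dy, dx)
            err = ((patch_at(im2, q, r) - ref) ** 2).sum()
            if err < best_err:
                best_q, best_err = q, err
    return np.asarray(p1), best_q

# Repeated level by level: pairs found at 1/4 scale are multiplied by 2,
# refined in the 1/2-scale image, multiplied by 2 again, and refined in
# the full-resolution image.
```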

Finally, we show some other examples.

(Top Left): From Part A, (Top Right): No pyramid, (Bottom Left): SIMPLE pyramid, (Bottom Right): SEQUENTIAL pyramid

(Top Left): From Part A, (Top Right): No pyramid, (Bottom Left): SIMPLE pyramid, (Bottom Right): SEQUENTIAL pyramid

Overall, the automatic feature matching algorithm worked well, but as seen in the table example, when the most characteristic key points, such as the contour of the glass cup, lie near the edge of the images, it causes some problems, since the current Harris feature extractor ignores pixels near the image border (e.g. a 20-pixel margin).