Project 5: (Auto)Stitching Photo Mosaics

1. Image Warping and Mosaicing

Recover Homographies

The data for warping consists of five images, with 10 pairs of keypoints annotated by hand for each pair of neighboring images. The red dots in the images below are the annotated keypoints used to warp the images.
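
With more than four correspondences, the homography can be recovered as an overdetermined least-squares system, with each point pair contributing two equations. Below is a minimal sketch of that solve, assuming numpy and fixing H[2,2] = 1; the function name is illustrative, not the project's actual code:

import numpy as np

def compute_homography(src, dst):
    """Least-squares homography from n >= 4 point pairs.

    src, dst: (n, 2) arrays of (x, y) points, with dst ~ H @ src.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        # each correspondence gives two rows of the linear system A h = b
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    h, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)  # append the fixed H[2,2] = 1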

Image Rectification

Below are some rectification results using the perspective transformation algorithm. The red dots in the images are the keypoints used to warp them.
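
Rectification amounts to inverse warping: map each output pixel back into the source image and sample there. A sketch under the assumption of a color float image, reusing compute_homography from the sketch above (names illustrative):

import numpy as np
from scipy.ndimage import map_coordinates

def rectify(img, corners, out_w, out_h):
    """Inverse-warp img so that corners (4 x 2 (x, y) points, clockwise
    from top-left) map to an out_w x out_h rectangle."""
    rect = np.array([[0, 0], [out_w - 1, 0],
                     [out_w - 1, out_h - 1], [0, out_h - 1]], float)
    H = compute_homography(rect, np.asarray(corners, float))  # rectangle -> source
    xs, ys = np.meshgrid(np.arange(out_w), np.arange(out_h))
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    warped = H @ pts
    warped /= warped[2]                       # dehomogenize
    out = np.zeros((out_h, out_w) + img.shape[2:], img.dtype)
    for c in range(img.shape[2]):
        # map_coordinates expects [rows, cols]; order=1 is bilinear sampling
        out[..., c] = map_coordinates(
            img[..., c], [warped[1], warped[0]], order=1
        ).reshape(out_h, out_w)
    return out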

Mosaic

We can create a panorama photo by stitching multiple images together. There is a slight parallax effect because the camera translated slightly between shots.

2. Feature Matching and Auto-stitching

Keypoint Detection

The first part is to use Harris corners as the candidates for matching, and then use adaptive non-maximal suppression (ANMS) to subset the points so that they are more evenly distributed across the image. However, the skeleton code finds far too many keypoints, including many low-quality ones. This causes the later suppression step to run ~10x slower and yields many unwanted points. To combat this, we increase min_distance in peak_local_max to 4 and only keep the top 25% of the returned keypoints by corner strength. Below is the comparison between results with and without this thresholding.

[Figure: Harris corners, histograms of corner strength, and keypoints after ANMS, shown without thresholding (left) and with thresholding (right)]

As we can see from the histogram, most keypoints returned by the algorithm are of low quality, and the result after suppression still contains many cloud/sky points, which were moving and are not ideal for matching.
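
A minimal sketch of the thresholding step described above, assuming skimage and a grayscale float image (the function name and keep_frac parameter are illustrative):

import numpy as np
from skimage.feature import corner_harris, peak_local_max

def harris_keypoints(gray, min_distance=4, keep_frac=0.25):
    """Harris corners, pruned to the strongest keep_frac before ANMS.

    gray: 2-D float image. Returns an (n, 2) array of (row, col) corners.
    """
    h = corner_harris(gray)
    coords = peak_local_max(h, min_distance=min_distance)
    strength = h[coords[:, 0], coords[:, 1]]
    order = np.argsort(-strength)             # strongest first
    n_keep = max(4, int(len(order) * keep_frac))
    return coords[order[:n_keep]]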

Extract Feature Descriptors

We sample a 40x40 area around each keypoint in the original image, then down-sample and normalize the features into 8x8 patches.

[Figure: sampled areas and the corresponding descriptor patches]
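
A sketch of this extraction, assuming a grayscale float image and (row, col) corners; plain striding stands in here for a proper blur-then-subsample, and the function name is illustrative:

import numpy as np

def extract_descriptors(gray, corners, window=40, patch=8):
    """Sample a window x window area around each corner, down-sample to
    patch x patch, then normalize to zero mean and unit variance."""
    half, step = window // 2, window // patch
    descs, kept = [], []
    for r, c in corners:
        if r < half or c < half or r >= gray.shape[0] - half or c >= gray.shape[1] - half:
            continue                          # skip corners too close to the border
        area = gray[r - half:r + half, c - half:c + half]
        d = area[::step, ::step]              # naive down-sample by striding
        d = (d - d.mean()) / (d.std() + 1e-8) # bias/gain normalization
        descs.append(d.ravel())
        kept.append((r, c))
    return np.array(descs), np.array(kept)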

Matching Algorithm

To match the descriptors, we first use Lowe's trick: we examine the ratio between the distance to the best match and the distance to the second-best match. Intuitively, if a match is clear and correct, this ratio should be very low. We threshold so that all remaining matches have a ratio below 0.4.

The black points in the images are rejected keypoints, and keypoints drawn with the same color are a match. Although we lose many correct matches in this step, the remaining matches are almost all correct for this pair of images.
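
A minimal sketch of the ratio test, assuming descriptors are rows of numpy arrays (names illustrative):

import numpy as np

def match_descriptors(d0, d1, ratio=0.4):
    """Lowe's ratio test: keep matches whose best distance is well below
    the second-best. Returns (k, 2) index pairs into d0 / d1."""
    # pairwise Euclidean distances between all descriptor pairs
    dists = np.sqrt(((d0[:, None, :] - d1[None, :, :]) ** 2).sum(-1))
    nearest = np.argsort(dists, axis=1)[:, :2]
    best = dists[np.arange(len(d0)), nearest[:, 0]]
    second = dists[np.arange(len(d0)), nearest[:, 1]]
    keep = best / (second + 1e-12) < ratio
    return np.stack([np.nonzero(keep)[0], nearest[keep, 0]], axis=1)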

We then use random sample consensus (RANSAC) to refine the pairs. In each iteration, we sample 4 random pairs of keypoints, compute the corresponding transformation, and count how many of the other keypoint pairs are matched by that transformation. We keep the transformation with the maximum number of correctly matched pairs, and fine-tune it by applying least squares to the inlier subset.
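
A sketch of that loop, reusing compute_homography from above; the tolerance and iteration count are illustrative defaults, not the project's tuned values:

import numpy as np

def ransac_homography(src, dst, n_iters=1000, tol=2.0):
    """RANSAC over 4-point samples; refit on the largest inlier set.

    src, dst: (n, 2) matched points in the two images.
    """
    best_inliers = np.zeros(len(src), bool)
    src_h = np.hstack([src, np.ones((len(src), 1))])  # homogeneous points
    for _ in range(n_iters):
        idx = np.random.choice(len(src), 4, replace=False)
        H = compute_homography(src[idx], dst[idx])
        proj = src_h @ H.T
        proj = proj[:, :2] / proj[:, 2:3]     # dehomogenize
        inliers = np.linalg.norm(proj - dst, axis=1) < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # final least-squares fit on all inliers of the best model
    return compute_homography(src[best_inliers], dst[best_inliers]), best_inliers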

Result

The transformation found is correct. However, the edges of the images are visible due to the slight exposure difference between the shots.

Bells and Whistles: Stitching Game Screenshots with Keypoint Masks

We also tried the algorithm on game screenshots. The challenge is that sometimes the game UI cannot be turned off, and such constant features with very high corner strength usually cause the algorithm to output the identity transformation. We use a manual mask that ignores all keypoints in certain regions to solve this issue.
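
The masking itself is a simple lookup; a sketch assuming a binary mask image aligned with the screenshot (zero over the UI regions, function name illustrative):

import numpy as np

def mask_keypoints(corners, mask):
    """Drop (row, col) corners that fall inside masked-out (zero)
    regions, e.g. a hand-drawn mask covering fixed game UI elements."""
    keep = mask[corners[:, 0], corners[:, 1]] > 0
    return corners[keep]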

Note that there are no keypoints in the black regions highlighted by the mask. This allows us to stitch the screenshots together without manual fine-tuning.

Bells and Whistles: 2-band Blending

To solve the edge issue mentioned above, we use a 2-band blending method similar to the previous project. We apply a linear gradient to the low frequencies and do not blend the high frequencies. The challenge in this part is to figure out a good alpha mask for the overlapping region. Before applying the transformation, we also record the normalized distance to the nearest corner for all sampled pixels, as shown below:

For the non-overlapping region, we sample the original image normally and record the distance. For the overlapping region, we compute dist_img0 / (dist_img0 + dist_img1) and use this value as the alpha mask. If both distances are zero, the result is set to 0.5. This guarantees that the mask is smooth, and that every edge of the overlap lying on one image's boundary is exactly zero or one, so the mask transitions smoothly into the non-overlapping regions.
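
A minimal sketch of that alpha computation, assuming dist0 and dist1 are per-pixel distance maps over the overlap region (name illustrative):

import numpy as np

def overlap_alpha(dist0, dist1):
    """Per-pixel alpha for img0 inside the overlap region."""
    total = dist0 + dist1
    # np.maximum guards the division; np.where picks 0.5 at zero/zero ties
    return np.where(total > 0, dist0 / np.maximum(total, 1e-12), 0.5)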

For frequency separation, we use a simple 7x7 Gaussian kernel as our low-pass filter. We also tried separating in the frequency domain, but the result did not look as visually pleasing as the Gaussian approach. A sketch of the blending step follows, and after it is the comparison between the no-blending, naive alpha blending, and 2-band blending results.
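
This sketch assumes float images in [0, 1] and uses scipy's gaussian_filter with a sigma of roughly 7x7 support in place of an explicit 7x7 kernel; names are illustrative:

import numpy as np
from scipy.ndimage import gaussian_filter

def two_band_blend(img0, img1, alpha, sigma=1.0):
    """Blend low frequencies with the smooth alpha mask; pick high
    frequencies from whichever image dominates (alpha > 0.5)."""
    a = alpha[..., None]                      # broadcast over color channels
    low0 = gaussian_filter(img0, (sigma, sigma, 0))
    low1 = gaussian_filter(img1, (sigma, sigma, 0))
    high0, high1 = img0 - low0, img1 - low1
    low = a * low0 + (1 - a) * low1           # linear gradient on the lows
    high = np.where(a > 0.5, high0, high1)    # no blending on the highs
    return np.clip(low + high, 0, 1)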

[Figure: results for each method, no blending vs. naive alpha blending vs. 2-band blending]

Note that the top-left notification and the lower UI are less awkward in the 2-band blending result compared to the naive alpha blending.

Final Results

Note that some panoramas are not stitched perfectly. I believe this is because the input images are not perfectly stitchable (trees moving, etc.). Since I implemented 2-band blending, this causes a noticeable ghosting effect compared to a binary no-blend method.

Additionally, the blending algorithm assumes that at most two images overlap, but that is not the case for some panoramas. This causes later (rightmost) images to take priority, as can be seen from the guns, UIs, etc. My solution to this problem requires keeping a canvas for each image, which often exceeded the memory limit and is thus not very practical.