CS 194-26 Fall 2017

Project 6

Sheng-Yu Wang (ade)



Part A: Image Warping and Mosaicing



Photos

Photos were taken with a phone. For the mosaics, five photos with different perspectives were taken, but the camera stayed at the same position so that the mosaic can be worked out with simple homographies.



Recovering homographies

Instead of any complicated physical calculation, we use the fact that the images are approximately related by a projective transformation (since the camera position is unchanged). We only have to find the unknown projective transformation, which has 8 degrees of freedom. This can be done by choosing corresponding feature points between images (at least 4 pairs, enough to determine the 8 unknowns) and solving by least squares. More concretely, the projective transformation maps a point (x, y) to (x', y') via w [x', y', 1]^T = H [x, y, 1]^T, where H is a 3x3 matrix whose bottom-right entry is fixed to 1. Each correspondence then contributes two linear equations in the 8 remaining entries of H, and stacking all correspondences gives an overdetermined system that is solved in the least-squares sense.
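The setup above can be sketched in a few lines of numpy. The function names and the choice to fix the bottom-right entry to 1 are my own; this is a minimal illustration, not necessarily the exact code used in the project:

```python
import numpy as np

def compute_homography(src, dst):
    """Least-squares homography mapping (N, 2) src points to dst points.

    The bottom-right entry of H is fixed to 1, leaving 8 unknowns; each
    correspondence contributes two linear equations as described above.
    """
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -x * u, -y * u])
        A.append([0, 0, 0, x, y, 1, -x * v, -y * v])
        b.extend([u, v])
    h, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, pts):
    """Apply H to (N, 2) points and divide out the projective coordinate w."""
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]
```

With exact correspondences and at least 4 point pairs, the solver recovers H exactly; with noisy hand-clicked points, extra correspondences average out the error.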

Image rectification

To rectify an image, a specific planar object is warped into a frontal-parallel plane. In the first example a window is rectified, and in the second example a stone floor is rectified. Note that since the projective transform creates a huge change in perspective, the scene outside of the selected object gets badly distorted, so a cropped version is also provided. We can see that the selected objects are transformed into a frontal-parallel plane quite well.
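A minimal rectification sketch under those assumptions follows. The corner ordering, output size, and nearest-neighbor sampling are illustrative choices; the project may well have used a different interpolation scheme:

```python
import numpy as np

def solve_homography(src, dst):
    # 8-unknown least-squares homography (bottom-right entry fixed to 1)
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -x * u, -y * u])
        A.append([0, 0, 0, x, y, 1, -x * v, -y * v])
        b.extend([u, v])
    h, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def rectify(img, corners, out_w, out_h):
    """Warp so the planar quad `corners` (TL, TR, BR, BL in x-y order)
    fills an out_h x out_w frontal-parallel rectangle.

    Uses inverse warping: map each output pixel back into the source image
    and sample with nearest neighbor (clamped at the image borders).
    """
    target = np.array([[0, 0], [out_w - 1, 0],
                       [out_w - 1, out_h - 1], [0, out_h - 1]], float)
    H_out2src = solve_homography(target, np.asarray(corners, float))
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    sx, sy, w = H_out2src @ pts
    sx = np.round(sx / w).astype(int).clip(0, img.shape[1] - 1)
    sy = np.round(sy / w).astype(int).clip(0, img.shape[0] - 1)
    return img[sy, sx].reshape(out_h, out_w, *img.shape[2:])
```

Inverse warping (output pixel to source pixel) avoids the holes that forward warping would leave in the output.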

[Figures: window example — Original / Rectified / Cropped]
[Figures: stone floor example — Original / Rectified / Cropped]


Mosaics

To generate a mosaic, all the images are projectively transformed to the perspective of the third image and then warped. The transformation recovery method is identical to the one used for image rectification. To warp the images, a background canvas is first generated by finding the boundaries of each projected image, and then each image is warped onto that canvas. To prevent artifacts, linear blending is used between images: the middle third of the overlap between adjacent images is linearly blended in the horizontal direction. Masks were used to make sure any region that was not overlapped was not taken into the blending process, which prevented ghosting at the overlap boundaries. Below are the results. Although there are some artifacts on close inspection, in general it worked pretty well.
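A simplified sketch of the mask-guarded blend is below. For brevity it ramps across the full width of each overlap row rather than only the middle third, and it assumes both images are already warped onto the same canvas with zeros outside their masks:

```python
import numpy as np

def linear_blend(im1, im2, mask1, mask2):
    """Blend two images already warped onto a shared canvas.

    mask1 / mask2 are boolean coverage masks. Inside the overlap, im1's
    weight ramps linearly from 1 to 0 across each row of the overlap;
    outside the overlap each image is used as-is, so non-overlapping
    regions never ghost.
    """
    overlap = mask1 & mask2
    alpha = np.zeros(mask1.shape, float)   # weight of im1
    alpha[mask1 & ~mask2] = 1.0            # im1-only region
    for r in range(mask1.shape[0]):
        cols = np.nonzero(overlap[r])[0]
        if cols.size > 1:
            lo, hi = cols[0], cols[-1]
            alpha[r, lo:hi + 1] = np.linspace(1.0, 0.0, hi - lo + 1)
        elif cols.size == 1:
            alpha[r, cols[0]] = 0.5
    w = alpha if im1.ndim == 2 else alpha[..., None]
    return w * im1 + (1 - w) * im2
```

Handling the per-row overlap extent is what makes irregular warped boundaries trickier than blending two axis-aligned rectangles.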

Things learned

It turned out to be more complicated than I originally thought to do linear blending with irregular boundaries, and it took me a while to figure it out. It definitely feels great to get the final result looking good, and it is impressive that, given the constraint of a fixed camera position, we can make such interesting mosaics with simple projective transformations. The image rectifications also looked pretty cool: in the window example, it was impressive to actually see the stairs at the right of the rectified image.



Part B: Feature Matching for Autostitching



Automatic Feature Detection -- Harris Corners

Corners are among the most distinctive features in a picture, so being able to detect corners is a very useful first step for autostitching. The Harris corner detector uses the fact that corners have a unique characteristic: the intensity of a patch centered on a corner changes a lot when the patch is shifted in any direction, whereas the intensity of a patch on an edge changes little when shifted along the edge. Using this idea, corners can be detected mathematically, and the results are as follows. Note that no points are kept near the image boundary.
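A numpy-only sketch of the Harris response follows. The 3x3 box window, k = 0.04, and the 2-pixel border margin are my illustrative choices; the course code likely used Gaussian weighting and a wider margin:

```python
import numpy as np

def box3(a):
    """Sum over each pixel's 3x3 neighborhood (zero padding at borders)."""
    p = np.pad(a, 1)
    return sum(p[i:i + a.shape[0], j:j + a.shape[1]]
               for i in range(3) for j in range(3))

def harris(im, k=0.04):
    """Harris corner response R = det(M) - k * trace(M)^2, where M is the
    structure tensor of the image gradients summed over a 3x3 window.
    Both eigenvalues of M are large only at corners, so R peaks there."""
    Iy, Ix = np.gradient(im.astype(float))
    Sxx, Syy, Sxy = box3(Ix * Ix), box3(Iy * Iy), box3(Ix * Iy)
    det = Sxx * Syy - Sxy ** 2
    tr = Sxx + Syy
    R = det - k * tr ** 2
    R[:2, :] = 0    # zero out the image borders, as in the write-up
    R[-2:, :] = 0
    R[:, :2] = 0
    R[:, -2:] = 0
    return R
```

An edge gives one large and one near-zero eigenvalue, so det(M) stays small and R goes negative; that is exactly the "patch barely changes along the edge" intuition in equation form.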



Adaptive Non-Maximal Suppression

Note that in order to achieve autostitching, we must find corresponding features between two images, and having too many feature points causes a huge runtime issue. Therefore, we need a way to suppress the number of feature points while keeping both strong corners and an even spatial distribution. To achieve this, adaptive non-maximal suppression (ANMS) is implemented. The idea of ANMS is to choose a fixed number (500 in this project) of corners whose strengths are significantly stronger than those of all their nearby neighbors. As long as this is achieved, we get a more evenly distributed set of strong corners, as shown below.
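The suppression-radius formulation can be sketched as follows (the robustness constant 0.9 matches the usual choice in the MOPS paper; the function name is mine):

```python
import numpy as np

def anms(coords, strengths, n_keep=500, c_robust=0.9):
    """Adaptive non-maximal suppression.

    Each corner's suppression radius is its distance to the nearest corner
    that is significantly stronger (f_i < c_robust * f_j). Keeping the
    n_keep corners with the largest radii favors points that are both
    strong and spread out over the image.
    """
    coords = np.asarray(coords, float)
    strengths = np.asarray(strengths, float)
    order = np.argsort(-strengths)                 # strongest first
    coords, strengths = coords[order], strengths[order]
    radii = np.full(len(coords), np.inf)
    for i in range(1, len(coords)):
        stronger = c_robust * strengths[:i] > strengths[i]
        if stronger.any():
            d2 = np.sum((coords[:i][stronger] - coords[i]) ** 2, axis=1)
            radii[i] = np.sqrt(d2.min())
    keep = np.argsort(-radii)[:n_keep]
    return coords[keep]
```

The globally strongest corner gets an infinite radius, so it always survives; a dense cluster of strong corners keeps only its local maxima.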

Feature Descriptors and Feature Matching

Feature descriptors are 40x40 patches centered at the corners found in the previous sections. To prevent aliasing, each patch is taken from a higher level of a Gaussian pyramid; in other words, it is blurred and downsized to an 8x8 patch. In this project, we assume that corresponding features in our photos look "similar", meaning they are related by little more than a translation (no significant rotation). Therefore, without applying any transformation to the patches, a straight SSD metric is applied to every possible pair of features. Empirically, a modified nearest-neighbor approach is known to work best for accurately finding correct correspondences: instead of directly comparing the magnitude of the SSD metric, we compare the ratio between the distances (under the SSD metric) to the nearest and second-nearest neighbor. It turns out that when the 1-NN (first nearest neighbor) is much closer than the 2-NN, it is very likely that the feature genuinely matches the 1-NN. Using this idea, a pretty good pairing is generated as follows.

Not yet whitened 8x8 descriptor
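The descriptor extraction and ratio-test matching described above can be sketched as follows. Plain strided subsampling stands in for the Gaussian-pyramid step, and the bias/gain normalization and threshold value of 0.5 are assumptions on my part:

```python
import numpy as np

def describe(im, corners, patch=40, stride=5):
    """One 8x8 descriptor per (row, col) corner: a 40x40 window subsampled
    every 5 pixels (blurring before subsampling would further reduce
    aliasing), then bias/gain normalized and flattened."""
    half = patch // 2
    descs = []
    for (r, c) in corners:
        w = im[r - half:r + half:stride, c - half:c + half:stride].astype(float)
        w = (w - w.mean()) / (w.std() + 1e-8)
        descs.append(w.ravel())
    return np.array(descs)

def match_ratio(d1, d2, thresh=0.5):
    """Lowe-style ratio test: accept a pair (i, j) only when
    SSD(1-NN) / SSD(2-NN) falls below thresh."""
    matches = []
    for i, d in enumerate(d1):
        ssd = np.sum((d2 - d) ** 2, axis=1)
        j1, j2 = np.argsort(ssd)[:2]
        if ssd[j1] / (ssd[j2] + 1e-12) < thresh:
            matches.append((i, j1))
    return matches
```

A feature with no true correspondence in the other image tends to have two comparably bad neighbors, so its ratio stays near 1 and the pair is rejected.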

RANSAC

Recall that least squares (LS) is performed to recover homographies. One downside of LS is that it is very sensitive to outliers, so even if the feature matching does a relatively good job, it is necessary to have an algorithm that excludes most, or all, of the outliers. RANSAC turns out to be a very good way to do this. To recover a homography, the (4-point) RANSAC algorithm repeatedly takes 4 point pairs at random, fits a homography to them, and counts how many other point pairs agree with that transformation. This step is repeated for a given number of iterations, and the best transformation is the one with the most agreeing points (the inliers). Last but not least, the homography is recomputed using all the inliers found during RANSAC. Below are the results; we can see that some points from the feature matching stage are excluded, giving a more accurate homography.
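The loop above can be sketched like so (iteration count, pixel threshold, and seed are illustrative defaults):

```python
import numpy as np

def fit_homography(src, dst):
    # least-squares homography with the bottom-right entry fixed to 1 (Part A)
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -x * u, -y * u])
        A.append([0, 0, 0, x, y, 1, -x * v, -y * v])
        b.extend([u, v])
    h, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def project(H, pts):
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]

def ransac_homography(src, dst, iters=1000, eps=2.0, seed=0):
    """4-point RANSAC: sample 4 correspondences, fit H, count the points
    that reproject within eps pixels, keep the largest inlier set, and
    finally refit H on all of its inliers."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), bool)
    for _ in range(iters):
        idx = rng.choice(len(src), 4, replace=False)
        with np.errstate(divide="ignore", invalid="ignore"):
            H = fit_homography(src[idx], dst[idx])
            err = np.linalg.norm(project(H, src) - dst, axis=1)
        inliers = err < eps
        if inliers.sum() > best.sum():
            best = inliers
    return fit_homography(src[best], dst[best]), best
```

Because each sample uses only 4 points, a single clean draw is enough to expose all the outliers at once, which is why RANSAC tolerates a substantial outlier fraction.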



Auto-generated Mosaics

Mosaics are created in exactly the same way as before, but with the feature points selected automatically. Note that both methods look pretty good but still show several artifacts on close inspection. Several issues make alignment difficult. First of all, the center of projection is not perfectly fixed between shots, which causes the homography model to break down slightly. Lens distortion may also play a role in the alignment difficulties.

The coolest thing learned

Definitely the Harris corner detector. It's really exciting to see a clever application of derivatives lead to such a powerful algorithm. Using eigenvalues to get a corner strength that is invariant to edge orientation is also pretty cool.



Bells & Whistles -- Moving Mosaics



It's definitely not a brilliant idea, but I wanted to try what happens if we do some interpolation to create a moving mosaic, and below are some results. Note that since linear interpolation does not take a constant focal length into account (a cylindrical projection would), the motion definitely looks a little weird. However, if we crop the image to a narrower size, the gif looks more like a scene where a person is looking around, because a narrower FOV can be better approximated as having similar depths and focal lengths as the COP turns.
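One simple guess at how the per-frame warps could be generated is naive per-entry interpolation of the homography matrix; the actual gifs may have been produced differently:

```python
import numpy as np

def interp_homography(H, t):
    """Per-entry linear interpolation between the identity (t = 0) and the
    full warp H (t = 1), renormalized so the bottom-right entry stays 1.
    Rendering one frame per t sweeps the view across the mosaic; this
    naive blend is exactly the step that ignores the fixed focal length,
    hence the slightly odd motion at wide fields of view."""
    Ht = (1.0 - t) * np.eye(3) + t * np.asarray(H, float)
    return Ht / Ht[2, 2]
```

Sampling t over [0, 1] and warping the mosaic by each intermediate matrix yields the animation frames.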