The human field of view extends well beyond what a single traditional photo captures. Mosaics are one way to approach this wide field of view: photos are taken starting from one side of the scene, rotating the camera a little for each entry in the mosaic. For the process to work, the camera must remain in the same physical position in world space. This is because aligning and stitching the images together requires projectively warping them so they all lie on the same plane. The alignment does distort parts of the scene (just like a wide-angle lens), but the result is a clean overlap between the features the images share.
The pictures were taken on a tripod. I took a series of 7 photos for each scene starting from the left, rotating to the right by about 30 degrees for each shot. I chose two photos from each scene and identified shared features between them. These features are marked with correspondence points: pairs of points meant to identify the same objects in both images.
|
|
|
Once we have the correspondence points between the two images, we can calculate a transformation matrix that maps the correspondence points of one image onto the plane of the other. A homography is a 3 x 3 matrix with 8 degrees of freedom, so we need at least 4 correspondence points to determine it; with more than 4, we solve the overdetermined system with least squares.
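The least-squares setup above can be sketched as follows. This is a minimal version, assuming the common parameterization that fixes the bottom-right entry of the homography to 1 (the function name and point layout are my own choices, not the project's actual code):

```python
import numpy as np

def compute_homography(src, dst):
    """Estimate the 3x3 homography mapping src -> dst via least squares.

    src, dst: (N, 2) arrays of corresponding points, N >= 4.
    With exactly 4 points the system is solved exactly; with more,
    np.linalg.lstsq finds the least-squares fit.
    """
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear equations
        # in the 8 unknown homography entries.
        A.append([x, y, 1, 0, 0, 0, -x * u, -y * u]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -x * v, -y * v]); b.append(v)
    h, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)  # restore the fixed h33 = 1
```

For a pure translation, the recovered homography is just the identity with the offset in the last column.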
Once we have the homography, we can apply it to the points of the image we want to warp to see what it would look like on the plane of its partner image. The image will look distorted away from the correspondence points because it has been shifted to line up with the corresponding areas of the other image as closely as possible. Filling in the pixels of the transformed image is done with inverse warping: for each pixel location in the warped output, we apply the inverse of the homography to find the approximate location in the unwarped source image, then interpolate the source values there. This avoids the holes that forward-mapping source pixels would leave.
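A minimal inverse-warping sketch, assuming a grayscale image and nearest-neighbor sampling (bilinear interpolation would look smoother; the output canvas size is taken as a given here):

```python
import numpy as np

def inverse_warp(src, H, out_shape):
    """Warp src by homography H using inverse warping.

    For every pixel (x, y) of the output canvas, apply H^-1 to find
    where it came from in src, then sample there (nearest neighbor).
    """
    h_out, w_out = out_shape
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    # Homogeneous coordinates of every output pixel.
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src_pts = np.linalg.inv(H) @ pts
    src_pts /= src_pts[2]                      # back from homogeneous coords
    sx = np.round(src_pts[0]).astype(int)
    sy = np.round(src_pts[1]).astype(int)
    # Keep only the pixels that land inside the source image.
    valid = (0 <= sx) & (sx < src.shape[1]) & (0 <= sy) & (sy < src.shape[0])
    out = np.zeros(out_shape, dtype=src.dtype)
    out[ys.ravel()[valid], xs.ravel()[valid]] = src[sy[valid], sx[valid]]
    return out
```

With the identity homography, the warp reproduces the source image unchanged, which is a quick sanity check.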
|
|
Rectification takes advantage of the warping properties of homographies to morph an image toward a specific shape or viewpoint. Instead of points from a second image, the target correspondence points are simply the corners of a square. The end result is the image warped so the selected region assumes a square shape, as if viewed head-on.
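Since exactly 4 correspondences pin down all 8 degrees of freedom, the rectifying homography can be solved exactly. A self-contained sketch (the corner ordering and square size are my assumptions):

```python
import numpy as np

def rectify_homography(quad, size):
    """Homography sending 4 clicked corners `quad` (TL, TR, BR, BL)
    to an axis-aligned `size` x `size` square.

    Exact 8x8 solve: 4 correspondences determine all 8 degrees
    of freedom of the homography (h33 fixed to 1).
    """
    square = [(0, 0), (size, 0), (size, size), (0, size)]
    A, b = [], []
    for (x, y), (u, v) in zip(quad, square):
        A.append([x, y, 1, 0, 0, 0, -x * u, -y * u]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -x * v, -y * v]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)
```

If the clicked quad is already an axis-aligned square of the target size, the result is the identity, as expected.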
|
|
|
|
|
|
Using the techniques outlined before, we can create mosaics. We warp the left image onto the image plane of the right one and stitch them together using Laplacian blending. The brightness differences visible in the overlapping portions of some mosaics are probably due to imprecise averaging between the two exposures.
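A minimal two-band sketch of the Laplacian blending idea, assuming grayscale float images and a crude numpy-only blur in place of a proper Gaussian pyramid (the full method uses several bands, not two):

```python
import numpy as np

def blur(img, passes=8):
    """Repeated 1-2-1 smoothing as a numpy-only stand-in for a Gaussian.
    (np.roll wraps at the borders, which is acceptable for a sketch.)"""
    out = img.astype(float)
    for _ in range(passes):
        for axis in (0, 1):
            out = 0.25 * np.roll(out, 1, axis) + 0.5 * out \
                + 0.25 * np.roll(out, -1, axis)
    return out

def two_band_blend(a, b, mask):
    """Two-band version of Laplacian (multiresolution) blending.

    a, b: float images of the same shape; mask is 1.0 where image `a`
    should dominate. Low frequencies are mixed with a softened mask
    (hiding exposure differences); high frequencies switch sharply
    at the seam (keeping detail crisp).
    """
    low_a, low_b = blur(a), blur(b)
    high_a, high_b = a - low_a, b - low_b
    soft = blur(mask)
    low = soft * low_a + (1.0 - soft) * low_b
    high = np.where(mask > 0.5, high_a, high_b)
    return low + high
```

Blending the low band with a soft mask is what smooths out the brightness mismatch between the two exposures in the overlap region.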
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The most interesting thing I learned from this part of the project was how much the information contained within an image can be shifted to give an entirely new perspective. In particular, rectifying part of an image to be "straight" straightens out not only the specified area but the entire image. On top of that, the result does not even look that warped or stretched!
Picking out correspondence points by hand is time-consuming and prone to human error. We can automate this step so the entire pipeline is hands-off. The end result is partially determined by randomness (RANSAC) but can definitely match up to the human-guided equivalent.
Features are elements of an image that are discernible to the human eye. Points where an object stands out from the scene can be detected with corner detectors. The Harris corner detector inspects each pixel and its neighbors for shifts in intensity in every direction. If a pixel's "corner response" passes a threshold, we consider it a point of interest. The threshold in the given starter code admits a few too many points, but that will be remedied by Adaptive Non-Maximal Suppression.
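The corner response can be sketched from the structure tensor. This is a simplified version, assuming a grayscale float image; the `k` constant and the crude 1-2-1 smoothing are my stand-ins for whatever the starter code actually uses:

```python
import numpy as np

def harris_response(img, k=0.05, passes=4):
    """Harris corner response R = det(M) - k * trace(M)^2, where M is
    the smoothed structure tensor of image gradients at each pixel.

    Large R means intensity shifts in every direction, i.e. a corner.
    """
    Iy, Ix = np.gradient(img.astype(float))
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy

    def smooth(a):
        # Repeated 1-2-1 smoothing as a cheap Gaussian substitute.
        for _ in range(passes):
            for axis in (0, 1):
                a = 0.25 * np.roll(a, 1, axis) + 0.5 * a \
                    + 0.25 * np.roll(a, -1, axis)
        return a

    Sxx, Syy, Sxy = smooth(Ixx), smooth(Iyy), smooth(Ixy)
    det = Sxx * Syy - Sxy * Sxy
    trace = Sxx + Syy
    return det - k * trace * trace
```

On a constant image the gradients vanish everywhere, so the response is zero: nothing stands out, no interest points.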
|
|
|
|
Adaptive Non-Maximal Suppression reduces the number of interest points in a way that ensures a good distribution across the entire image. For each interest point, we record its suppression radius: the distance to the nearest neighboring point with a (sufficiently) larger corner response. The interest points are then ranked by radius, descending. A large radius means the point is relatively dominant in its immediate area, while a point sitting next to a stronger one gets a small radius. Picking the top-radius interest points therefore yields a nice spread across the image.
|
|
|
|
We want to create correspondences from the remaining interest points. To do this, we extract an 8 x 8 patch (downsampled from a 40 x 40 window) around each interest point. Each patch is compared against every patch in the other picture to produce a table of SSD scores; the lowest SSD score is presumably the closest match. To complete the feature matching, we also require that the correspondence pass Lowe's ratio test: the closest match is accepted only if its SSD is sufficiently smaller than that of the second-closest match.
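The SSD comparison plus ratio test can be sketched as below, assuming the patches have already been flattened into descriptor vectors (the 0.6 ratio threshold is an assumed value, not necessarily the one used in this project):

```python
import numpy as np

def match_features(desc_a, desc_b, ratio=0.6):
    """Match patch descriptors by SSD with Lowe's ratio test.

    For each descriptor in desc_a, find its two nearest neighbors in
    desc_b by SSD; keep the best match only when it is clearly better
    than the runner-up (best < ratio * second_best).
    Returns (index_in_a, index_in_b) pairs.
    """
    desc_a = np.asarray(desc_a, float)
    desc_b = np.asarray(desc_b, float)
    matches = []
    for i, d in enumerate(desc_a):
        ssd = np.sum((desc_b - d) ** 2, axis=1)
        nn = np.argsort(ssd)[:2]          # two closest candidates
        if len(nn) > 1 and ssd[nn[0]] < ratio * ssd[nn[1]]:
            matches.append((i, int(nn[0])))
    return matches
```

The intuition behind the ratio test is that a true correspondence should have one clearly best match; if two candidates score nearly the same, the match is ambiguous and gets discarded.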
|
RANSAC stands for Random Sample Consensus. It decides which homography to keep by looking for inliers: points that match up well with their correspondences when a candidate homography is applied to them. Each candidate homography is computed from 4 randomly selected correspondence points. After a number of iterations, the largest inlier set found contains the correspondence points we use to calculate the "real" homography.
|
|
The mosaics on the left were hand-selected and stitched, while the ones on the right were computed automatically.
|
|
|
|
|
|
The most interesting thing I learned from this part of the project was how accurate MOPS descriptor vectors can be in determining the correspondence points between two images. On top of being able to determine the similar, significant points between images, the computer-calculated placements are probably more accurate on a pixel level than hand-selection could ever be.