Project 4

Part A

Image Warping and Mosaics

For Part A, I took photographs of places that I thought would be interesting to stitch into a mosaic: an approximation of a photograph taken by a single camera with a large field of view. (Panoramic mosaics can even span more than 180 degrees.) I chose three sets of images in indoor and outdoor settings, picking locations with many distinctive details and horizontal and vertical lines to help with picking correspondence points.

Part 0. Shoot the Pictures

The first set of images was taken at Dwinelle Hall at UC Berkeley, where CS 294-26 is currently taking place.

Dwinelle Hall Auditorium: Side View
Dwinelle Hall Auditorium: Straight-On View

Part 1. Recover Homographies

For Part 1, I chose two images of the Dwinelle Hall auditorium to compute the homography matrix. I chose these images because they contain many straight horizontal and vertical lines, which made it easier to test my homography computation. The original images are shown above. The photos have significant overlap in camera angle and were taken a few seconds apart with an iPhone camera.

The images were rescaled to 1008 x 756 pixels each before computing the homography, to speed up the computation.

Eight (8) correspondence points were chosen in each image. I treated the straight-on view image as the source and warped the side view image towards the straight-on camera angle. The correspondence points chosen are shown in the images below.

Correspondence Points: Side View
Correspondence Points: Straight-On View

After choosing the 8 correspondence points in each image, I computed the homography matrix H that transforms coordinates from the source image (p) to the target image (p'), using the equation p' = H * p.

The 8 pairs of correspondence points give 16 equations in 8 unknowns, so the system is overdetermined and has no exact solution for H. Instead, H can be estimated by stacking the constraints into a 16 x 8 matrix, where each pair of correspondence points contributes two rows. The general form of the equation is shown in the image below.

Homography Equation
Image credit: CS 294-26 recitation on 10/06/2022

The eight unknowns are the first eight elements of the matrix H; the last element is fixed by normalizing it to 1. The elements of H are estimated with the least-squares method: for an equation of the form Ah = b, the solution is the h that minimizes the squared error || Ah - b ||^2.
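As an illustration, the least-squares setup above can be sketched in NumPy as follows (the function name compute_homography and the use of numpy.linalg.lstsq are my choices for this sketch, not necessarily the exact code used):

```python
import numpy as np

def compute_homography(src, dst):
    """Estimate H such that dst ~ H * src, with H[2, 2] fixed to 1.

    src, dst: (n, 2) arrays of (x, y) correspondence points, n >= 4.
    Each pair of points contributes two rows to the 2n x 8 system A h = b.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)
    A = np.zeros((2 * n, 8))
    b = np.zeros(2 * n)
    for i, ((x, y), (xp, yp)) in enumerate(zip(src, dst)):
        A[2 * i] = [x, y, 1, 0, 0, 0, -x * xp, -y * xp]
        A[2 * i + 1] = [0, 0, 0, x, y, 1, -x * yp, -y * yp]
        b[2 * i], b[2 * i + 1] = xp, yp
    # Least-squares solution minimizes ||A h - b||^2
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.append(h, 1.0).reshape(3, 3)
```

With 8 point pairs, A is the 16 x 8 matrix described above, and lstsq returns the h minimizing the squared residual.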


Part 2. Warp the Images

After computing the matrix H that warps the source image to the target image, I warped the target image toward the source image using the inverse of the computed homography matrix H.

Before doing the warp, I computed the bounding box of the warped image. I applied the forward warp p' = H * p to the four corner points of the input image, then took the minimum and maximum coordinates of the warped corners. Some of the warped coordinates fell outside the original image range (i.e., were negative and non-integer). To compute the color values of the warped image, I created a regularly spaced grid spanning every integer pixel inside the bounding rectangle, from the new top-left corner (min x, min y) to the new bottom-right corner (max x, max y), where min x, min y, max x, and max y came from the warped corner coordinates. Applying the inverse homography to each integer pixel inside this bounding box produced either a coordinate inside the input image bounds or an invalid one. The invalid pixels, which had no corresponding value in the input image, were mapped to white.
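The corner-warping step can be sketched as below (warp_bounding_box is a hypothetical helper name; (x, y) pixel coordinates are assumed):

```python
import numpy as np

def warp_bounding_box(H, width, height):
    """Forward-warp the four image corners with p' = H * p and return
    (min_x, min_y, max_x, max_y) of the warped image's bounding box."""
    corners = np.array([[0, 0, 1],
                        [width - 1, 0, 1],
                        [0, height - 1, 1],
                        [width - 1, height - 1, 1]], dtype=float).T
    warped = H @ corners
    warped = warped[:2] / warped[2]      # divide out the homogeneous w
    min_x, min_y = warped.min(axis=1)
    max_x, max_y = warped.max(axis=1)
    return min_x, min_y, max_x, max_y
```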

Finally, to interpolate the color of each pixel in the shifted warped image, I used Python's scipy.interpolate.griddata method: for each color channel, I interpolated values at the new image's integer coordinates, given the input image's pixel coordinates and color values. As an example, warping the target image above (left) toward the source image above (right) using the homography from Part 1 produced the warped image shown below. Warping and interpolating the original image (1008 x 756 pixels) into the warped image (1005 x 1207 pixels) took ~37 seconds.
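A slow but straightforward sketch of this inverse-warp-and-interpolate step (the function name is mine, and a float image in [0, 1] is assumed so invalid pixels can be filled with white = 1.0):

```python
import numpy as np
from scipy.interpolate import griddata

def inverse_warp(img, H, bbox):
    """Inverse-warp `img` into the bounding box of the forward-warped
    corners. bbox = (min_x, min_y, max_x, max_y). Output pixels whose
    inverse-warped coordinates fall outside `img` become white (1.0)."""
    min_x, min_y = int(np.floor(bbox[0])), int(np.floor(bbox[1]))
    max_x, max_y = int(np.ceil(bbox[2])), int(np.ceil(bbox[3]))
    xs, ys = np.meshgrid(np.arange(min_x, max_x + 1),
                         np.arange(min_y, max_y + 1))
    grid = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = np.linalg.inv(H) @ grid              # inverse warp to the input
    src = (src[:2] / src[2]).T                 # (N, 2) source (x, y) coords
    h, w = img.shape[:2]
    in_x, in_y = np.meshgrid(np.arange(w), np.arange(h))
    pts = np.stack([in_x.ravel(), in_y.ravel()], axis=1)
    out = np.ones((*xs.shape, img.shape[2]))   # white canvas
    for c in range(img.shape[2]):              # interpolate per channel
        vals = griddata(pts, img[..., c].ravel(), src, fill_value=1.0)
        out[..., c] = vals.reshape(xs.shape)
    return out
```

griddata triangulates every input pixel, which is consistent with the tens of seconds reported above; scipy.ndimage.map_coordinates would be a much faster alternative.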

Dwinelle Hall Auditorium: Side-View Warped to Straight-On View

Part 3. Rectify the Images

In this part, I tested the computations in part 1 and part 2 on a single image that had a planar surface with a regular square grid (tiles on the kitchen counter).

The original image was taken at a significant angle with respect to the planar surface of the tiles. I rectified this image by "rotating" the camera as if the photo had been taken with the camera parallel to the tiles. I assumed that the tile in the third row in the middle of the counter is a square, with the side length taken from its far side, the one parallel to the image's shorter edge (the side between correspondence points 1 and 2). The original image with the labeled correspondence points is shown below on the left; the same image with the second, "rectified" set of correspondence points is shown below on the right.
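Rectification reuses the homography machinery from Part 1: the observed tile corners map to a synthetic axis-aligned square. A sketch of building the "rectified" destination points (square_targets is a hypothetical helper; corners are assumed clockwise from top-left in (x, y)):

```python
import numpy as np

def square_targets(corners, side):
    """Destination points for rectification: an axis-aligned square of
    the given side length, anchored at the first observed corner.
    Feeding (corners, square_targets(corners, side)) to the homography
    solver from Part 1 gives the rectifying transform."""
    x0, y0 = corners[0]
    return np.array([[x0, y0],
                     [x0 + side, y0],
                     [x0 + side, y0 + side],
                     [x0, y0 + side]], dtype=float)
```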

Square Tiles with Correspondence Points (Angled Camera View)
Square Tiles with "Rectified" Correspondence Points

The rectified image is shown in the image below.

Square Tiles: Rectified

The second image I picked to rectify was a chess board, because its regular square pattern made it easy to pick correspondence points. The original image is shown on the left; the rectified image is shown on the right.

Chess Board: Angled View
Chess Board: Rectified

Part 4. Blend the Images into a Mosaic

Blending the images into a single mosaic involved aligning the source image with the warped image using the correspondence points. I kept the source image as-is and warped the target image toward its camera angle. To find the new warped correspondence points, I applied the inverse homography matrix H_inv to the target correspondence points from the un-warped target image in Part 2, then shifted them in x and y by the amount the warped image grew relative to the un-warped image. Finally, because the warped image was larger than the source image, I resized it to match the source image's scale using the correspondence points: I computed the ratio between pairs of correspondence points and chose a pair that gave a good alignment over most of the image. Because the correspondence points were chosen manually and the homography is not exact, a few artifacts remain in the mosaic.

After the warped and source images were the same size, I stitched them together by first creating a canvas large enough to hold the final mosaic. For blending, I calculated the overlapping region using the corners of the un-warped image and applied alpha blending to that region with a gradient mask. The mask is shown in the image below.
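The gradient-mask blend can be sketched as below for a vertical seam (a simplification of the actual canvas-based compositing; the function name and fixed overlap width are my assumptions):

```python
import numpy as np

def alpha_blend(left, right, overlap):
    """Blend two same-height images whose last/first `overlap` columns
    overlap, using a horizontal linear gradient mask (alpha falls from
    1 on the left image to 0 on the right image across the seam)."""
    alpha = np.linspace(1.0, 0.0, overlap)[None, :, None]
    seam = alpha * left[:, -overlap:] + (1 - alpha) * right[:, :overlap]
    return np.concatenate([left[:, :-overlap], seam, right[:, overlap:]], axis=1)
```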

Gradient Mask

The mosaic image of the Dwinelle Hall auditorium is shown in the image below.

Dwinelle Hall Auditorium Mosaic

Additional examples of mosaics are shown below, along with input images.

Left-view of house interior
Straight-on view of house interior
Mosaic view of house interior
First view of the Rose Garden
Second view of the Rose Garden
Rose Garden Mosaic

Lessons Learned

In this project, the coolest thing I learned is that a few points are enough to transform a photo's view into a different one. Manual alignment worked reasonably well, but was tricky to get right, and not all parts of the image mosaics were aligned perfectly. Learning how to deal with negative coordinates was an interesting technical challenge.


Bells and Whistles

I didn't implement any Bells and Whistles for Part A.



Part B

Feature Matching for Auto-Stitching

For Part B, I implemented auto-stitching of image mosaics, based on the paper by M. Brown et al., "Multi-Image Matching using Multi-Scale Oriented Patches" (2005). I used the same photos I used in Part A to test the automatic feature detection.

Part 1. Detecting Corner Features

First, I took the given Harris corner detection code and adjusted it slightly. Without changing any parameters, the code returned ~17k points. I adjusted the minimum separation distance between detected corners to reduce the number of Harris corners to ~4k.
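A sketch of how this corner detection plausibly works with skimage (the parameter values and function name here are illustrative, not necessarily those of the starter code):

```python
import numpy as np
from skimage.feature import corner_harris, peak_local_max

def get_harris_corners(gray, min_distance=10, threshold_rel=0.01):
    """Harris corners of a grayscale float image: compute the corner
    response map, then keep local maxima at least `min_distance`
    pixels apart. Returns (row, col) coordinates and their strengths."""
    response = corner_harris(gray)
    coords = peak_local_max(response, min_distance=min_distance,
                            threshold_rel=threshold_rel)
    strengths = response[coords[:, 0], coords[:, 1]]
    return coords, strengths
```

Raising min_distance reduces the number of returned corners; lowering it increases them.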

Dwinelle Hall Auditorium: Side View
Dwinelle Hall Auditorium: Straight-On View
Dwinelle Auditorium Side View: Harris Corners
Dwinelle Auditorium Straight-On View: Harris Corners

Second, I implemented adaptive non-maximal suppression (ANMS) of the Harris corners to keep only local maxima and select 500 interest points that are well distributed over the image area.
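Adaptive non-maximal suppression can be sketched as follows (a direct O(n^2) implementation of the suppression radius from Brown et al., with the paper's robustness constant c = 0.9; the function name is mine):

```python
import numpy as np

def anms(coords, strengths, n_points=500, c_robust=0.9):
    """For each corner, the suppression radius is the distance to the
    nearest corner that is significantly stronger (strength > own
    strength / c_robust). Keeping the n_points corners with the
    largest radii spreads the selection evenly over the image."""
    n = len(coords)
    radii = np.full(n, np.inf)
    for i in range(n):
        stronger = strengths > strengths[i] / c_robust
        if stronger.any():
            radii[i] = np.linalg.norm(coords[stronger] - coords[i], axis=1).min()
    keep = np.argsort(radii)[::-1][:n_points]
    return coords[keep]
```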

Dwinelle Auditorium Side View: Adaptive Non-Maximal Suppression of Corners
Dwinelle Auditorium Straight-On View: Adaptive Non-Maximal Suppression of Corners

Part 2. Extracting Feature Descriptors

In this part, I extracted a feature descriptor for each of the top 500 points of interest computed in the previous part. For each point, I took a 40x40-pixel region around it (clipping regions at the image borders), then down-sampled each region to an 8x8 patch. Finally, I normalized each descriptor patch to zero mean and unit variance, which makes the descriptors invariant to affine changes in intensity (bias and gain) between the two images.
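A sketch of extracting one descriptor (the 5x subsampling here is a simple stand-in for whatever down-sampling the actual code used, and the point is assumed far enough from the border for a full window):

```python
import numpy as np

def extract_descriptor(gray, row, col, window=40, patch=8):
    """Cut a 40x40 window around the interest point, subsample it to
    8x8, and normalize to zero mean / unit variance so the descriptor
    is invariant to affine changes in intensity (bias and gain)."""
    half = window // 2
    region = gray[row - half:row + half, col - half:col + half]
    small = region[::window // patch, ::window // patch]
    return (small - small.mean()) / (small.std() + 1e-8)
```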

Example Normalized Feature Descriptor Patch

Part 3. Matching Feature Descriptors

In this part, I implemented a feature matching algorithm based on the nearest-neighbor metric described in the paper by Brown et al. I used the suggested ratio of the distance to the first nearest neighbor over the distance to the second nearest neighbor to decide which patches/points to keep and which to discard. The distances are sums of squared differences between a patch in image one and every patch in image two. I chose a cutoff threshold of 0.25, which reduced the number of interest points in each image from 500 to ~20.
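The ratio test can be sketched as below (match_descriptors is a hypothetical name; descriptors are flattened and compared with SSD):

```python
import numpy as np

def match_descriptors(desc1, desc2, ratio=0.25):
    """For each descriptor in image one, find its two nearest
    neighbors in image two by SSD; keep the match only if the
    1-NN distance is below `ratio` times the 2-NN distance."""
    d1 = np.asarray(desc1, dtype=float).reshape(len(desc1), -1)
    d2 = np.asarray(desc2, dtype=float).reshape(len(desc2), -1)
    # pairwise SSD via ||a||^2 + ||b||^2 - 2 a.b
    ssd = ((d1 ** 2).sum(1)[:, None] + (d2 ** 2).sum(1)[None, :]
           - 2 * d1 @ d2.T)
    matches = []
    for i in range(len(d1)):
        order = np.argsort(ssd[i])
        if ssd[i, order[0]] / (ssd[i, order[1]] + 1e-12) < ratio:
            matches.append((i, int(order[0])))
    return matches
```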

Dwinelle Auditorium Side View: Matching Feature Descriptors
Dwinelle Auditorium Straight-On View: Matching Feature Descriptors

Part 4. RANSAC

In this part, I implemented the Random Sample Consensus (RANSAC) algorithm to estimate the homography. In each iteration, I chose 4 random interest points from the source image, computed a homography from that sample, and counted how many of the remaining points it mapped close to their corresponding target coordinates (the inliers). I repeated this process 2000 times and kept the homography with the most inliers, along with its inlier points. For the Dwinelle Hall example, RANSAC narrowed the list down to 9 points.
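The loop can be sketched as below (a self-contained version that reuses the least-squares homography from Part A; the 2-pixel inlier tolerance is my assumption):

```python
import numpy as np

def compute_homography(src, dst):
    """Least-squares homography with H[2, 2] = 1, as in Part A."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp]); b.append(xp)
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp]); b.append(yp)
    h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def ransac_homography(src, dst, n_iters=2000, tol=2.0, seed=0):
    """Repeatedly fit a homography to 4 random point pairs and count
    inliers: points whose forward-warped source coordinates land
    within `tol` pixels of their target coordinates. Returns the
    largest inlier mask found."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    ones = np.ones((len(src), 1))
    for _ in range(n_iters):
        idx = rng.choice(len(src), 4, replace=False)
        H = compute_homography(src[idx], dst[idx])
        p = np.hstack([src, ones]) @ H.T
        inliers = np.linalg.norm(p[:, :2] / p[:, 2:3] - dst, axis=1) < tol
        if inliers.sum() > best.sum():
            best = inliers
    return best
```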

Dwinelle Auditorium Side View: RANSAC Points
Dwinelle Auditorium Straight-On View: RANSAC Points

Then, I calculated an approximate homography matrix from all the inliers using a least-squares fit. Applying this homography to the side view of the Dwinelle Auditorium produced a warped image from the perspective of the straight-on view.

Dwinelle Auditorium Side View: Warped to Straight-On View

Part 5. Example Mosaics

In this part, I produced a mosaic from two images. I first warped one of the images using the approximate homography matrix calculated in the previous step. I then stitched the two images together using two correspondence points in each image: I picked the two points farthest from each other, then calculated a new canvas large enough to fit both images with their overlap. I blended the two images in the overlapping region with a gradient mask, in the same way I blended images in Part A.

Gradient Mask

Examples below show mosaics made with hand-picked correspondence points, created in Part A (top) and mosaics made with automatic feature detection, created in Part B (bottom).

Dwinelle Auditorium Mosaic: Hand-Picked Correspondence Points (from Part A)
Dwinelle Auditorium Mosaic: Automatic Feature Detection

The bottom auditorium mosaic (automatic) has fewer visible artifacts and overall looks crisper than the top mosaic constructed from hand-picked correspondence points. There is some visible ghosting in both mosaics: in the bottom one near the left side of the stairs, and in the top one most pronounced near the left side and the door areas.

Other examples include house interior mosaic, and Berkeley Rose Garden mosaic.

The house mosaic is shown below, along with two input images.

House Interior: Left View
House Interior: Straight-On View

The automatically chosen RANSAC points to produce the house mosaic are shown below.

House Interior Left View: RANSAC Points
House Interior Straight-On View: RANSAC Points
House Mosaic: Hand-Picked Correspondence Points (from Part A)
House Mosaic: Automatic Feature Detection

The bottom house mosaic (automatic) has fewer visible artifacts and overall looks sharper than the top mosaic constructed from hand-picked correspondence points. Instead of the 8 hand-picked correspondence points used in Part A, the RANSAC algorithm kept 15 correspondence points, which were more evenly distributed and produced better image alignment for the mosaic.

The Rose Garden mosaic is shown below, along with two input images. The auto-detection algorithm for this mosaic kept points towards the bottom of the images, and there were 6 RANSAC points, compared to 8 manually-chosen points (in Part A). Therefore, the bottom part of the mosaic looks crisper than the mosaic created from the hand-picked correspondence points, but the middle and top parts of the auto-created mosaic look blurrier.

Rose Garden: Alley View
Rose Garden: Tree View

The automatically chosen RANSAC points for the Rose Garden mosaic are shown below.

Rose Garden Alley View: RANSAC Points
Rose Garden Tree View: RANSAC Points

The Rose Garden mosaics are shown below.

Rose Garden Mosaic: Hand-Picked Correspondence Points (from Part A)
Rose Garden Mosaic: Automatic Feature Detection

The bottom rose garden mosaic (automatic) has a similar number of visible artifacts to the mosaic from Part A, and similar overall sharpness to the top mosaic constructed from hand-picked correspondence points. The post in the middle of the bottom mosaic has more artifacts than in the hand-created one. The hand-picked correspondences were chosen after several trial-and-error runs to get the best image alignment, and were more evenly distributed than the RANSAC-chosen points, which concentrated in the bottom halves of the images.

In conclusion, the automatic mosaics I created were generally of better quality than the ones created from hand-picked correspondences, but not always. The Rose Garden mosaic is an example of hand-picked correspondences producing similar or better results than automatic detection.


Lessons Learned

The coolest thing I learned from this project is automatically detecting features using the Harris corner detection algorithm from the skimage.feature Python library, then drastically reducing the number of matches with the nearest-neighbor metric and the RANSAC algorithm. The automatically detected features were more evenly spread over the image than hand-selected points, so automatic feature detection was generally better both in the time it took to create a mosaic and in image alignment (fewer artifacts). Not all artifacts were gone, though: since the homography matrix is approximate, not all points in the two images being combined into a mosaic matched perfectly.


Bells and Whistles

I didn't implement any Bells and Whistles for Part B.