CS 194-26: Project 4

For this project, we calculated the homography between images using manually selected shared points. The homography was then used to warp one of the images to the other to create a blending of the different images. By stitching multiple images together, we are able to create a panorama. We are also able to use the same idea of homography to rectify images, so that rectangular objects in the real world appear flat in the image as well.

Feature Detection

To know how the images should merge together, we need to select points that exist on both images. These potential points can be selected manually or automatically. Manual selection of the points is simple, but it requires someone to spend the time to figure out which points are in both images and select them in the same order for both images.

Instead, we can use the Harris interest point detector to automatically find corner features in the images. Below are random subsets of Harris points on three images that will make up a mosaic.

random subset of Harris points (left image)
random subset of Harris points (center image)
random subset of Harris points (right image)
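
The corner strength map behind these points can be sketched in a few lines of numpy. This is a generic Harris response; the `sigma` smoothing scale and the `k` constant below are illustrative defaults, not necessarily the project's exact settings.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_strength(im, sigma=1.0, k=0.05):
    """Harris corner strength map for a grayscale float image.
    sigma and k are illustrative defaults, not the project's settings."""
    # Image gradients (rows are y, columns are x)
    Iy, Ix = np.gradient(im)
    # Entries of the smoothed structure tensor M
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    # Harris response: det(M) - k * trace(M)^2; large positive values mark corners
    return (Sxx * Syy - Sxy ** 2) - k * (Sxx + Syy) ** 2
```

Local maxima of this response are the candidate Harris points.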

One problem with the Harris interest point detector is that it outputs a large number of points, which makes it slow to work with. Therefore, we can use adaptive non-maximal suppression (ANMS) to get a subset of points. Instead of randomly selecting points, ANMS gives points that are spaced out and whose corner strengths are locally maximal. It uses the equation below to make the calculation:

$$r_i = \min_j |x_i - x_j|\text{, s.t. } h(x_i) < c_{robust} h(x_j),\ x_j \in I$$

(where $x_i$ is a Harris point, $I$ is the set of all Harris points, $c_{robust} = 0.9$ determines the level of suppression, and $h(x)$ is the corner strength of the point)

We sort the r values of the points and take the top 500 points for a more manageable subset.

left image with anms points
center image with anms points
right image with anms points
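
The ANMS selection above can be sketched as follows. This is an O(N²) version; the function and parameter names are my own.

```python
import numpy as np

def anms(points, strengths, n_keep=500, c_robust=0.9):
    """Adaptive non-maximal suppression (O(N^2) sketch).
    points: (N, 2) coordinates of Harris points; strengths: (N,) corner strengths.
    Keeps the n_keep points with the largest suppression radii r_i."""
    radii = np.full(len(points), np.inf)
    for i in range(len(points)):
        # Points that are "sufficiently stronger" than point i
        stronger = strengths[i] < c_robust * strengths
        if np.any(stronger):
            dists = np.linalg.norm(points[stronger] - points[i], axis=1)
            radii[i] = dists.min()  # r_i: distance to the nearest stronger point
    order = np.argsort(-radii)      # sort by radius, largest first
    return points[order[:n_keep]]
```

The strongest point has no stronger neighbor, so its radius is infinite and it is always kept first.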

Descriptor Extraction and Feature Matching

Now that we have a set of points with good corner strengths, we can extract the feature descriptors. A feature descriptor is the area around a feature point, which will be used to match a point from one image to another. The feature descriptors were created by blurring the entire image with a Gaussian filter, then taking the 40x40 area around each point. Each square was then downsampled to 8x8 and bias/gain-normalized.

feature descriptor 1
feature descriptor 2
feature descriptor 3
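
The descriptor extraction can be sketched as below, assuming a grayscale float image, a simple stride-based downsample, and an assumed blur sigma.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def extract_descriptor(im, x, y, window=40, out=8, sigma=2.0):
    """One axis-aligned MOPS-style descriptor (sketch; sigma is an assumption).
    Blur the whole image, take the window x window patch around (x, y),
    subsample it to out x out, then bias/gain-normalize."""
    blurred = gaussian_filter(im, sigma)
    half = window // 2
    patch = blurred[y - half:y + half, x - half:x + half]
    small = patch[::window // out, ::window // out]        # 40x40 -> 8x8
    return (small - small.mean()) / (small.std() + 1e-8)   # zero mean, unit std
```

Blurring before subsampling avoids aliasing, and the bias/gain normalization makes the descriptor robust to brightness and contrast changes between images.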

With the feature descriptors in hand, we can compare each feature on one image to every feature on the other image to find which features match most closely between the two images. We used the sum of squared differences (SSD) to measure how close two features are. Since we only want matches we are confident in, we impose a threshold when selecting matches. As the MOPS paper found, requiring the ratio between the lowest SSD error and the second-lowest SSD error to be below 0.66 produces a fairly good probability distribution of correct matches with few incorrect ones.

Although there are points that are incorrectly matched, they will be eliminated in the next step.

feature matched points on left image
feature matched points on center image
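
The ratio-test matching can be sketched as a brute-force loop over flattened descriptors:

```python
import numpy as np

def match_features(desc1, desc2, ratio=0.66):
    """Ratio-test matching with SSD (brute-force sketch).
    desc1: (N, D) and desc2: (M, D) flattened descriptors.
    Returns (i, j) pairs whose best match clearly beats the runner-up."""
    matches = []
    for i, d in enumerate(desc1):
        ssd = np.sum((desc2 - d) ** 2, axis=1)  # SSD to every descriptor in image 2
        order = np.argsort(ssd)
        best, second = ssd[order[0]], ssd[order[1]]
        if best < ratio * second:               # 1-NN / 2-NN ratio test
            matches.append((i, int(order[0])))
    return matches
```

The intuition is that a correct match should be much closer than any other candidate; if the two nearest candidates are nearly as good as each other, the match is ambiguous and is discarded.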

RANSAC

In order to make sure there are no homography-ruining outlier points, we use RANdom SAmple Consensus (RANSAC) to estimate the homography matrix while avoiding feature-space outliers. We do the following steps 10,000 times with the feature-matched points:

  1. Choose 4 random matching features
  2. Calculate the homography matrix with the 8 resulting points
  3. Transform the feature points on one image using the homography
  4. Find the inliers using $SSD(p'_i, Hp_i) < \epsilon$
  5. Store the set that has the largest number of inliers

In the end, we will have the largest set of inlier points. These points can be used to create the homography that will warp one image to the other.

inlier points on left image
inlier points on center image
inlier points on right image
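
The five steps above can be sketched as follows. A minimal least-squares homography solver is inlined so the snippet stands alone; the `eps` threshold and fixed seed are illustrative choices, not the project's exact values.

```python
import numpy as np

def fit_homography(p, q):
    """Least-squares homography sending points p to q (inlined for completeness)."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(p, q):
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b += [xp, yp]
    h, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def ransac_homography(pts1, pts2, n_iter=10000, eps=1.0, seed=0):
    """RANSAC over matched point pairs; returns a boolean inlier mask."""
    rng = np.random.default_rng(seed)
    homog = np.hstack([pts1, np.ones((len(pts1), 1))]).T  # homogeneous coords
    best = np.zeros(len(pts1), dtype=bool)
    for _ in range(n_iter):
        sample = rng.choice(len(pts1), 4, replace=False)    # 1. four random matches
        H = fit_homography(pts1[sample], pts2[sample])      # 2. homography from 8 points
        proj = (H @ homog).T                                # 3. transform all points
        proj = proj[:, :2] / proj[:, 2:3]
        inliers = np.sum((proj - pts2) ** 2, axis=1) < eps  # 4. SSD(p', Hp) < eps
        if inliers.sum() > best.sum():                      # 5. keep the largest set
            best = inliers
    return best
```

Any sample that happens to include an outlier produces a skewed homography with few inliers, so the best-scoring sample almost always consists of correct matches.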

Recover Homographies

By taking pairs of corresponding points, we are able to compute the homography matrix, which can be used to warp an image into the perspective of the other image. We can recover the homography matrix using the equation $p' = Hp$, where $p$ is a $\begin{bmatrix}x & y & 1\end{bmatrix}^T$ vector of the point in the first image and $p'$ is a $\begin{bmatrix}wx' & wy' & w\end{bmatrix}^T$ vector of the point in the second image. We can represent the equation as a matrix multiplication:

$$\begin{bmatrix}wx' \\ wy' \\ w\end{bmatrix} = \begin{bmatrix}a & b & c \\ d & e & f \\ g & h & 1\end{bmatrix}\begin{bmatrix}x \\ y \\ 1\end{bmatrix}$$

We can then rearrange the matrix multiplication to find the homography matrix using multiple points.

$$\begin{bmatrix}x_1 & y_1 & 1 & 0 & 0 & 0 & -x_1x_1' & -x_1'y_1 \\ 0 & 0 & 0 & x_1 & y_1 & 1 & -x_1y'_1 & -y_1y'_1 \\ x_2 & y_2 & 1 & 0 & 0 & 0 & -x_2x_2' & -x_2'y_2 \\ 0 & 0 & 0 & x_2 & y_2 & 1 & -x_2y'_2 & -y_2y'_2 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ x_n & y_n & 1 & 0 & 0 & 0 & -x_nx_n' & -x_n'y_n \\ 0 & 0 & 0 & x_n & y_n & 1 & -x_ny'_n & -y_ny'_n\end{bmatrix} \begin{bmatrix}a \\ b \\ c \\ d \\ e \\ f \\ g \\ h\end{bmatrix} = \begin{bmatrix}x'_1 \\ y'_1 \\ x'_2 \\ y'_2 \\ \vdots \\ x'_n \\ y'_n\end{bmatrix}$$

With the corresponding points (manually selected, or the RANSAC inliers), we can use least squares to find the homography matrix.
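
The stacked system above maps directly onto `np.linalg.lstsq`; a minimal sketch (`compute_homography` is my own name):

```python
import numpy as np

def compute_homography(pts1, pts2):
    """Least-squares homography from the stacked system above.
    pts1, pts2: (N, 2) arrays of corresponding points, N >= 4."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(pts1, pts2):
        # Two rows per correspondence, as in the matrix above
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b += [xp, yp]
    h, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)   # the bottom-right entry is fixed to 1
```

With exactly four correspondences the system is square and solved exactly; with more, least squares gives the best fit over all of them.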

Warp the Images

Using the homography matrix, we apply it to the image coordinates to obtain the x and y coordinate maps. We then use the remap function to interpolate the color values at those coordinates, producing the warped image.

left image
right image
original image
warped image
combined image with an average blend
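
The inverse warp can be sketched with scipy's `map_coordinates` standing in for OpenCV's `remap` (assuming a grayscale image; each output pixel is back-projected with $H^{-1}$ and interpolated):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_image(im, H, out_shape):
    """Inverse-warp a grayscale image with homography H (sketch)."""
    h, w = out_shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Back-project every output pixel into the source image with H^-1
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = np.linalg.inv(H) @ coords
    src_x, src_y = src[0] / src[2], src[1] / src[2]
    # Bilinearly interpolate source colors; pixels mapping outside become 0
    warped = map_coordinates(im, [src_y, src_x], order=1, cval=0.0)
    return warped.reshape(h, w)
```

Warping backwards from the output grid, rather than forwards from the source pixels, guarantees every output pixel gets a value and avoids holes.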

Image Rectification

By manually inputting the coordinates of a rectangular object, we can warp the image so that that particular element becomes uniformly rectangular.

view of backyard
rectified view using rectangular window
my computer monitor
rectified over computer monitor

Blend the Images into a Mosaic

By warping images to a center image, we can then blend the images together to create a panoramic mosaic image. To blend the images, we add alpha values which start at 1 in the center column then linearly decrease to 0 at the ends. We then normalize the overlapping values to sum up to 1 which can then serve as weights for the color values. All the images are added together to create a blended panoramic image.
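
The alpha masks and the normalized blend can be sketched as below, for single-channel images (the function names are my own):

```python
import numpy as np

def linear_alpha(h, w):
    """Alpha mask that is 1 at the center column and falls linearly to 0 at the sides."""
    center = (w - 1) / 2
    alpha = 1.0 - np.abs(np.arange(w) - center) / center
    return np.tile(alpha, (h, 1))

def blend(images, alphas):
    """Normalize overlapping alphas to sum to 1, then take the weighted sum."""
    alphas = np.stack(alphas)
    total = alphas.sum(axis=0)
    weights = np.where(total > 0, alphas / np.maximum(total, 1e-8), 0.0)
    return (weights * np.stack(images)).sum(axis=0)
```

Because the weights fade toward the image edges, the seams between neighboring images transition gradually instead of showing a hard boundary.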

Outdoor example

left image
center image
right image
backyard (manual)
backyard (auto)

Living Room Example

left image
center image
right image
living room (manual)
living room (auto)

Kitchen Example

left image
center image
right image
kitchen (manual)
kitchen (auto)

Conclusion

The coolest thing I learned from this part was how homography can be used to align points by warping the perspective. This was particularly interesting when we were able to warp images to make objects flat by rectifying the image. This allowed unreadable parts of my monitor to become legible after rectification.

Part B

The coolest thing I learned from this part was the feature matching. It's pretty impressive that something as small as an 8x8 feature descriptor could be used to find matching feature descriptors in other images, resulting in points that visually correspond to the same location.