CS194-26 Project 5 Part A: Image warping and Mosaicing

We're going to be doing some mosaicing! This is when multiple images from the same point are combined into one large one using homographies. To capture images I used a digital camera with a 90mm lense. All images for a specific mosaic were taken at the same exposure to keep the color consistent. Here's a sample of some of the pictures I took of doe library

jpg

1. Computing homographies

In order to compute homographies, we need to determine points in the intersection of two images where they correspond. This transformation should warp a given part of an image on top of the corresponding part of the other. We first start by choosing correspondence points.

Little Bell/Whistle 1: We also fine-tune our point selections by taking a small 160x160 pixel section around each corresponding pair of points and correlate the two patches after normalization. This lets us determine the error in our alignment and we can select the point of maximal correlation to set the two points to.

jpg jpg

After finetuning, here are the points correspondence we get on each of the images.

jpg

Now we can make our homography!

If we want to map the point a = (x, y, 1) to another point b = (u, v, 1) We can write out Ha = b as a system of 9 equations, we get, for each point given, the 2x9 matrix

[[-x, -y, -1,  0,  0,  0,  u*x,    u*y,  u],
 [0,   0,  0, -x, -y, -1,   v*x,    v*y , v]]

If we write the 3x3 matrix H as a vector h=[h11, h12, h13, h21, h22, h23, h31, h32, h33], we can solve the system Ah=0 Where A is what we get if we stack each of the above matrices for each point. We can then find the best non-trivial solution by computing the SVD of A=USV' and taking h as the last vector of V, then normalizing it such that h33=1.

Little Bell/Whistle 2: It turns out solving the above system can be somewhat numerically unstable, and this can lead to slight errors in the alignment of pixels. I explored a variety of other potential solutions.

One method of doing so is to "condition" the input points such that they converge to a better solution. As explained here, we can compute a new transformation, T which transforms the points such that their mean is at the origin and their average distance from the origin is sqrt(2) (this is done by scaling by the sample standard deviation of the points). We then output T^-1 HT so that our transformation "conditions" the points, applies the homography, then reverses the conditioning.

Let's look at the error we encounter when solving the normal homogenous system. Below is the difference between the transformed points and the desired output:

[[  0   0   0   0  -6 -14 -20  -1 -20  -5   5   0 -11 -11 -12  -2]
 [  0   0   0   0   5   2 -13   9  15  18  16  -4  -2  -6 -22   6]]
MSE 97.90625

This difference is as much as 20 pixels! When we apply our conditioning method, we get a much closer result:

[[ 14   2  -2   2   3   2  -6   2  -1  -2  -1   1  -4  -6  -6   2]
 [  7   1   1  -4   4   5  -4   4  -1   2   4  -4   2  -2 -18   4]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]]
MSE 27.65625

Wow! Our Mean Squared Error (MSE) dropped from ~98 to ~28! Sweet!

Another potential way of solving this system which doesn't rely on computing the svd is to set h33=1 to begin with and then solve for Ha=b with 8 unknowns. This yields a very similar system as listed above and instead we can solve Ah=B where A is our system and B is all the b points flattened.

This also seems to converge to a similar solution as our conditioned SVD method did! Fascinating!

[[ 14   2  -2   2   3   2  -6   2  -1  -2  -1   1  -4  -6  -6   2]
 [  7   1   1  -4   4   5  -4   4  -1   2   4  -4   2  -2 -18   4]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]]
MSE 27.65625

I then became super curious as to how good different methods were at computing these homographies. I tested all of my code against implementations in OpenCV (RANSAC, RHO, LMEDS, and linear) to see how they performed on our system. Unsurprisingly they all converged to a very good solution somewhere around the conditioned homogenous system. These other algorithms like RANSAC are also able to detect outliers however, so for less-cleanly defined points, they may converge to higher MSE-yielding solutions which are in fact better than our simple linear least squares approximation.

jpg

We can now apply the computed homographies to our images!

jpg

By detecting where the images' bounding boxes intersect we can create a nice smooth gradient between them and use a combination of masks to blend them together.

def blend(left, right, int_pts, debug=False):
    mask = skdraw.polygon2mask(left.shape, int_pts)
    mask = mask * (left > 0)
    l, r = int(np.floor(np.min(int_pts[:, 1]))), int(np.ceil(np.max(int_pts[:, 1])))
    smooth = np.zeros_like(left, dtype=np.single)
    rep = np.repeat(np.linspace(-30, 30, r - l, dtype=np.single), left.shape[0]).reshape(-1, left.shape[0]).T
    rep = np.tile(rep, (3, 1, 1)).transpose((1, 2, 0))
    smooth[:, l:r] = rep
    smooth = scipy.special.expit(smooth)*mask
    res = left * (1 - smooth) + right * (1 - mask + smooth)
    res = np.clip(res, 0, 255).round().astype(np.uint8)
    if debug:
        return res, mask, smooth
    else:
        return res

w12_res, mask, smooth= blend(w12_ima1, w12_ima2, w12_int_pts, debug=True)
fig, axs = plt.subplots(3, 2, figsize=(20, 20))
ax1, ax2, ax3, ax4, ax5, ax6 = axs.flatten()
ax1.imshow(mask.astype(np.float))
ax2.imshow(smooth)
ax3.imshow(1 - smooth)
ax4.imshow(1 - mask + smooth)
ax5.imshow(w12_ima1/255 * (1 - smooth))
ax6.imshow(w12_ima2/255 * (1 - mask + smooth))
plt.figure(figsize=(20, 40))
plt.imshow(w12_res)
<matplotlib.image.AxesImage at 0x7fbba02eb3d0>

jpg

jpg

Doing this for the other pair of images we get:

jpg

With some tricky index math we can then combine these two blended images together into one sweet mosaic:

jpg

Here's how it performed on a couple other images I took:

jpg

jpg

jpg

jpg

jpg

jpg

jpg

jpg

Rectification

Another thing we can do is transform our images such that surfaces which were diagonal before now become planar in the eyes of our camera:

jpg

jpg

jpg

jpg

jpg

jpg

The coolest thing I learned was definitely the different methods of finding homographies! I'm sure the different robust techniques will be useful when we go to do it automatically in the next part of the project!