Overview
In this part of the project, we stitch images shot from roughly the same position but at different, consecutive angles into one large panorama. The idea is simple: we project all the photos onto a common plane, using transformations we compute from pairs of corresponding points shared by each pair of overlapping images. Then we blend the projected photos together to get the final mosaic.
Manual stitching
Shoot the Pictures
Thanks to Japheth Wong, whose photos I used for this demo of the mosaicing technique.
Scene 1 | Scene 2 | Scene 3 |
---|---|---|
Recover Homographies
We shall first recover homographies by selecting several points in each photo and assigning the locations they should map to in the "common plane". Doing so gives us several pairs of corresponding points. We only consider the case where this transform is a projective transformation. Mathematically, any projective transformation can be described by a 3x3 matrix (with 8 degrees of freedom), so we pose recovering it as a least-squares problem.
We need to minimize \(\sum_i \|Hx_i - x'_i\|_2^2\). Rewriting this objective as a linear system in the eight unknown entries of \(H\) (fixing \(h_{33} = 1\)), we can compute the transformation matrix with any least-squares routine:
```python
import numpy as np

# Compute the 3x3 homography that maps points in im1 onto im2.
# im1_pts, im2_pts: (n, 2) arrays of (x, y) coordinates, n >= 4.
def computeH(im1_pts, im2_pts):
    A, B = [], []
    for (x1, y1), (x2, y2) in zip(np.asarray(im1_pts, float),
                                  np.asarray(im2_pts, float)):
        # Each correspondence contributes two linear equations in the
        # eight unknown entries of H (we fix h33 = 1).
        A.append([x1, y1, 1, 0, 0, 0, -x1 * x2, -y1 * x2])
        A.append([0, 0, 0, x1, y1, 1, -x1 * y2, -y1 * y2])
        B.extend([x2, y2])
    sol = np.linalg.lstsq(np.array(A), np.array(B), rcond=None)[0]
    return np.append(sol, 1.0).reshape((3, 3))
```
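As a quick sanity check, we can synthesize correspondences from a known homography (the numbers below are made up for illustration) and confirm the solver recovers it; `computeH` is repeated here so the snippet runs on its own:

```python
import numpy as np

def computeH(im1_pts, im2_pts):
    # Same least-squares solver as above.
    A, B = [], []
    for (x1, y1), (x2, y2) in zip(np.asarray(im1_pts, float),
                                  np.asarray(im2_pts, float)):
        A.append([x1, y1, 1, 0, 0, 0, -x1 * x2, -y1 * x2])
        A.append([0, 0, 0, x1, y1, 1, -x1 * y2, -y1 * y2])
        B.extend([x2, y2])
    sol = np.linalg.lstsq(np.array(A), np.array(B), rcond=None)[0]
    return np.append(sol, 1.0).reshape((3, 3))

H_true = np.array([[1.2, 0.1, 5.0],
                   [0.05, 0.9, -3.0],
                   [0.001, 0.002, 1.0]])
pts1 = np.array([[0, 0], [100, 0], [0, 100], [100, 100], [50, 30]], float)
hom = np.column_stack([pts1, np.ones(len(pts1))]) @ H_true.T
pts2 = hom[:, :2] / hom[:, 2:3]           # divide out the homogeneous w

print(np.allclose(computeH(pts1, pts2), H_true, atol=1e-6))  # True
```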
Warp the Images
Now that we have a way to compute the transformation matrix, we can define a set of corresponding points between every pair of overlapping images. We first mark their correspondence points like this:
Scene 1 | Scene 2 |
---|---|
Scene 2 | Scene 3 |
---|---|
Now we use the middle picture as the common plane and warp the photos on the two sides into it:
Scene 1 | Scene 2 | Scene 3 |
---|---|---|
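To place each warped photo, we need both the warped pixels and where its bounding box lands in the common plane. A minimal sketch of such a warp, using forward-mapped corners for the output bounds and nearest-neighbor inverse warping (the helper name and interface are my own choices, not the project's actual code):

```python
import numpy as np

def warp_image(img, H):
    # Warp img into the target plane defined by homography H (source -> target).
    # Map the source corners forward to find the output bounding box, then
    # fill it by inverse warping with nearest-neighbor sampling.
    h, w = img.shape[:2]
    corners = np.array([[0, 0, 1], [w - 1, 0, 1],
                        [w - 1, h - 1, 1], [0, h - 1, 1]], float)
    proj = corners @ H.T
    proj = proj[:, :2] / proj[:, 2:3]
    x0, y0 = np.floor(proj.min(axis=0)).astype(int)
    x1, y1 = np.ceil(proj.max(axis=0)).astype(int)
    Hinv = np.linalg.inv(H)
    ys, xs = np.mgrid[y0:y1 + 1, x0:x1 + 1]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = Hinv @ pts
    sx = np.round(src[0] / src[2]).astype(int)
    sy = np.round(src[1] / src[2]).astype(int)
    inside = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros(ys.shape + img.shape[2:], img.dtype)
    flat = out.reshape((-1,) + img.shape[2:])
    flat[inside] = img[sy[inside], sx[inside]]
    return out, (x0, y0)   # (x0, y0): offset of out in the common plane
```

The returned offset is what lets us position each warped image on a shared canvas later.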
Image Rectification
Note that once we can compute the projective transformation matrix, we can also rectify any photo by selecting any four points that lie on a planar surface and mapping them to a rectangle, as if that plane were orthogonal to our line of sight.
Scene | Rectified scene |
---|---|
Scene | Rectified scene |
---|---|
Scene | Rectified scene |
---|---|
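Rectification only needs the exact homography through four point pairs, which is an 8x8 linear solve. A sketch (function names are illustrative, and nearest-neighbor sampling is used for brevity):

```python
import numpy as np

def homography_4pt(src, dst):
    # Exact homography through 4 point pairs: an 8x8 linear solve, h33 = 1.
    A, b = [], []
    for (x1, y1), (x2, y2) in zip(src, dst):
        A.append([x1, y1, 1, 0, 0, 0, -x1 * x2, -y1 * x2])
        A.append([0, 0, 0, x1, y1, 1, -x1 * y2, -y1 * y2])
        b.extend([x2, y2])
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def rectify(img, quad, out_w, out_h):
    # quad: four clicked corners (clockwise from top-left) of a planar
    # surface in the photo. Map them to a fronto-parallel rectangle and
    # fill the output by inverse warping (nearest-neighbor sampling).
    rect = [(0, 0), (out_w - 1, 0), (out_w - 1, out_h - 1), (0, out_h - 1)]
    Hinv = homography_4pt(rect, quad)     # output coords -> input coords
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = Hinv @ pts
    sx = np.round(src[0] / src[2]).astype(int).clip(0, img.shape[1] - 1)
    sy = np.round(src[1] / src[2]).astype(int).clip(0, img.shape[0] - 1)
    return img[sy, sx].reshape(out_h, out_w)
```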
Blend the images into a mosaic
In the image-warping section we already gained the ability to project each photo onto the right plane; all that remains for stitching is to position the warped images correctly and blend them:
Scene 1 | Scene 2 | Scene 3 |
---|---|---|
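For the blending itself, a simple approach is a feathered weighted average: give each warped image an alpha mask that falls off toward its borders, and divide by the total weight in the overlap. A sketch (assuming the images have already been placed on a common canvas; the helper names are my own):

```python
import numpy as np

def border_falloff(h, w):
    # Per-pixel weight: highest at the image center, falling linearly to a
    # small value at the borders (distance to the nearest edge, normalized).
    yy, xx = np.mgrid[0:h, 0:w]
    d = np.minimum(np.minimum(xx, w - 1 - xx), np.minimum(yy, h - 1 - yy))
    return (d + 1) / (d.max() + 1)

def feather_blend(images, weights):
    # Weighted average of already-aligned canvases; zero-weight pixels stay 0.
    num = sum(im * w for im, w in zip(images, weights))
    den = sum(weights)
    den = np.where(den == 0, 1.0, den)
    return num / den
```

Weighting by distance to the border hides the seams, since each image's contribution fades out exactly where its neighbor's fades in.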
Here are more examples from [Hugin](http://hugin.sourceforge.net/tutorials/two-photos/en.shtml):
Scene 1 | Scene 2 |
---|---|
We project scene 2 onto scene 1's plane:
And we can also project scene 1 onto scene 2's plane:
Auto stitching
It is natural to ask whether we can label corresponding points automatically. Even after finding many prominent points, however, we still need to filter out outliers and produce a robust homography estimate between photos. Below I demonstrate the tools used to achieve autostitching.
Harris Interest Point Detector
Corners, as opposed to edges and flat areas, are the most distinctive features. We can use the Harris corner detector to find all the prominent corners in a photo as a starting point for the search for correspondences between two photos.
Scene 1 | Scene 2 | Scene 3 |
---|---|---|
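The Harris response itself is small enough to sketch directly. Here a box filter stands in for the usual Gaussian smoothing of the structure tensor, and `k` is the standard empirical constant:

```python
import numpy as np

def harris_response(img, k=0.04, win=3):
    # Harris response R = det(M) - k * trace(M)^2, where M is the structure
    # tensor of the image gradients summed over a local win x win window.
    img = img.astype(float)
    gy, gx = np.gradient(img)
    Ixx, Iyy, Ixy = gx * gx, gy * gy, gx * gy

    def box(a):
        # Naive box filter: sum over the local window (clipped at borders).
        out = np.zeros_like(a)
        r = win // 2
        for y in range(a.shape[0]):
            for x in range(a.shape[1]):
                out[y, x] = a[max(0, y - r):y + r + 1,
                              max(0, x - r):x + r + 1].sum()
        return out

    Sxx, Syy, Sxy = box(Ixx), box(Iyy), box(Ixy)
    det = Sxx * Syy - Sxy ** 2
    tr = Sxx + Syy
    return det - k * tr ** 2
```

Thresholding this response (or taking local maxima) gives the corner candidates shown above.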
Adaptive Non-Maximal Suppression
The next step is to reduce the point set to points representative enough for estimating the homography matrix. If we just keep the n strongest Harris points, we get results like this:
Scene 1 | Scene 2 | Scene 3 |
---|---|---|
Here the chosen points tend to cluster, which biases the homography, since different areas of the image contribute unevenly to the estimate. Instead we can use Adaptive Non-Maximal Suppression (ANMS), proposed in Multi-Image Matching using Multi-Scale Oriented Patches by Matthew Brown, Richard Szeliski and Simon Winder. The idea: for each point, compute the suppression radius r, the distance to its nearest sufficiently stronger neighbor; then sort the radii in descending order and keep the points with the n largest radii, which are both locally strong and evenly spread over the image. We can see the effect on the same example:
Scene 1 | Scene 2 | Scene 3 |
---|---|---|
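A direct O(m²)-ish sketch of ANMS as described above (`c` is the robustness factor from the paper, commonly set to 0.9):

```python
import numpy as np

def anms(coords, strengths, n, c=0.9):
    # coords: (m, 2) corner locations, strengths: (m,) Harris responses.
    # For each point, the suppression radius is the distance to the nearest
    # point whose discounted response is still stronger.
    coords = np.asarray(coords, float)
    strengths = np.asarray(strengths, float)
    radii = np.full(len(coords), np.inf)
    for i in range(len(coords)):
        stronger = c * strengths > strengths[i]
        if stronger.any():
            d = np.linalg.norm(coords[stronger] - coords[i], axis=1)
            radii[i] = d.min()
    # Keep the n points with the largest radii.
    keep = np.argsort(-radii)[:n]
    return coords[keep], radii[keep]
```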
Feature Descriptor extraction - SIFT
To speed up and simplify similarity comparison between points, we describe an interest point by the area around it. Take a 40×40 patch around the point and divide it into 64 smaller patches of size 5×5. For each little 5×5 patch, compute the gradient and accumulate its magnitude into 8 discretized orientation bins; we also emphasize the center of the large patch by multiplying the magnitudes by a Gaussian kernel. By doing so we create descriptors distinctive enough to characterize the neighborhood of a point.
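A sketch of this descriptor for a single 40×40 patch (the Gaussian width and normalization are my choices; the project's actual code may differ):

```python
import numpy as np

def describe(patch, n_bins=8, cell=5):
    # patch: 40x40 grayscale window centered on an interest point.
    patch = patch.astype(float)
    gy, gx = np.gradient(patch)
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)                      # orientation in [-pi, pi]
    bins = ((ang + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    # Gaussian weighting emphasizes gradients near the patch center.
    h, w = patch.shape
    yy, xx = np.mgrid[0:h, 0:w]
    sigma = h / 2.0
    g = np.exp(-((yy - h / 2) ** 2 + (xx - w / 2) ** 2) / (2 * sigma ** 2))
    mag = mag * g
    desc = []
    for by in range(0, h, cell):                  # 8 x 8 grid of 5x5 cells
        for bx in range(0, w, cell):
            hist = np.zeros(n_bins)
            cell_bins = bins[by:by + cell, bx:bx + cell]
            cell_mag = mag[by:by + cell, bx:bx + cell]
            for b in range(n_bins):
                hist[b] = cell_mag[cell_bins == b].sum()
            desc.append(hist)
    desc = np.concatenate(desc)                   # 64 cells * 8 bins = 512
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc
```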
Feature Matching
Once we have the feature descriptors, we can search across points from different photos for the closest matches, which are supposedly correspondence points. However, as suggested in the paper, plain nearest-neighbor matching does not perform well, since many points have no counterpart in the other photo's point set. Using the ratio of the 1st-nearest-neighbor distance to the 2nd-nearest-neighbor distance is much more discriminative. Note that we also enforce consistency by requiring the reverse correspondence to hold: if point B is the best match for A, then A must also be the best match for B. We can see results on the same dataset below:
Scene 1 | Scene 2 |
---|---|
Scene 1 | Scene 3 |
---|---|
Scene 2 | Scene 3 |
---|---|
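The ratio test plus the mutual-consistency check described above can be sketched as follows (the 0.6 threshold is illustrative, not the project's exact value):

```python
import numpy as np

def match_features(d1, d2, ratio=0.6):
    # d1: (m, k) and d2: (n, k) descriptors. A pair (i, j) is kept only if
    # j is clearly better than i's second-best match (the ratio test) and
    # i is also j's best match (mutual consistency).
    dists = np.linalg.norm(d1[:, None, :] - d2[None, :, :], axis=2)
    best_j = dists.argmin(axis=1)        # best match in d2 for each d1 row
    best_i = dists.argmin(axis=0)        # best match in d1 for each d2 row
    matches = []
    for i in range(len(d1)):
        row = np.sort(dists[i])
        j = best_j[i]
        if len(row) > 1 and row[0] < ratio * row[1] and best_i[j] == i:
            matches.append((i, j))
    return matches
```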
4-point RANSAC: Robust homography estimation
Even if we enforce consistency and have a good metric for selecting corresponding points, some bad matches remain in our chosen point sets. By this stage, however, most matches are good, so we can randomly choose 4 of them, compute the homography they induce, and count how many other correspondences agree with it. Repeating this many times, we find the largest set of correspondences consistent with a single homography and use that set to compute the final homography matrix. Plugging this matrix into the code we used for manual stitching, we can now stitch automatically!
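A sketch of that sampling loop (the iteration count and inlier threshold are typical choices, not the project's exact values):

```python
import numpy as np

def fit_H(p1, p2):
    # Same least-squares formulation as computeH above (h33 fixed to 1).
    A, b = [], []
    for (x1, y1), (x2, y2) in zip(p1, p2):
        A.append([x1, y1, 1, 0, 0, 0, -x1 * x2, -y1 * x2])
        A.append([0, 0, 0, x1, y1, 1, -x1 * y2, -y1 * y2])
        b.extend([x2, y2])
    h = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)[0]
    return np.append(h, 1.0).reshape(3, 3)

def ransac_homography(pts1, pts2, n_iter=500, thresh=3.0, seed=0):
    pts1, pts2 = np.asarray(pts1, float), np.asarray(pts2, float)
    hom = np.column_stack([pts1, np.ones(len(pts1))])
    rng = np.random.default_rng(seed)
    best = np.zeros(len(pts1), dtype=bool)
    for _ in range(n_iter):
        # Fit a candidate homography to 4 random correspondences and count
        # how many other correspondences it reprojects within thresh pixels.
        idx = rng.choice(len(pts1), 4, replace=False)
        H = fit_H(pts1[idx], pts2[idx])
        proj = hom @ H.T
        with np.errstate(divide="ignore", invalid="ignore"):
            proj = proj[:, :2] / proj[:, 2:3]
        err = np.linalg.norm(proj - pts2, axis=1)
        inliers = err < thresh            # NaNs from w ~ 0 compare False
        if inliers.sum() > best.sum():
            best = inliers
    # Refit on the full consensus set for the final homography.
    return fit_H(pts1[best], pts2[best]), best
```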
Manual | Auto |
---|---|
Zooming in, we can see the computer is doing a better job than I did! Applying this technique to more images, we get:
Manual | Auto |
---|---|
Let’s see what happens if we apply it to a set of 8 consecutive photos taken from the Adobe panoramas dataset: