Preface

Wow, this entire project was a disaster for me. I had too many non-academic external forces happening during the past month to really be able to focus on this project, and I didn't have the patience to reimplement a problem from scratch that is already solved in many programs and Python libraries. I was seriously shocked at how challenging this project was for me, considering I have quite a bit of experience in imaging, graphics, computer vision, and computational photography related courses (7 total, 5 in the CS department, 4 at the graduate level); I just was really not vibing with this project. I'm submitting what I have in the hopes of some type of partial credit at this point so I can shift my focus to trying to get some credit for my final project, which I think is a much better fit for my skills.

If you're wondering why this is so late, please see my email correspondence with Professor Efros for the relevant extension requests (I have had a variety of extenuating circumstances, and they have obviously affected my ability to complete this project). I still have not gotten a clear answer on whether comments in my code are considered for the rubric. I have copied the most salient information regarding my approach here, but if you are wondering whether I actually know how to do something, or want more details on my approach, my code is thoroughly documented and commented, and you will likely find the answer to any question there.

Part A - Image Warping and Mosaicing

A1) Image Collection

The pictures I chose are a collection of images I shot of an art installation I did last semester. I had trouble stitching them into a panorama automatically with existing tools, so I thought they would be perfect for this project.

A2) Recovering Homographies

I tried two approaches to computing the homography. Neither worked as well as a reference implementation such as OpenCV's. You can see a comparison of the results below. Strangely, even OpenCV's result doesn't seem to match up with the actual ground truth points, but it works in the later image warping.
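For reference, this is roughly how the OpenCV comparison can be generated; a minimal sketch, with placeholder point arrays (in my code the points come from hand-clicked correspondences):

import numpy as np
import cv2

# Placeholder correspondences: (N, 2) arrays of (x, y) pixel coordinates.
im1_pts = np.array([[100, 200], [310, 220], [305, 410], [95, 400]], dtype=np.float64)
im2_pts = np.array([[120, 190], [330, 215], [320, 405], [110, 390]], dtype=np.float64)

# method=0 means a plain least-squares fit over all points (no RANSAC),
# which is the same problem Part A asks us to solve by hand.
H_ref, _ = cv2.findHomography(im1_pts, im2_pts, 0)
print(H_ref)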

Basic Linear Systems

Below is the docstring of my first attempt at computing the homography, which delineates the mathematical approach. I then solved the final system using least squares.

def computeH_1(im1_pts, im2_pts):
    """
    Computes Homography given the input corresponding points

    :param im1_pts: Points to compute homography from
    :param im2_pts: Points to compute homography to
    :return: Homography Matrix

    Using input points p_i = (x_i, y_i) from im1 and p_i' = (x_i', y_i')
    from im2, the homography is: p'.T = H @ p.T

    |x_1' x_2' x_3'     |   |h00 h01 h02|   |x_1 x_2 x_3     |
    |y_1' y_2' y_3' ... | = |h10 h11 h12| @ |y_1 y_2 y_3 ... |
    |1    1    1        |   |h20 h21 h22|   |1   1   1       |

    We can reformulate this as b = A @ x to recover H with least squares
    as follows:

    |x_1'|   |x_1 y_1 1   0   0   0   0   0   0  |
    |y_1'|   |0   0   0   x_1 y_1 1   0   0   0  |
    |1   |   |0   0   0   0   0   0   x_1 y_1 1  |
    |x_2'|   |x_2 y_2 1   0   0   0   0   0   0  |
    |y_2'| = |0   0   0   x_2 y_2 1   0   0   0  | @ |h00 h01 h02 h10 h11 h12 h20 h21 h22|.T
    |1   |   |0   0   0   0   0   0   x_2 y_2 1  |
    |x_3'|   |x_3 y_3 1   0   0   0   0   0   0  |
    |y_3'|   |0   0   0   x_3 y_3 1   0   0   0  |
    |1   |   |0   0   0   0   0   0   x_3 y_3 1  |
    |... |   | ...                               |
    """
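As a minimal sketch, this is how that system can be assembled and solved with np.linalg.lstsq (it mirrors the docstring above):

import numpy as np

def computeH_1(im1_pts, im2_pts):
    """Least-squares homography from N >= 4 point correspondences."""
    n = len(im1_pts)
    A = np.zeros((3 * n, 9))
    b = np.zeros(3 * n)
    for i, ((x, y), (xp, yp)) in enumerate(zip(im1_pts, im2_pts)):
        A[3 * i,     0:3] = [x, y, 1]          # equation for x_i'
        A[3 * i + 1, 3:6] = [x, y, 1]          # equation for y_i'
        A[3 * i + 2, 6:9] = [x, y, 1]          # equation for the homogeneous 1
        b[3 * i: 3 * i + 3] = [xp, yp, 1]
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    return h.reshape(3, 3)

Note that this formulation implicitly pins the homogeneous coordinate of every transformed point to exactly 1, i.e. it ignores the projective divide; that is likely part of why it underperforms the SVD formulation below.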

SVD Decomposition

Below is the docstring of my second attempt at computing the homography, based on a mathematical result I found online.

def computeH_2(im1_pts, im2_pts):
    """
    Computes Homography given the input corresponding points

    :param im1_pts: Points to compute homography from
    :param im2_pts: Points to compute homography to
    :return: Homography Matrix

    After this problem gaslighting me to hell, I tried a different
    approach given on Math StackExchange here:
    https://math.stackexchange.com/questions/494238/how-to-compute-homography-matrix-h-from-corresponding-points-2d-2d-planar-homog

    Basically, we want to solve A @ h = 0, but use SVD to avoid the
    trivial answer h = 0:

    |0  |   |x_1 y_1 1   0   0   0   -x_1*x_1' -y_1*x_1' -x_1'|
    |0  |   |0   0   0   x_1 y_1 1   -x_1*y_1' -y_1*y_1' -y_1'|
    |0  |   |x_2 y_2 1   0   0   0   -x_2*x_2' -y_2*x_2' -x_2'|
    |0  | = |0   0   0   x_2 y_2 1   -x_2*y_2' -y_2*y_2' -y_2'| @ |h00 h01 h02 h10 h11 h12 h20 h21 h22|.T
    |0  |   |x_3 y_3 1   0   0   0   -x_3*x_3' -y_3*x_3' -x_3'|
    |0  |   |0   0   0   x_3 y_3 1   -x_3*y_3' -y_3*y_3' -y_3'|
    |...|   | ...                                             |
    """
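A minimal sketch of this approach: the solution is the right singular vector associated with the smallest singular value, i.e. the last row of Vt from np.linalg.svd.

import numpy as np

def computeH_2(im1_pts, im2_pts):
    """DLT-style homography: approximate null vector of A via SVD."""
    rows = []
    for (x, y), (xp, yp) in zip(im1_pts, im2_pts):
        rows.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp, -xp])
        rows.append([0, 0, 0, x, y, 1, -x * yp, -y * yp, -yp])
    _, _, Vt = np.linalg.svd(np.array(rows))
    H = Vt[-1].reshape(3, 3)   # unit-norm solution, so h = 0 is excluded
    return H / H[2, 2]         # rescale so h22 = 1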

A3) Image Warping

To warp, I tried to invert the homography so that every pixel coordinate in the output image maps back to a sample location in the original image. To do this, I estimated the dimensions of the output image by transforming the corners of the input bounding box. I then made a grid of the coordinates in this estimated output image and sampled from the initial image accordingly. However, something about this approach didn't work, and I don't know why. You can see my warp (left) and the output of a reference implementation (OpenCV, right) below.
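For reference, a minimal sketch of that inverse-warping strategy, using nearest-neighbor sampling for simplicity:

import numpy as np

def warp_image(im, H):
    """Inverse-warp im by homography H (maps im coordinates to output coordinates)."""
    h, w = im.shape[:2]
    # Size the output by transforming the corners of the input bounding box.
    corners = np.array([[0, 0, 1], [w, 0, 1], [w, h, 1], [0, h, 1]], dtype=np.float64).T
    warped = H @ corners
    warped = warped[:2] / warped[2]
    x_min, y_min = np.floor(warped.min(axis=1)).astype(int)
    x_max, y_max = np.ceil(warped.max(axis=1)).astype(int)

    # Grid of output coordinates, mapped back through H^-1 into the source.
    xs, ys = np.meshgrid(np.arange(x_min, x_max), np.arange(y_min, y_max))
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = np.linalg.inv(H) @ coords
    src = np.round(src[:2] / src[2]).astype(int)

    out = np.zeros((y_max - y_min, x_max - x_min) + im.shape[2:], dtype=im.dtype)
    valid = (src[0] >= 0) & (src[0] < w) & (src[1] >= 0) & (src[1] < h)
    out[ys.ravel()[valid] - y_min, xs.ravel()[valid] - x_min] = im[src[1, valid], src[0, valid]]
    return out, (x_min, y_min)   # the offset matters later for mosaicing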

A4) Rectification

As you might guess from the warp results above, this part of the project wasn't looking too promising.

A5) Blending Mosaic

I tried to complete this portion of the project as well, but couldn't manage it even with the cv2 reference implementations, because I was losing my mind over my transformed images going out of image bounds. No matter what I did, I couldn't seem to account for these bounds by offsetting. You can see my attempt at a final mosaic using my approach (left) and with the help of OpenCV (right) below. Welp, ¯\_(ツ)_/¯. No hope for part B...
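For what it's worth, the standard fix for the out-of-bounds problem (which I could not get working in time) is to compose a translation into the homography so the warped image lands entirely in non-negative coordinates; a minimal sketch, reusing the corner setup from the warp sketch above:

import numpy as np

def shift_homography(H, corners):
    """Prepend a translation to H so all warped corners land at coordinates >= 0.

    corners: 3 x 4 matrix of homogeneous source-image corners.
    Returns the shifted homography and the translation that was applied.
    """
    warped = H @ corners
    warped = warped[:2] / warped[2]
    tx, ty = np.floor(warped.min(axis=1))
    T = np.array([[1, 0, -tx],
                  [0, 1, -ty],
                  [0, 0, 1]], dtype=np.float64)
    return T @ H, (-tx, -ty)

Warping one image with T @ H and the other with just T puts both in a shared, non-negative coordinate frame, after which blending is a matter of combining the overlapping pixels.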

A6) Conclusion

Through this project, I learned that if you find it frustrating to reimplement functions already implemented in other libraries, it may be better to find a different solution (in this case, take the L on this project) than to start down a fruitless rabbit hole.

Part B - Feature Matching for Autostitching

Let's try to automate Part A!

B1) Detecting Corners

To detect corners, I simply ran the provided starter code. Because I was using quite large images, the code returned many Harris corners. I originally tried random sampling to reduce the number of corners, but I found that setting the min_distance parameter of the provided Harris detection code to 50 yielded more evenly distributed corners. Below are some detected corners for my input images:
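The provided code is, to my understanding, built on skimage; a minimal sketch of an equivalent call, assuming corner_peaks enforces the spacing (the exact starter-code internals may differ):

import numpy as np
from skimage.color import rgb2gray
from skimage.feature import corner_harris, corner_peaks

def get_harris_corners(im, min_distance=50):
    """Harris corners with a minimum spacing enforced between detections."""
    response = corner_harris(rgb2gray(im))                  # Harris response map
    coords = corner_peaks(response, min_distance=min_distance)
    return coords                                           # (N, 2) array of (row, col)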

B2) Extracting Features

To extract features, I created a function (get_features) that detects the Harris corners and then turns each Harris corner into a meshgrid (np.meshgrid) that we can use to sample our image. That is, for every Harris corner, we get a matrix of x coordinates and a matrix of corresponding y coordinates from which to sample the image. Before detecting corners, I converted to grayscale and histogram equalized in order to make detection easier (since these are mostly low-light images). I then used these meshgrids to sample from a Gaussian-blurred version of the image instead of the raw image, to better incorporate the image information around each sample that wouldn't be directly sampled otherwise.

I used an approach broadly similar to the paper's, with a stride of 5, except I sampled 7x7 features (instead of the paper's 8x8). This means I effectively sample information from a roughly 35x35 window instead of a 40x40 window. I spent quite a while implementing this in the most "numpy" way possible to ensure high speed (although this ended up being mostly irrelevant, as I decided to filter out quite a few Harris corners). To normalize my features, I subtracted the mean of each feature and divided by its standard deviation. I then reshaped my features so that the output of my function is a 2D matrix with each row corresponding to a distinct normalized feature.
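A minimal sketch of that per-corner sampling (the specific blur sigma and the use of scipy's gaussian_filter and skimage's equalize_hist are assumptions, not necessarily what my submitted code does):

import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.color import rgb2gray
from skimage.exposure import equalize_hist

def get_features(im, corners, size=7, stride=5, sigma=2.0):
    """Extract one normalized size x size patch, spaced by stride, per corner.

    corners: (N, 2) array of (row, col) Harris corners.
    Returns an (N, size*size) matrix with one normalized feature per row.
    """
    gray = equalize_hist(rgb2gray(im))        # helps low-light images
    blurred = gaussian_filter(gray, sigma)    # pull in neighborhood information
    offsets = (np.arange(size) - size // 2) * stride
    dy, dx = np.meshgrid(offsets, offsets, indexing='ij')
    # Broadcast (N, 1, 1) corner coordinates against (size, size) offsets.
    rows = (corners[:, 0, None, None] + dy).clip(0, gray.shape[0] - 1)
    cols = (corners[:, 1, None, None] + dx).clip(0, gray.shape[1] - 1)
    feats = blurred[rows, cols].reshape(len(corners), -1)
    feats -= feats.mean(axis=1, keepdims=True)
    feats /= feats.std(axis=1, keepdims=True) + 1e-8   # avoid divide-by-zero
    return feats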

B3) Matching Features

To match the features, I first computed features for each image using get_features (described above). I then used the included dist2 function to efficiently calculate the squared distances between every pair of features. From there, I used the Lowe ratio approach given in the paper: for each feature in the first image, I found the two features in the second image with the lowest SSD, then divided the best distance by the second-best distance to get the Lowe ratio for that feature. I sorted this array of ratios to determine the best matches (i.e. the "top_n" features with the smallest ratios, "top_n" being an input to my match_features function). Using this sort order, I isolated the Harris corner points in the first image with the best correspondences to the second image. Using np.argmin on the dist2 matrix, I then found the indices of the closest features in the second image. I returned these two vectors of points for use in the rest of the project.
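A minimal sketch of this matching step; dist2 here is a stand-in for the provided pairwise squared-distance helper:

import numpy as np

def dist2(a, b):
    """Pairwise squared Euclidean distances: (N, D) x (M, D) -> (N, M)."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)

def match_features(feats1, feats2, pts1, pts2, top_n=100):
    """Return the top_n point pairs, ranked by Lowe ratio."""
    d = dist2(feats1, feats2)
    two_best = np.sort(d, axis=1)[:, :2]      # best and second-best SSD per row
    lowe = two_best[:, 0] / two_best[:, 1]    # small ratio = distinctive match
    keep = np.argsort(lowe)[:top_n]           # rows of image-1 features to keep
    idx2 = np.argmin(d, axis=1)[keep]         # their nearest image-2 features
    return pts1[keep], pts2[idx2]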

B4) RANSAC

I implemented my RANSAC approach as given in lecture, with a few input hyperparameters to tune for best results: n, the number of features to randomly sample and build each candidate model from; k, the maximum number of iterations to try; and t, the SSD threshold at which to consider a point an inlier to the homography estimate. This part of the project was quite straightforward, as I simply followed the simplified RANSAC algorithm discussed in lecture. However, I again took the time to implement it in as "numpy" a way as possible to avoid loops; namely, I parallelized the inlier calculation within each iteration. Thus, at each iteration of my loop, I sample n data points, calculate a candidate homography from them, and count how many points overall are inliers to this model (in parallel). I proceed over all k iterations to maximize the number of inliers. Once I determine which sample produces the model with the most inliers, I calculate the final homography from the set of all inliers.
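A minimal sketch of the loop, with computeH_2 from Part A assumed as the homography solver:

import numpy as np

def apply_H(H, pts):
    """Apply a homography to (N, 2) points, including the projective divide."""
    p = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return p[:, :2] / p[:, 2:3]

def ransac_homography(pts1, pts2, n=4, k=1000, t=5.0):
    """Keep the model with the most inliers, then refit on all of its inliers."""
    best_inliers = np.zeros(len(pts1), dtype=bool)
    for _ in range(k):
        sample = np.random.choice(len(pts1), n, replace=False)
        H = computeH_2(pts1[sample], pts2[sample])
        # Vectorized inlier test: SSD between projected and matched points.
        err = ((apply_H(H, pts1) - pts2) ** 2).sum(axis=1)
        inliers = err < t
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return computeH_2(pts1[best_inliers], pts2[best_inliers])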

Since I think my images are quite difficult for the feature matching process to handle, I compared my RANSAC algorithm with a reference implementation from OpenCV. My results are on the top, and the OpenCV results are on the bottom. Each point of a given color in the left image has a corresponding point in the right image. It is worth noting that although both algorithms appear to perform poorly on these images, my RANSAC algorithm calculates a homography with more point correspondences as inliers in all cases: for instance, with the third set of images, my approach counts 46 out of 500 feature points as inliers, whereas the OpenCV implementation only finds 43 inliers with the same threshold.

B5) Final Mosaic

Unfortunately, given the bad results on my images, even with histogram equalization, my final mosaics are... questionable. Here are my results, in the same order as displayed in section A1, using my implementation from part A (which unfortunately I still couldn't get correct):