Novel Identification and Grouping of Growing Astronomical Supernovae

Howard Wu and Asha Anoosheh

Big Thanks to Professor Alexei Filippenko, Weikan Zheng, Ph.D., and Isaac Shivvers, Ph.D. Candidate


Link for Project Paper



In this project, we implement a pipeline that identifies young supernovae in telescope imagery. The system aligns a new (subject) image to a reference (template) image using Fourier-domain analysis, subtracts the template from the aligned image with PSF matching via HOTPANTS, extracts points of interest from the subtracted image with Source Extractor, and uses supervised machine learning to classify each detection as a real transient or an artifact.

Top: Template Image and Subject Image. Bottom: Difference between Template and Transformed Subject Image, and Transformed Subject Image


We start by developing a system for aligning images. We define the template image to be the base image, and the subject image to be the image to be aligned. Due to possible variations in weather and exposure, we analyze the Fourier spectral content of each image in place of its direct pixel content. To do this, we take the Discrete Fourier Transform (DFT) of the images in question, given by:

$$F(u, v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, e^{-2\pi i \left( \frac{ux}{M} + \frac{vy}{N} \right)}$$

where $f(x, y)$ is the pixel value at position $(x, y)$ of an $M \times N$ image.
Using this technique provides more accuracy and clarity when determining the rotation and translation of our subject image with respect to the template image. Our system considers analysis at a base of 0 degrees, with +/- 10 degrees of freedom; note that any arbitrarily defined base rotation and degrees of freedom can be specified. In addition, the number of iterations is configurable, controlling the depth of analysis performed on the subject image in comparison to the template image; the typical range is one to three iterations. From there, the similarity of the two images is considered and a rigid transformation is applied to the subject image. We define the transformed subject image to be our subject image with this rotation and translation applied.
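
Below is a minimal sketch of this alignment step in Python, assuming numpy and scipy are available; the function names and the simple brute-force rotation search are our own illustration of the idea, not the production code.

import numpy as np
from scipy.ndimage import rotate, shift

def phase_correlation(template, subject):
    # Estimate the (dy, dx) translation of `subject` relative to `template`
    # from the cross-power spectrum of their DFTs.
    F1 = np.fft.fft2(template)
    F2 = np.fft.fft2(subject)
    cross_power = F1 * F2.conj()
    cross_power /= np.abs(cross_power) + 1e-12
    corr = np.fft.ifft2(cross_power).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Wrap indices past the midpoint around to negative shifts.
    if dy > template.shape[0] // 2:
        dy -= template.shape[0]
    if dx > template.shape[1] // 2:
        dx -= template.shape[1]
    return dy, dx, corr.max()

def align(template, subject, max_angle=10.0, step=0.5):
    # Search rotations in [-max_angle, +max_angle], keep the angle whose
    # phase-correlation peak is strongest, then apply the rigid transform.
    best = None
    for angle in np.arange(-max_angle, max_angle + step, step):
        rotated = rotate(subject, angle, reshape=False, order=1)
        dy, dx, score = phase_correlation(template, rotated)
        if best is None or score > best[0]:
            best = (score, angle, dy, dx)
    _, angle, dy, dx = best
    return shift(rotate(subject, angle, reshape=False, order=1), (dy, dx))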

HOTPANTS - High Order Transform of Point Spread Function (PSF) and Template Subtraction

Now, with a template image and a transformed subject image, we proceed to perform image subtraction. The Earth’s atmosphere causes light from stars to spread in a Gaussian shape, in amounts that vary with weather and exposure length. Because of its symmetry, this spreading can be approximated as a convolution applied to the image. To account for these differences in weather patterns and exposure times between photographs, we determine a convolution kernel $K$ that matches the point spread functions (PSFs) of the two images. The idea behind this technique is that, given the same star in two separate images with differing exposure times, the star in the template image could, without loss of generality, be double the radius of the same star in the transformed subject image. We can use this technique to match the two stars so that, when performing image subtraction, the matching stars are correctly removed.
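
As a concrete illustration, the sketch below assumes both PSFs are perfectly Gaussian, in which case matching them amounts to convolving the sharper image with a Gaussian whose width is the quadrature difference of the two widths; HOTPANTS, described next, handles the general case.

import numpy as np
from scipy.ndimage import gaussian_filter

def match_and_subtract(template, subject, sigma_template, sigma_subject):
    # Blur whichever image has the narrower PSF so both share the wider one
    # (Gaussian widths add in quadrature under convolution), then subtract.
    if sigma_template < sigma_subject:
        template = gaussian_filter(
            template, np.sqrt(sigma_subject**2 - sigma_template**2))
    else:
        subject = gaussian_filter(
            subject, np.sqrt(sigma_template**2 - sigma_subject**2))
    return subject - template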

To perform these operations, we use HOTPANTS, High Order Transform of Point Spread Function (PSF) and Template Subtraction. HOTPANTS works by dividing the provided images into several regions and fitting a convolution kernel for each region. The kernel sum is used to sigma-clip outliers from the distribution when solving for individual sections within each region. HOTPANTS blends and subtracts the template and transformed subject images, outputting a subtracted image.
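
A typical invocation from Python might look like the following; -inim, -tmplim, and -outim are HOTPANTS's own flags, while the file names are placeholders for illustration.

import subprocess

subprocess.run([
    "hotpants",
    "-inim",   "subject_aligned.fits",   # transformed subject image
    "-tmplim", "template.fits",          # template image
    "-outim",  "difference.fits",        # subtracted output image
], check=True)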

Left: Source Extractor, Right: Identified Areas of Interest

With the subtracted image, our system uses Source Extractor (SExtractor) for automated detection and photometry. Our goal is to detect points of interest in the subtracted image, where points of interest represent new astronomical bodies previously unaccounted for in the template image. SExtractor works by estimating the background and then checking whether each pixel belongs to an object or to the background. We generate an index of (X, Y) coordinates representing the points of interest; each coordinate pair comes with a flux value that can be used for thresholding.
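
For illustration, the same detection step can be sketched with sep, a Python library built from SExtractor's core algorithms (our pipeline calls the SExtractor binary itself); the file name and threshold below are placeholder assumptions.

import numpy as np
import sep
from astropy.io import fits

data = fits.getdata("difference.fits").astype(np.float64)
background = sep.Background(data)          # model the image background
objects = sep.extract(data - background,   # background-subtracted image
                      thresh=1.5,          # detection threshold in sigma
                      err=background.globalrms)
# Each detection carries (X, Y) coordinates and a flux for thresholding.
points = [(obj["x"], obj["y"], obj["flux"]) for obj in objects]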

General supervised learning entails learning a model from a training set for which we provide the desired output for each example. A set of data known as training examples is given to the learning algorithm, from which a trained model is obtained; this model can then return the desired outputs for future examples. In our case, we want to designate each detection as a real transient or an artifact, so the problem is a supervised classification task involving two classes: true and false. Classification is a common machine-learning task that can be performed by many different algorithms.

These algorithms come in two flavors: discriminative and generative. The former attempts to find a decision boundary that best divides the data into two spaces, minimizing the number of examples that fall on the wrong side of the boundary. The latter builds a probability distribution from the examples of each class, forming an implicit decision boundary where the probability of belonging to each class is equal. The boundaries for generative classifiers can be nonlinear, while most discriminative classifiers produce linear boundaries by default. To address this, special mapping functions known as kernels can be applied to the data. The details of kernels are beyond the scope of this page, but, in short, they allow for nonlinear, and therefore more expressive, decision boundaries, as illustrated below.
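
The toy example below, assuming scikit-learn, shows the effect: on concentric-circle data that no straight line can separate, a linear SVM fails while an RBF-kernel SVM succeeds.

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two classes arranged as concentric circles: not linearly separable.
X, y = make_circles(n_samples=400, noise=0.1, factor=0.4, random_state=0)
linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)
print("linear accuracy:", linear.score(X, y))  # poor: boundary must be a line
print("rbf accuracy:   ", rbf.score(X, y))     # near perfect: nonlinear boundary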

Boundaries for Kernel Logistic Regression and SVMs

In our case, the ideal situation would be to have not a discrete output (0 or 1), but continuous values indicating the probability of an example belonging to a class. A well-known discriminative procedure that does exactly this is Logistic Regression. Although it performs regression, it returns values in the range 0 to 1 indicating how likely an example is to belong to a certain class, with the decision boundary at 0.5. This helps identify which examples are near the boundary, and thus possible false positives or false negatives that could be manually inspected if needed. Unfortunately, this process is limited to a linear decision boundary, which may not be powerful enough in all cases. And although its kernelized counterpart, Kernel Logistic Regression (KLR), can generate a nonlinear boundary, it is considered unreasonable performance-wise.
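
A minimal sketch of this probabilistic output, using scikit-learn's LogisticRegression on synthetic stand-in data:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 225))       # stand-ins for 15x15 patch vectors
y_train = (X_train[:, 0] > 0).astype(int)   # synthetic labels for illustration

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = clf.predict_proba(X_train)[:, 1]    # probability of "real transient"
uncertain = (proba > 0.4) & (proba < 0.6)   # near-boundary cases
print(f"{uncertain.sum()} examples flagged for manual inspection")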

Feature Representation of Supernovae Cutout

A single input to a machine-learning algorithm is merely a set of features, usually represented as a vector. The classifier uses these features to make informed decisions about the data as a whole, naturally finding the features that separate the data best. Our algorithm cuts out a 15 by 15 pixel region of the image surrounding the center of the recorded supernova, and we use the pixel values directly as our features by unraveling the patch into a vector of length 225. Our data is of small enough dimension that no dimensionality reduction is required. The experiment is run with and without normalization of the data for comparison.
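
A sketch of this feature extraction, with the image and detection coordinates as placeholders:

import numpy as np

def extract_features(image, x, y, half=7):
    # Cut a (2*half+1) x (2*half+1) = 15x15 patch centered on (x, y)
    # and unravel it into a flat vector of length 225.
    patch = image[y - half:y + half + 1, x - half:x + half + 1]
    return patch.ravel()

def normalize(vec):
    # Optional zero-mean, unit-variance scaling for the comparison above.
    return (vec - vec.mean()) / (vec.std() + 1e-12)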

Matrix Representation of Sample Training Data

Over the past couple of decades, the Katzman Automatic Imaging Telescope (KAIT) has been taking images of specific regions of the night sky. Until now, trained researchers have examined the captured images for signs of young supernovae and marked their WCS (World Coordinate System) locations, via a method similar to the template-subtraction pipeline described previously. We can extract equal-sized patches surrounding the supernova positions as positive training examples.

To acquire negative training data, patches of the same size are extracted from the image everywhere except where the supernova resides. Since we have much more negative data than positive data, we weight the examples inversely proportionally to their counts to balance the classifier, as sketched below. This places importance on correctly classifying the positive data; otherwise, the classifier would tend to mark everything as negative simply due to sheer quantity.
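
With scikit-learn, for example, this inverse-count weighting is a one-line option; the choice of classifier here is purely illustrative.

from sklearn.linear_model import LogisticRegression

# "balanced" weights each class by n_samples / (n_classes * class_count),
# i.e., inversely proportional to how often the class appears.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
# Equivalently, explicit weights could be given, e.g.:
# clf = LogisticRegression(class_weight={0: 1.0 / n_negative,
#                                        1: 1.0 / n_positive})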

Classification Procedure

Having the absolute coordinates of the known supernovae, the templates, and the new images, we can create our own training set. By subtracting each template from its corresponding new image, we obtain all new light sources in the sky. By converting the WCS coordinates into pixel values in the image, using the aperture type and telescope direction information found in the header of each image file (FITS format), we can cut out a patch of the image around that pixel which will contain our supernova. We can also produce three rotated versions of this patch as additional training data, since the appearance of a supernova is rotation-invariant.
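
A hedged sketch of this cutout-and-augmentation step, assuming astropy; the file name and coordinates are placeholders.

import numpy as np
from astropy.io import fits
from astropy.wcs import WCS

with fits.open("difference.fits") as hdul:
    image = hdul[0].data
    wcs = WCS(hdul[0].header)      # built from the FITS header's WCS keywords

ra, dec = 180.0, 45.0                       # known supernova position (degrees)
x, y = wcs.all_world2pix(ra, dec, 0)        # WCS -> pixel coordinates
x, y = int(round(float(x))), int(round(float(y)))

patch = image[y - 7:y + 8, x - 7:x + 8]     # 15x15 cutout around the supernova
positives = [np.rot90(patch, k) for k in range(4)]  # add 90/180/270 rotations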

Full System Pipeline

Our system is designed to work with imaging data from most telescopes and uses machine learning to quickly identify supernovae. Extracted points of interest are efficiently processed for classification training. The system will benefit astronomers in their work on enhancing existing models of supernovae features at infancy stages.