%load_ext autoreload
%autoreload 2
from main import *
import numpy as np
import scipy as sp
import scipy.ndimage
import cv2
import matplotlib.pyplot as plt
import einops as eo
from PIL import Image
import torch
import torch.nn.functional as F
import glob
import os
os.getcwd()
import warnings
warnings.filterwarnings('ignore')
file = 'data/emir.tif'
im = loadplate(file)
print("Input Plate")
pltim(im, *sep(im))
Input Plate
To align the color channels efficiently regardless of scale, we use a pyramid of progressively more blurred images. Features are then computed from each layer of the pyramid; here the feature is the sum of the absolute x and y derivatives. These are computed with horizontal and vertical Sobel filters, each the convolution of a 1-D [-1, 0, 1] derivative filter with a 2-D Gaussian.
For each layer of the pyramid, working from bottom to top, we display the misaligned image, misaligned feature maps, and individual feature maps for each channel.
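The feature computation above can be sketched as follows. `channel_features` and `blur_pyramid` are hypothetical stand-ins for the notebook's `feat_pyramid`, and the derivative-of-Gaussian is taken directly with `scipy.ndimage.gaussian_filter`'s `order` argument rather than an explicit [-1, 0, 1] kernel (the two agree up to scaling):

```python
import numpy as np
import scipy.ndimage as ndi

def channel_features(chan, sigma=1.0):
    # Derivative-of-Gaussian along each axis, then the sum of absolute
    # x and y derivatives as the feature map.
    dy = ndi.gaussian_filter(chan, sigma=sigma, order=(1, 0))
    dx = ndi.gaussian_filter(chan, sigma=sigma, order=(0, 1))
    return np.abs(dx) + np.abs(dy)

def blur_pyramid(chan, max_d=4, sigma=1.0):
    # Each level: Gaussian blur, then subsample by 2 in each direction.
    levels = [chan]
    for _ in range(max_d):
        levels.append(ndi.gaussian_filter(levels[-1], sigma=sigma)[::2, ::2])
    return levels
```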
blur_ims, feat_ims = feat_pyramid(im, max_d=4, sigma=1, plt=True)
Input Image and Feats
Blur and Feats at Depth: 0
Blur and Feats at Depth: 1
Blur and Feats at Depth: 2
Blur and Feats at Depth: 3
To align the channels, we start by convolving each channel's feature map with a padded version of the target channel's features. The displacement of the maximum of the resulting "matching map" from its center determines the optimal offset for that channel.
Next we apply the offset to improve the alignment and take a step down the pyramid (we start from the lowest-resolution layer and work toward the full-resolution image), repeating until the bottom of the pyramid is reached.
The first three images are the alignment heatmaps produced by convolving the features together. Below them are the input/output pairs of images and feature maps, before and after alignment at the current iteration.
The visualizations are shown for each layer of the pyramid, from top to bottom.
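The single-level matching step might look like the sketch below. `best_offset` is a hypothetical name, and the cross-correlation is done here with `scipy.signal.fftconvolve` on a zero-padded target (correlation is convolution with a flipped kernel):

```python
import numpy as np
from scipy.signal import fftconvolve

def best_offset(feat, target_feat, pad):
    # Zero-pad the target so every candidate shift within +/- pad is
    # represented, then cross-correlate. The argmax of the matching
    # map, measured from its center, is the optimal (dy, dx) shift.
    padded = np.pad(target_feat, ((pad[0], pad[0]), (pad[1], pad[1])))
    match = fftconvolve(padded, feat[::-1, ::-1], mode='valid')
    peak = np.unravel_index(np.argmax(match), match.shape)
    return (peak[0] - pad[0], peak[1] - pad[1])
```

Applying `np.roll(channel, offset, axis=(0, 1))` with the returned offset then brings the channel into register with the target.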
aligned_im, offsets = match_coarse_to_fine(blur_ims, feat_ims, padding_proportion=np.asarray([.1, .1]), plt=True)
Total Offset: [[0, 0], [0, 0], [0, 0]] Cropping: [20 23] Misalignment: 0.18744768997180775 Offset: [(-5, -2), (-3, -2), (0, 0)]
Aligned Image:
Total Offset: [[-10, -4], [-6, -4], [0, 0]] Cropping: [40 46] Misalignment: 0.18257340442508707 Offset: [(-3, -1), (0, 1), (0, 0)]
Aligned Image:
Total Offset: [[-26, -10], [-12, -6], [0, 0]] Cropping: [80 92] Misalignment: 0.15910283783225349 Offset: [(-1, 0), (0, 0), (0, 0)]
Aligned Image:
Total Offset: [[-54, -20], [-24, -12], [0, 0]] Cropping: [160 185] Misalignment: 0.16985611141227305 Offset: [(1, 0), (-1, 0), (0, 0)]
Aligned Image:
Total Offset: [[-106, -40], [-50, -24], [0, 0]] Cropping: [320 370] Misalignment: 0.1692123092294079 Offset: [(-1, 0), (1, 0), (0, 0)]
Aligned Image:
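The printed totals follow a simple recurrence: stepping one level down the pyramid doubles the running offset (the resolution doubles) before that level's new correction is added. A small hypothetical helper reproduces the red-channel totals logged above from the per-level offsets:

```python
import numpy as np

def total_offsets(per_level):
    # Running total entering each level, coarsest first. Moving to the
    # next-finer level doubles the total and adds that level's correction.
    totals = [np.zeros(2, dtype=int)]
    for off in per_level[:-1]:
        totals.append(2 * (totals[-1] + np.asarray(off)))
    return totals

# Red-channel offsets found at each level for emir.tif (from the log):
red = [(-5, -2), (-3, -1), (-1, 0), (1, 0), (-1, 0)]
print([t.tolist() for t in total_offsets(red)])
# -> [[0, 0], [-10, -4], [-26, -10], [-54, -20], [-106, -40]]
```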
The first set of per-channel histograms is poorly aligned, which leads to odd coloring in the image, and small bits of noise are sprinkled throughout.
To fix this we blur the image with a small kernel, since it is very high resolution and carries unneeded high-frequency detail, and then normalize the colors to match the green channel.
The final image histograms are much smoother and cover much more comparable ranges. This results in a more natural-looking image.
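A minimal sketch of the normalization step, assuming `normalize_colors` matches each channel's mean and standard deviation to the target channel's (the final statistics above, identical across channels, are consistent with this; `normalize_colors_sketch` is a hypothetical stand-in):

```python
import numpy as np

def normalize_colors_sketch(im, target=2):
    # Shift and scale every channel so its mean and std match those of
    # the target channel.
    t = im[..., target]
    out = np.empty_like(im)
    for c in range(im.shape[-1]):
        chan = im[..., c]
        out[..., c] = (chan - chan.mean()) / (chan.std() + 1e-8) * t.std() + t.mean()
    return np.clip(out, 0.0, 1.0)
```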
filtered_im = np.stack([sp.ndimage.gaussian_filter(aligned_im[..., i].copy(), sigma=1) for i in range(3)], axis=-1)
final_im = normalize_colors(filtered_im, target=2)
pltim(aligned_im, figsize=(10,3), dpi=100)
pltimhist(im)
pltim(final_im, figsize=(10,3), dpi=100)
pltimhist(final_im)
red mean=0.576 var=0.0726
green mean=0.474 var=0.0581
blue mean=0.516 var=0.0605
red mean=0.514 var=0.038
green mean=0.514 var=0.038
blue mean=0.514 var=0.038
We show the results of the whole pipeline applied to every given image; misaligned inputs are on the left and processed outputs are on the right.
Offsets and the image file name are printed above each image.
Some notable failure cases include church.tif, harvesters.tif, and self_portrait.tif, where large color splotches are present due to movement in the scene during photography.
for file in glob.glob("data/*"):
print(file.split("/")[-1])
in_im, out_im = whole_pipeline(file)
pltim(in_im, out_im, figsize=(10,3), dpi=100)
church.tif shape=(3202, 3634, 3) depth=5 offsets=[[-116, 8], [-50, -8], [0, 0]] Runtime: 14.2s
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
train.tif shape=(3238, 3741, 3) depth=5 offsets=[[-170, -58], [-84, -4], [0, 0]] Runtime: 14.9s
master-pnp-prok-00100-00163u.tif shape=(3215, 3642, 3) depth=5 offsets=[[68, -22], [46, -18], [0, 0]] Runtime: 14.3s
monastery.jpg shape=(341, 391, 3) depth=2 offsets=[[-6, -4], [6, -4], [0, 0]] Runtime: 0.11s
cathedral.jpg shape=(341, 390, 3) depth=2 offsets=[[-24, -6], [-10, -4], [0, 0]] Runtime: 0.119s
emir.tif shape=(3209, 3702, 3) depth=5 offsets=[[-214, -80], [-98, -48], [0, 0]] Runtime: 14.6s
workshop.tif shape=(3209, 3741, 3) depth=5 offsets=[[-210, 24], [-106, 2], [0, 0]] Runtime: 14.6s
harvesters.tif shape=(3218, 3683, 3) depth=5 offsets=[[-248, -26], [-120, -34], [0, 0]] Runtime: 14.6s
self_portrait.tif shape=(3251, 3810, 3) depth=5 offsets=[[-350, -74], [-156, -58], [0, 0]] Runtime: 15.4s
master-pnp-prok-00100-00107u.tif shape=(3218, 3762, 3) depth=5 offsets=[[-270, -172], [-134, -102], [0, 0]] Runtime: 14.6s
tobolsk.jpg shape=(341, 396, 3) depth=2 offsets=[[-12, -6], [-6, -4], [0, 0]] Runtime: 0.109s
three_generations.tif shape=(3209, 3714, 3) depth=5 offsets=[[-222, -18], [-108, -24], [0, 0]] Runtime: 14.3s
melons.tif shape=(3241, 3770, 3) depth=5 offsets=[[-354, -26], [-160, -20], [0, 0]] Runtime: 14.8s
onion_church.tif shape=(3215, 3781, 3) depth=5 offsets=[[-214, -70], [-102, -52], [0, 0]] Runtime: 15.2s
icon.tif shape=(3244, 3741, 3) depth=5 offsets=[[-180, -46], [-82, -34], [0, 0]] Runtime: 15.3s
lady.tif shape=(3212, 3761, 3) depth=5 offsets=[[-240, -26], [-112, -18], [0, 0]] Runtime: 15.1s
master-pnp-prok-00100-00182u.tif shape=(3260, 3817, 3) depth=5 offsets=[[-250, -70], [-118, -56], [0, 0]] Runtime: 15.3s