'Displaying stitched images together without cutoff using warpAffine

I'm trying to stitch 2 images together by using template matching find 3 sets of points which I pass to cv2.getAffineTransform() get a warp matrix which I pass to cv2.warpAffine() into to align my images.

However when I join my images the majority of my affine'd image isn't shown. I've tried using different techniques to select points, changed the order or arguments etc. but I can only ever get a thin slither of the affine'd image to be shown.

Could somebody tell me whether my approach is a valid one and suggest where I might be making an error? Any guesses as to what could be causing the problem would be greatly appreciated. Thanks in advance.

This is the final result that I get. Here are the original images (1, 2) and the code that I use:

EDIT: Here's the results of the variable trans

array([[  1.00768049e+00,  -3.76690353e-17,  -3.13824885e+00],
       [  4.84461775e-03,   1.30769231e+00,   9.61912797e+02]])

And here are the here the points passed to cv2.getAffineTransform: unified_pair1

array([[  671.,  1024.],
       [   15.,   979.],
       [   15.,   962.]], dtype=float32)

unified_pair2

array([[ 669.,   45.],
       [  18.,   13.],
       [  18.,    0.]], dtype=float32)

import cv2
import numpy as np


def showimage(image, name="No name given"):
    cv2.imshow(name, image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
    return

image_a = cv2.imread('image_a.png')
image_b = cv2.imread('image_b.png')


def get_roi(image):
    roi = cv2.selectROI(image) # spacebar to confirm selection
    cv2.waitKey(0)
    cv2.destroyAllWindows()
    crop = image_a[int(roi[1]):int(roi[1]+roi[3]), int(roi[0]):int(roi[0]+roi[2])]
    return crop
temp_1 = get_roi(image_a)
temp_2 = get_roi(image_a)
temp_3 = get_roi(image_a)

def find_template(template, search_image_a, search_image_b):
    ccnorm_im_a = cv2.matchTemplate(search_image_a, template, cv2.TM_CCORR_NORMED)
    template_loc_a = np.where(ccnorm_im_a == ccnorm_im_a.max())

    ccnorm_im_b = cv2.matchTemplate(search_image_b, template, cv2.TM_CCORR_NORMED)
    template_loc_b = np.where(ccnorm_im_b == ccnorm_im_b.max())
    return template_loc_a, template_loc_b


coord_a1, coord_b1 = find_template(temp_1, image_a, image_b)
coord_a2, coord_b2 = find_template(temp_2, image_a, image_b)
coord_a3, coord_b3 = find_template(temp_3, image_a, image_b)

def unnest_list(coords_list):
    coords_list = [a[0] for a in coords_list]
    return coords_list

coord_a1 = unnest_list(coord_a1)
coord_b1 = unnest_list(coord_b1)
coord_a2 = unnest_list(coord_a2)
coord_b2 = unnest_list(coord_b2)
coord_a3 = unnest_list(coord_a3)
coord_b3 = unnest_list(coord_b3)

def unify_coords(coords1,coords2,coords3):
    unified = []
    unified.extend([coords1, coords2, coords3])
    return unified

# Create a 2 lists containing 3 pairs of coordinates
unified_pair1 = unify_coords(coord_a1, coord_a2, coord_a3)
unified_pair2 = unify_coords(coord_b1, coord_b2, coord_b3)

# Convert elements of lists to numpy arrays with data type float32
unified_pair1 = np.asarray(unified_pair1, dtype=np.float32)
unified_pair2 = np.asarray(unified_pair2, dtype=np.float32)

# Get result of the affine transformation
trans = cv2.getAffineTransform(unified_pair1, unified_pair2)

# Apply the affine transformation to original image
result = cv2.warpAffine(image_a, trans, (image_a.shape[1] + image_b.shape[1], image_a.shape[0]))
result[0:image_b.shape[0], image_b.shape[1]:] = image_b

showimage(result)
cv2.imwrite('result.png', result)

Sources: Approach based on advice received here, this tutorial and this example from the docs.

opencv image-stitching

Solution 1:^[1]

July 12 Edit:

This post inspired GitHub repos providing functions to accomplish this task; one for a padded warpAffine() and another for a padded warpPerspective(). Check out the Python version or the C++ version.

Transformations shift the location of pixels

What any transformation does is takes your point coordinates (x, y) and maps them to new locations (x', y'):

s*x'    h1 h2 h3     x
s*y' =  h4 h5 h6  *  y
s       h7 h8  1     1

where s is some scaling factor. You must divide the new coordinates by the scale factor to get back the proper pixel locations (x', y'). Technically, this is only true of homographies---(3, 3) transformation matrices---you don't need to scale for affine transformations (you don't even need to use homogeneous coordinates...but it's better to keep this discussion general).

Then the actual pixel values are moved to those new locations, and the color values are interpolated to fit the new pixel grid. So during this process, these new locations get recorded at some point. We'll need those locations to see where the pixels actually move to, relative to the other image. Let's start with an easy example and see where points are mapped.

Suppose your transformation matrix simply shifts pixels to the left by ten pixels. Translation is handled by the last column; the first row is the translation in x and second row is the translation in y. So we would have an identity matrix, but with -10 in the first row, third column. Where would the pixel (0,0) be mapped? Hopefully, (-10,0) if logic makes any sense. And in fact, it does:

transf = np.array([[1.,0.,-10.],[0.,1.,0.],[0.,0.,1.]])
homg_pt = np.array([0,0,1])
new_homg_pt = transf.dot(homg_pt))
new_homg_pt /= new_homg_pt[2]
# new_homg_pt = [-10.  0.  1.]

Perfect! So we can figure out where all points map with a little linear algebra. We will need to get all the (x,y) points, and put them into a huge array so that every single point is in it's own column. Lets pretend our image is only 4x4.

h, w = src.shape[:2] # 4, 4
indY, indX = np.indices((h,w))  # similar to meshgrid/mgrid
lin_homg_pts = np.stack((indX.ravel(), indY.ravel(), np.ones(indY.size)))

These lin_homg_pts have every homogenous point now:

[[ 0.  1.  2.  3.  0.  1.  2.  3.  0.  1.  2.  3.  0.  1.  2.  3.]
 [ 0.  0.  0.  0.  1.  1.  1.  1.  2.  2.  2.  2.  3.  3.  3.  3.]
 [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]]

Then we can do matrix multiplication to get the mapped value of every point. For simplicity, let's stick with the previous homography.

trans_lin_homg_pts = transf.dot(lin_homg_pts)
trans_lin_homg_pts /= trans_lin_homg_pts[2,:]

And now we have the transformed points:

[[-10. -9. -8. -7. -10. -9. -8. -7. -10. -9. -8. -7. -10. -9. -8. -7.]
 [  0.  0.  0.  0.   1.  1.  1.  1.   2.  2.  2.  2.   3.  3.  3.  3.]
 [  1.  1.  1.  1.   1.  1.  1.  1.   1.  1.  1.  1.   1.  1.  1.  1.]]

As we can see, everything is working as expected: we have shifted the x-values only, by -10.

Pixels can be shifted outside of your image bounds

Notice that these pixel locations are negative---they're outside of the image bounds. If we do something a little more complex and rotate the image by 45 degrees, we'll get some pixel values way outside our original bounds. We don't care about every pixel value though, we just need to know how far the farthest pixels are that are outside the original image pixel locations, so that we can pad the original image that far out, before displaying the warped image on it.

theta = 45*np.pi/180
transf = np.array([
    [ np.cos(theta),np.sin(theta),0],
    [-np.sin(theta),np.cos(theta),0],
    [0.,0.,1.]])
print(transf)
trans_lin_homg_pts = transf.dot(lin_homg_pts)
minX = np.min(trans_lin_homg_pts[0,:])
minY = np.min(trans_lin_homg_pts[1,:])
maxX = np.max(trans_lin_homg_pts[0,:])
maxY = np.max(trans_lin_homg_pts[1,:])
# minX: 0.0, minY: -2.12132034356, maxX: 4.24264068712, maxY: 2.12132034356,

So we see that we can get pixel locations well outside our original image, both in the negative and positive directions. The minimum x value doesn't change because when an homography applies a rotation, it does it from the top-left corner. Now one thing to note here is that I've applied the transformation to all pixels in the image. But this is really unnecessary, you can simply warp the four corner points and see where they land.

Padding the destination image

Note that when you call cv2.warpAffine() you have to input the destination size. These transformed pixel values reference that size. So if a pixel gets mapped to (-10,0), it won't show up in the destination image. That means that we'll have to make another homography with translations which shift all pixel locations be positive, and then we can pad the image matrix to compensate for our shift. We'll also have to pad the original image on the bottom and the right if the homography moves points to positions bigger than the image, too.

In the recent example, the min x value is the same, so we need no horizontal shift. However, the min y value has dropped by about two pixels, so we need to shift the image two pixels down. First, let's create the padded destination image.

pad_sz = list(src.shape) # in case three channel
pad_sz[0] = np.round(np.maximum(pad_sz[0], maxY) - np.minimum(0, minY)).astype(int)
pad_sz[1] = np.round(np.maximum(pad_sz[1], maxX) - np.minimum(0, minX)).astype(int)
dst_pad = np.zeros(pad_sz, dtype=np.uint8)
# pad_sz = [6, 4, 3]

As we can see, the height increased from the original by two pixels to account for that shift.

Add translation to the transformation to shift all pixel locations to positive

Now, we need to create a new homography matrix to translate the warped image by the same amount that we shifted by. And to apply both transformations---the original and this new shift---we have to compose the two homographies (for an affine transformation, you can simply add the translation, but not for an homography). Additionally we need to divide by the last entry to make sure the scales are still proper (again, only for homographies):

anchorX, anchorY = 0, 0
transl_transf = np.eye(3,3)
if minX < 0: 
    anchorX = np.round(-minX).astype(int)
    transl_transf[0,2] -= anchorX
if minY < 0:
    anchorY = np.round(-minY).astype(int)
    transl_transf[1,2] -= anchorY
new_transf = transl_transf.dot(transf)
new_transf /= new_transf[2,2]

I also created here the anchor points for where we will place the destination image into the padded matrix; it's shifted by the same amount the homography will shift the image. So let's place the destination image inside the padded matrix:

dst_pad[anchorY:anchorY+dst_sz[0], anchorX:anchorX+dst_sz[1]] = dst

Warp with the new transformation into the padded image

All we have left to do is apply the new transformation to the source image (with the padded destination size), and then we can overlay the two images.

warped = cv2.warpPerspective(src, new_transf, (pad_sz[1],pad_sz[0]))

alpha = 0.3
beta = 1 - alpha
blended = cv2.addWeighted(warped, alpha, dst_pad, beta, 1.0)

Putting it all together

Let's create a function for this since we were creating quite a few variables we don't need at the end here. For inputs we need the source image, the destination image, and the original homography. And for outputs we simply want the padded destination image, and the warped image. Note that in the examples we used a 3x3 homography so we better make sure we send in 3x3 transforms instead of 2x3 affine or Euclidean warps. You can just add the row [0,0,1] to any affine warp at the bottom and you'll be fine.

def warpPerspectivePadded(img, dst, transf):

    src_h, src_w = src.shape[:2]
    lin_homg_pts = np.array([[0, src_w, src_w, 0], [0, 0, src_h, src_h], [1, 1, 1, 1]])

    trans_lin_homg_pts = transf.dot(lin_homg_pts)
    trans_lin_homg_pts /= trans_lin_homg_pts[2,:]

    minX = np.min(trans_lin_homg_pts[0,:])
    minY = np.min(trans_lin_homg_pts[1,:])
    maxX = np.max(trans_lin_homg_pts[0,:])
    maxY = np.max(trans_lin_homg_pts[1,:])

    # calculate the needed padding and create a blank image to place dst within
    dst_sz = list(dst.shape)
    pad_sz = dst_sz.copy() # to get the same number of channels
    pad_sz[0] = np.round(np.maximum(dst_sz[0], maxY) - np.minimum(0, minY)).astype(int)
    pad_sz[1] = np.round(np.maximum(dst_sz[1], maxX) - np.minimum(0, minX)).astype(int)
    dst_pad = np.zeros(pad_sz, dtype=np.uint8)

    # add translation to the transformation matrix to shift to positive values
    anchorX, anchorY = 0, 0
    transl_transf = np.eye(3,3)
    if minX < 0: 
        anchorX = np.round(-minX).astype(int)
        transl_transf[0,2] += anchorX
    if minY < 0:
        anchorY = np.round(-minY).astype(int)
        transl_transf[1,2] += anchorY
    new_transf = transl_transf.dot(transf)
    new_transf /= new_transf[2,2]

    dst_pad[anchorY:anchorY+dst_sz[0], anchorX:anchorX+dst_sz[1]] = dst

    warped = cv2.warpPerspective(src, new_transf, (pad_sz[1],pad_sz[0]))

    return dst_pad, warped

Example of running the function

Finally, we can call this function with some real images and homographies and see how it pans out. I'll borrow the example from LearnOpenCV:

src = cv2.imread('book2.jpg')
pts_src = np.array([[141, 131], [480, 159], [493, 630],[64, 601]], dtype=np.float32)
dst = cv2.imread('book1.jpg')
pts_dst = np.array([[318, 256],[534, 372],[316, 670],[73, 473]], dtype=np.float32)

transf = cv2.getPerspectiveTransform(pts_src, pts_dst)

dst_pad, warped = warpPerspectivePadded(src, dst, transf)

alpha = 0.5
beta = 1 - alpha
blended = cv2.addWeighted(warped, alpha, dst_pad, beta, 1.0)
cv2.imshow("Blended Warped Image", blended)
cv2.waitKey(0)

And we end up with this padded warped image:

![[Padded and warped1]1

as opposed to the typical cut off warp you would normally get.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1