How to append a tuple to a numpy array without it being performed element-wise?

If I try

x = np.append(x, (2,3))

the tuple (2,3) does not get appended to the end of the array; instead, 2 and 3 get appended individually, even if I originally declared x as

x = np.array([], dtype = tuple)

or

x = np.array([], dtype = (int,2))

What is the proper way to do this?



Solution 1:[1]

I agree with @user2357112's comment:

appending to NumPy arrays is catastrophically slower than appending to ordinary lists. It's an operation that they are not at all designed for

Here's a little benchmark:

# measure execution time
import timeit
import numpy as np


def f1(num_iterations):
    x = np.dtype((np.int32, (2, 1)))

    for i in range(num_iterations):
        x = np.append(x, (i, i))


def f2(num_iterations):
    x = np.array([(0, 0)])

    for i in range(num_iterations):
        x = np.vstack((x, (i, i)))


def f3(num_iterations):
    x = []
    for i in range(num_iterations):
        x.append((i, i))

    x = np.array(x)

N = 50000

print(timeit.timeit('f1(N)', setup='from __main__ import f1, N', number=1))
print(timeit.timeit('f2(N)', setup='from __main__ import f2, N', number=1))
print(timeit.timeit('f3(N)', setup='from __main__ import f3, N', number=1))

I wouldn't use either np.append or vstack. I'd just build an ordinary Python list and then use it to construct the np.array.
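As a rough sketch of that approach (the variable names are illustrative, not from the question): collect the tuples in a list and convert once at the end. Note that np.array on a list of equal-length tuples gives a 2-D array. If a 1-D array whose elements are tuples is wanted instead, one option is to pre-allocate an object array and fill it.

```python
import numpy as np

pairs = []                      # ordinary Python list; append is cheap
for i in range(4):
    pairs.append((i, i + 1))

# One conversion at the end: equal-length tuples become rows of a 2-D array.
as_rows = np.array(pairs)

# Assumed alternative: keep each tuple as a single element of a 1-D object array.
as_tuples = np.empty(len(pairs), dtype=object)
for k, pair in enumerate(pairs):
    as_tuples[k] = pair

print(as_rows.shape)    # (4, 2)
print(as_tuples.shape)  # (4,)
```

Which of the two results is appropriate depends on whether the tuples are meant as rows of numbers or as opaque items.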

EDIT

Here's the benchmark output on my laptop:

  • append: 12.4983000173
  • vstack: 1.60663705793
  • list: 0.0252208517006

[Finished in 14.3s]

Solution 2:[2]

You need to supply the shape to numpy dtype, like so:

x = np.dtype((np.int32, (1,2))) 
x = np.append(x,(2,3))

Outputs

array([dtype(('<i4', (1, 2))), 2, 3], dtype=object)

Reference: http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html

Solution 3:[3]

If I understand what you mean, you can use vstack:

>>> a = np.array([(1,2),(3,4)])
>>> a = np.vstack((a, (4,5)))
>>> a
array([[1, 2],
       [3, 4],
       [4, 5]])

Solution 4:[4]

I do not have any special insight as to why this works, but:

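import numpy as np

# dtype=object lets the scalars and the tuple sit together in one 1-D array
x = np.array([1, 3, 2, (5,7), 4], dtype=object)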
mytuple = [(2, 3)]
mytuplearray = np.empty(len(mytuple), dtype=object)
mytuplearray[:] = mytuple
y = np.append(x, mytuplearray)
print(y)                                      #   [1 3 2 (5, 7) 4 (2, 3)]

As others have correctly pointed out, this is a slow operation with numpy arrays. If you're just building some code from scratch, try to use some other data type. But if you know your array will always remain small or you're not going to append much or if you have existing code that you need to tweak quickly, then go ahead.
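One possible explanation, offered as a sketch rather than a definitive account: np.array treats a tuple as another array dimension, even with dtype=object, whereas an array pre-allocated with np.empty already has a fixed 1-D shape, so each tuple is stored as a single element. The names below are illustrative.

```python
import numpy as np

# Converting directly: the tuple is unpacked into a second dimension.
direct = np.array([(2, 3)], dtype=object)
print(direct.shape)    # (1, 2)

# Pre-allocating fixes the shape first, so the tuple stays one element.
holder = np.empty(1, dtype=object)
holder[0] = (2, 3)
print(holder.shape)    # (1,)

# Both inputs are already 1-D object arrays, so np.append adds one item.
base = np.array([1, 4], dtype=object)
result = np.append(base, holder)
print(result.shape)    # (3,)
```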

Solution 5:[5]

Simplest way (x ends up as an object array):

x=np.append(x,None)
x[-1]=(2,3)
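A self-contained sketch of the same idea, assuming x starts as an empty object array (the snippet above does not say how x was created):

```python
import numpy as np

x = np.array([], dtype=object)   # assumed starting array

x = np.append(x, None)           # grow by one placeholder slot
x[-1] = (2, 3)                   # store the whole tuple in that slot

x = np.append(x, None)
x[-1] = (4, 5)

print(x.shape)   # (2,)
print(x[0])      # (2, 3)
```

Each np.append still copies the whole array, so the speed concerns raised in the other answers apply here too.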

Solution 6:[6]

np.append is easy to use with a case like:

In [94]: np.append([1,2,3],4)
Out[94]: array([1, 2, 3, 4])

but the first example in its documentation is harder to understand. It shows the same sort of flat concatenate that bothers you:

>>> np.append([1, 2, 3], [[4, 5, 6], [7, 8, 9]])
array([1, 2, 3, 4, 5, 6, 7, 8, 9])

Stripped of dimensional tests, np.append does

In [166]: np.append(np.array([1,2],int),(2,3))
Out[166]: array([1, 2, 2, 3])

In [167]: np.concatenate([np.array([1,2],int),np.array((2,3))])
Out[167]: array([1, 2, 2, 3])

So except for the simplest cases you need to understand what np.array((2,3)) does, and how concatenate handles dimensions.
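A short sketch of those two points, with illustrative variable names: a bare tuple becomes a 1-D array, a list containing one tuple becomes a 2-D array, and the result of joining depends on which of the two is passed and whether an axis is given.

```python
import numpy as np

pair = np.array((2, 3))        # a tuple becomes a 1-D array
row = np.array([(2, 3)])       # a list holding one tuple becomes a 2-D array
print(pair.shape)              # (2,)
print(row.shape)               # (1, 2)

table = np.array([[0, 1]])     # shape (1, 2)

# Without an axis, np.append flattens both inputs before joining.
flat = np.append(table, (2, 3))
print(flat.shape)              # (4,)

# Joining along axis 0 needs both inputs to be 2-D with the same row length.
stacked = np.concatenate([table, row], axis=0)
print(stacked.shape)           # (2, 2)
```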

So apart from the speed issues, np.append can be trickier to use than the interface suggests. The parallels to list append are only superficial.

As for append (or concatenate) with dtype=object (not dtype=tuple) or a compound dtype ('i,i'), I couldn't tell you what happens without testing. At a minimum the inputs should already be arrays, and should have a matching dtype. Otherwise the results can be unpredictable.
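As a hedged illustration of the compound-dtype case, here is a sketch that assumes both inputs are already arrays with the same 'i,i' dtype, as advised above. Each record then behaves like one element, so concatenate adds one item rather than two. The field names f0 and f1 are NumPy's defaults for an unnamed compound dtype.

```python
import numpy as np

pair_dtype = np.dtype('i,i')                      # two int32 fields, f0 and f1

x = np.array([(0, 1)], dtype=pair_dtype)          # 1-D, one record
new = np.array([(2, 3)], dtype=pair_dtype)        # records are given as tuples

x = np.concatenate([x, new])
print(x.shape)       # (2,)
print(x.tolist())    # [(0, 1), (2, 3)]
print(x['f0'])       # [0 2]
```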

edit

Don't trust the timings in https://stackoverflow.com/a/38985245/901925. The functions don't produce the same things.

Corrected functions:

In [233]: def g1(num_iterations):
     ...:     x = np.ones((0,2),int)
     ...:     for i in range(num_iterations):
     ...:         x = np.append(x, [(i, i)], axis=0)
     ...:     return x
     ...: 
     ...: def g2(num_iterations):
     ...:     x = np.ones((0, 2),int)
     ...:     for i in range(num_iterations):
     ...:         x = np.vstack((x, (i, i)))
     ...:     return x
     ...: 
     ...: def g3(num_iterations):
     ...:     x = []
     ...:     for i in range(num_iterations):
     ...:         x.append((i, i))
     ...:     x = np.array(x)
     ...:     return x
     ...: 
In [234]: g1(3)
Out[234]: 
array([[0, 0],
       [1, 1],
       [2, 2]])
In [235]: g2(3)
Out[235]: 
array([[0, 0],
       [1, 1],
       [2, 2]])
In [236]: g3(3)
Out[236]: 
array([[0, 0],
       [1, 1],
       [2, 2]])

np.append and np.vstack timings are much closer. Both use np.concatenate to do the actual joining. They differ in how the inputs are processed prior to sending them to concatenate.

In [237]: timeit g1(1000)
9.69 ms ± 6.25 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [238]: timeit g2(1000)
12.8 ms ± 7.53 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [239]: timeit g3(1000)
537 µs ± 2.22 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
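The shared reliance on np.concatenate can be sketched as follows. This is a simplified picture of the input handling, not the library's actual source: with an axis given, np.append passes the inputs on largely as they are, while np.vstack first promotes each input to at least 2-D.

```python
import numpy as np

x = np.ones((0, 2), int)

via_append = np.append(x, [(1, 1)], axis=0)
via_vstack = np.vstack((x, (1, 1)))

# Roughly what each call reduces to.
append_like = np.concatenate((x, np.array([(1, 1)])), axis=0)
vstack_like = np.concatenate([np.atleast_2d(x), np.atleast_2d((1, 1))], axis=0)

print(np.array_equal(via_append, append_like))   # True
print(np.array_equal(via_vstack, vstack_like))   # True
print(via_append.shape)                          # (1, 2)
```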

The wrong results. Note that f1 produces a 1d object dtype array, because the starting value is an np.dtype object (which becomes an object dtype array) and there's no axis parameter. f2 duplicates the starting row.

In [240]: f1(3)
Out[240]: array([dtype(('<i4', (2, 1))), 0, 0, 1, 1, 2, 2], dtype=object)
In [241]: f2(3)
Out[241]: 
array([[0, 0],
       [0, 0],
       [1, 1],
       [2, 2]])

Not only is it slower to use np.append or np.vstack in a loop, it is also hard to do it right.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Rahul Madhavan
Solution 3
Solution 4 Jonni Lehtiranta
Solution 5 Ali
Solution 6