'Updating a NumPy array with another

Seemingly simple question: I have an array with two columns, the first represents an ID and the second a count. I'd like to update it with another, similar array such that

import numpy as np

a = np.array([[1, 2],
              [2, 2],
              [3, 1],
              [4, 5]])

b = np.array([[2, 2],
              [3, 1],
              [4, 0],
              [5, 3]])

a.update(b)  # ????
>>> np.array([[1, 2],
              [2, 4],
              [3, 2],
              [4, 5],
              [5, 3]])

Is there a way to do this with indexing/slicing such that I don't simply have to iterate over each row?



Solution 1:[1]

Generic case

Approach #1: You can use np.add.at to do such an ID-based adding operation like so -

# First column of output array as the union of first columns of a,b              
out_id = np.union1d(a[:,0],b[:,0])

# Initialize second column of output array
out_count = np.zeros_like(out_id)

# Find indices where the first columns of a,b are placed in out_id
_,a_idx = np.where(a[:,None,0]==out_id)
_,b_idx = np.where(b[:,None,0]==out_id)
    
# Place second column of a into out_id & add in second column of b
out_count[a_idx] = a[:,1]
np.add.at(out_count, b_idx,b[:,1])

# Stack the ID and count arrays into a 2-column format
out = np.column_stack((out_id,out_count))

To find a_idx and b_idx, as probably a faster alternative, np.searchsorted could be used like so -

a_idx = np.searchsorted(out_id, a[:,0], side='left')
b_idx = np.searchsorted(out_id, b[:,0], side='left')

Sample input-output :

In [538]: a
Out[538]: 
array([[1, 2],
       [4, 2],
       [3, 1],
       [5, 5]])

In [539]: b
Out[539]: 
array([[3, 7],
       [1, 1],
       [4, 0],
       [2, 3],
       [6, 2]])

In [540]: out
Out[540]: 
array([[1, 3],
       [2, 3],
       [3, 8],
       [4, 2],
       [5, 5],
       [6, 2]])

Approach #2: You can use np.bincount to do the same ID based adding -

# First column of output array as the union of first columns of a,b  
out_id = np.union1d(a[:,0],b[:,0])

# Get all IDs and counts in a single arrays
id_arr = np.concatenate((a[:,0],b[:,0]))
count_arr = np.concatenate((a[:,1],b[:,1]))

# Get binned summations
summed_vals = np.bincount(id_arr,count_arr)

# Get mask of valid bins
mask = np.in1d(np.arange(np.max(out_id)+1),out_id)

# Mask valid summed bins for final counts array output
out_count = summed_vals[mask]

# Stack the ID and count arrays into a 2-column format
out = np.column_stack((out_id,out_count))

Specific case

If the ID columns in a and b are sorted, it becomes easier, as we can just use masks with np.in1d to index into the output ID array created with np.union like so -

# First column of output array as the union of first columns of a,b  
out_id = np.union1d(a[:,0],b[:,0])

# Masks of first columns of a and b matches in the output ID array
mask1 = np.in1d(out_id,a[:,0])
mask2 = np.in1d(out_id,b[:,0])

# Initialize second column of output array
out_count = np.zeros_like(out_id)

# Place second column of a into out_id & add in second column of b
out_count[mask1] = a[:,1]
np.add.at(out_count, np.where(mask2)[0],b[:,1])

# Stack the ID and count arrays into a 2-column format
out = np.column_stack((out_id,out_count))

Sample run -

In [552]: a
Out[552]: 
array([[1, 2],
       [2, 2],
       [3, 1],
       [4, 5],
       [8, 5]])

In [553]: b
Out[553]: 
array([[2, 2],
       [3, 1],
       [4, 0],
       [5, 3],
       [6, 2],
       [8, 2]])

In [554]: out
Out[554]: 
array([[1, 2],
       [2, 4],
       [3, 2],
       [4, 5],
       [5, 3],
       [6, 2],
       [8, 7]])

Solution 2:[2]

>>> col=np.unique(np.hstack((b[:,0],a[:,0])))
>>> dif=np.setdiff1d(col,a[:,0])
>>> val=b[np.in1d(b[:,0],dif)]
>>> result=np.concatenate((a,val))
array([[1, 2],
       [2, 2],
       [3, 1],
       [4, 5],
       [5, 3]])

Note that if you want the result become sorted you can use np.lexsort :

result[np.lexsort((result[:,0],result[:,0]))]

Explanation :

First you can find the unique ids with following command :

>>> col=np.unique(np.hstack((b[:,0],a[:,0])))
>>> col
array([1, 2, 3, 4, 5])

Then find the different between the ids if a and all of ids :

>>> dif=np.setdiff1d(col,a[:,0])
>>> dif
array([5])

Then find the items within b with the ids in diff :

>>> val=b[np.in1d(b[:,0],dif)]
>>> val
array([[5, 3]])

And at last concatenate the result with list a:

>>> np.concatenate((a,val))

consider another example with sorting :

>>> a = np.array([[1, 2],
...               [2, 2],
...               [3, 1],
...               [7, 5]])
>>> 
>>> b = np.array([[2, 2],
...               [3, 1],
...               [4, 0],
...               [5, 3]])
>>> 
>>> col=np.unique(np.hstack((b[:,0],a[:,0])))
>>> dif=np.setdiff1d(col,a[:,0])
>>> val=b[np.in1d(b[:,0],dif)]

>>> result=np.concatenate((a,val))
>>> result[np.lexsort((result[:,0],result[:,0]))]
array([[1, 2],
       [2, 2],
       [3, 1],
       [4, 0],
       [5, 3],
       [7, 5]])

Solution 3:[3]

That's an old question but here is a solution with pandas (that could be generalized for other aggregation functions than sum). Also sorting will occur automatically:

import pandas as pd
import numpy as np

a = np.array([[1, 2],
              [2, 2],
              [3, 1],
              [4, 5]])

b = np.array([[2, 2],
              [3, 1],
              [4, 0],
              [5, 3]])

print((pd.DataFrame(a[:, 1], index=a[:, 0])
        .add(pd.DataFrame(b[:, 1], index=b[:, 0]), fill_value=0)
        .astype(int))
        .reset_index()
        .to_numpy())

Output:

[[1 2]
 [2 4]
 [3 2]
 [4 5]
 [5 3]]

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Community
Solution 2
Solution 3 Tranbi