Performance issues when iterating a numpy array

I have a 3D image array such as

[
    [
        [225, 0, 0],
        [225, 225, 0],
        ...
    ],
    [
        [225, 0, 0],
        [225, 225, 0],
        ...
    ],
    ...
]

The size of this array is 500x500x3, which is 750,000 elements. Here are the simple nested loops I use to iterate over the array:

for row in arr:
    for col in row:
        for elem in col:
            elem = (2 * elem / MAX_COLOR_VAL) - 1

But it takes a very long time (> 5 min) to run.

I'm new to numpy, so maybe I'm iterating over the array the wrong way? How can I optimize these loops?



Solution 1:[1]

Numpy arrays are not designed for iterating over individual elements. Doing so will likely be even slower than iterating over a plain Python list, since every element access has to wrap the raw value in a Python object and unwrap it again.

Numpy arrays are designed for bulk processing: for example, calculating the elementwise sum of two 1000×1000 matrices in a single operation.
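As an illustration, here is a minimal sketch of that kind of bulk operation (the 1000×1000 size is taken from the example above; the random data is just a placeholder):

import numpy as np

# two 1000x1000 matrices of placeholder data
a = np.random.rand(1000, 1000)
b = np.random.rand(1000, 1000)

# one vectorized operation: the elementwise sum is computed in C,
# with no per-element Python overhead
c = a + b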

If you want to multiply all elements by 2, divide them by MAX_COLOR_VAL, and subtract one, you can simply construct a new array with:

arr = (2 * arr.astype(float) / MAX_COLOR_VAL) - 1

This applies the operation to all elements at once, without any explicit Python-level loop.
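As a quick check, here is that expression applied to a tiny stand-in array (MAX_COLOR_VAL = 255 is an assumption for 8-bit color channels; use your actual maximum):

import numpy as np

MAX_COLOR_VAL = 255  # assumption: 8-bit color channels

# a tiny 2x2x3 stand-in for the 500x500x3 image array
arr = np.array([[[225, 0, 0], [225, 225, 0]],
                [[225, 0, 0], [225, 225, 0]]])

arr = (2 * arr.astype(float) / MAX_COLOR_VAL) - 1
print(arr.min(), arr.max())  # all values now lie in [-1, 1]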

Note that if you iterate over a numpy array, you do not iterate over indices; you iterate over the rows themselves. So row in for row in arr is a 2D subarray, not an index into a 2D array. Also note that assigning to elem in the innermost loop only rebinds the loop variable; it never writes back into the array, so the original loops would not modify arr even if they finished quickly.
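A small sketch demonstrating both points (the shapes are kept tiny just for the demo):

import numpy as np

arr = np.zeros((2, 2, 3))

for row in arr:           # row is a 2x3 subarray, not an index
    print(row.shape)      # -> (2, 3)

# rebinding the innermost loop variable never touches the array
for row in arr:
    for col in row:
        for elem in col:
            elem = elem + 1   # rebinds the name 'elem' only
print(arr.max())              # still 0.0: arr is unchanged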

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Willem Van Onsem