NumPy: get min/max from record array of numeric values

I have a NumPy record array of floats:

import numpy as np
ar = np.array([(238.03, 238.0, 237.0),
               (238.02, 238.0, 237.01),
               (238.05, 238.01, 237.0)], 
              dtype=[('A', 'f'), ('B', 'f'), ('C', 'f')])

How can I determine min/max from this record array? My usual attempt of ar.min() fails with:

TypeError: cannot perform reduce with flexible type

I'm not sure how to flatten the values out into a simpler NumPy array.



Solution 1:[1]

The easiest and most efficient way is probably to view your array as a simple 2D array of floats:

ar_view = ar.view((ar.dtype[0], len(ar.dtype.names)))

This gives a 2D array view of the structured array, on which the usual reductions work:

print(ar_view.min(axis=0))  # Or whatever…

This method is fast, as no new array is created (changes to ar_view result in changes to ar). It is restricted to cases like yours, though, where all record fields have the same type (float32, here).

One advantage is that this method keeps the 2D structure of the original array intact: you can find the minimum in each "column" (axis=0), for instance.
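As a minimal sketch of the points above, the following reuses the array from the question to show per-field and overall min/max, and that writing through the view modifies the original. The names col_min, col_max, overall_min and overall_max are chosen here for illustration only.

```python
import numpy as np

ar = np.array([(238.03, 238.0, 237.0),
               (238.02, 238.0, 237.01),
               (238.05, 238.01, 237.0)],
              dtype=[('A', 'f'), ('B', 'f'), ('C', 'f')])

# Reinterpret each 3-field record as a row of 3 float32 values (no copy).
ar_view = ar.view((ar.dtype[0], len(ar.dtype.names)))

col_min = ar_view.min(axis=0)   # one minimum per field A, B, C
col_max = ar_view.max(axis=0)   # one maximum per field A, B, C
overall_min, overall_max = ar_view.min(), ar_view.max()

# Writing through the view changes the structured array as well.
ar_view[0, 2] = 236.5
print(ar['C'][0])
```

Because the view shares memory with ar, copy it first (ar_view.copy()) if you need to modify the values without touching the original.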

Solution 2:[2]

You can flatten the record array into a plain ndarray:

# construct flattened ndarray (np.hstack needs a sequence, not a generator)
arnew = np.hstack([ar[r] for r in ar.dtype.names])

and then perform normal ndarray operations on it, like

armin, armax = np.min(arnew), np.max(arnew)
print(armin, armax)

The results are

237.0 238.05

Basically, ar.dtype.names gives you the tuple of field names; you then retrieve each field's array by name and stack them all into arnew.
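Unlike a view, stacking builds a new array, so it should also cope with fields of different numeric types. Here is a small sketch under that assumption; the array mixed and its field names are hypothetical and not part of the question.

```python
import numpy as np

# Hypothetical record array with mixed numeric field types (not from the question).
mixed = np.array([(1, 2.5), (7, -0.5), (3, 4.0)],
                 dtype=[('count', 'i4'), ('value', 'f8')])

# Pass a list (not a generator) of the per-field arrays to np.hstack.
flat = np.hstack([mixed[name] for name in mixed.dtype.names])

# flat is a new 1D array; the integers are upcast to float64 with the floats.
flat_min, flat_max = flat.min(), flat.max()
print(flat.dtype, flat_min, flat_max)
```

The price is a copy of the data and the loss of the per-field structure, so this suits cases where only the overall min/max matters.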

Solution 3:[3]

This may help someone else down the line. Another way to do it, which may be more sensible, is to view the array as a recarray and access each field as an attribute:

import numpy as np
ar = np.array([(238.03, 238.0, 237.0),
              (238.02, 238.0, 237.01),
              (238.05, 238.01, 237.0)], 
              dtype=[('A', 'f'), ('B', 'f'), ('C', 'f')])
arView = ar.view(np.recarray)
arView.A.min()

This allowed me to just pick and choose fields. The problem on my end was that the dtypes of my fields were not all the same (it was a rather complicated struct, by and large).
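For fields of differing types, the same pick-and-choose idea can be sketched with plain field indexing, which needs no recarray view. The array data, its field names, and the ranges dictionary below are hypothetical, made up for illustration.

```python
import numpy as np

# Hypothetical structured array whose fields have different types.
data = np.array([(1, 2.5, 10), (7, -0.5, 30), (3, 4.0, 20)],
                dtype=[('id', 'i4'), ('value', 'f8'), ('rank', 'u1')])

# Indexing by field name returns an ordinary ndarray, so min()/max() work.
ranges = {name: (data[name].min(), data[name].max())
          for name in data.dtype.names}

for name, (lo, hi) in ranges.items():
    print(name, lo, hi)
```

Each field keeps its own dtype here, so nothing is cast to a common type.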

Solution 4:[4]

A modern approach could leverage pandas to read and process the record array, then convert back to NumPy:

import pandas as pd

# read record array as a data frame, process data
df = pd.DataFrame(ar)
df_min = df.min(axis=0)

# convert to a uniform array
df_min.to_numpy()
# array([238.02, 238.  , 237.  ], dtype=float32)

# convert to a record array
df_min.to_frame().T.to_records(index=False)
# rec.array([(238.02, 238., 237.)],
#           dtype=[('A', '<f4'), ('B', '<f4'), ('C', '<f4')])

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Community
Solution 2
Solution 3 kratsg
Solution 4 Mike T