Remove variables from a recarray by variable name

I need to concatenate two rec.arrays (the same procedure I use for all the others in my work). The problem is that one of the files I read into an array has two extra variables, which I need to remove so that its fields match those of the other array before concatenating. I have tried several things, such as removing them by index, but they all lead to errors.

This is the array

vswhr1
rec.array([('ny20110325s0a06c.001', 2011.23149798,  84.49677, 11.9223, 1.000e+00, 78.923, 11.923, 0.024, 0.024, 77.286, 189.465  ,  1.688, 180.     , 0.0019, 0., 0.00167, 60., 1003.84003, -15.7, 1003.84003, 65.8, -1., 0.    , -1., -1., 9.8765e+35, 9.8765e+35, 5.96541e+21, 2.60898e+19, 8.45080e+21, 7.92632e+19, 8.74633e+21, 8.68890e+19),
           ('ny20110325s0a06c.002', 2011.23150704,  84.50007, 12.0017, 2.000e+00, 78.923, 11.923, 0.024, 0.024, 77.325, 190.686  ,  1.694, 180.     , 0.0019, 0., 0.00167, 60., 1003.83002, -16. , 1003.83002, 68.7, -1., 0.    , -1., -1., 9.8765e+35, 9.8765e+35, 5.93553e+21, 2.54199e+19, 8.43518e+21, 7.75936e+19, 8.72990e+21, 8.60191e+19),
           ('ny20110325s0a06c.003', 2011.23150736,  84.50019, 12.0045, 3.000e+00, 78.923, 11.923, 0.024, 0.024, 77.326, 190.728  ,  1.694, 180.     , 0.0019, 0., 0.00167, 60., 1003.83002, -16.1, 1003.83002, 68.9, -1., 0.    , -1., -1., 9.8765e+35, 9.8765e+35, 5.93643e+21, 2.59443e+19, 8.42675e+21, 8.17653e+19, 8.73537e+21, 8.68880e+19),
           ...,
           ('ny20180919s0i06c.0042', 2018.71887239, 262.38843,  9.3221, 1.234e+03, 78.923, 11.923, 0.024, 0.027, 78.69 , 152.737  , -1.722, 180.00999, 0.0019, 0., 0.00188, 60., 1011.84003,  -2.2, 1011.84003, 77.6, -1., 0.0125, -1., -1., 9.8765e+35, 9.8765e+35, 2.11077e+22, 8.61874e+19, 8.72151e+21, 5.33405e+19, 9.01945e+21, 7.07619e+19),
           ('ny20180920s0i06c.0491', 2018.72160282, 263.38504,  9.2407, 1.235e+03, 78.923, 11.923, 0.024, 0.034, 79.177, 151.62399, -1.735, 180.00999, 0.0019, 0., 0.00188, 60., 1006.65997,   0. , 1006.65997, 62.8, -1., 0.0095, -1., -1., 9.8765e+35, 9.8765e+35, 1.96888e+22, 7.48627e+19, 8.70719e+21, 5.40175e+19, 8.97596e+21, 7.49834e+19),
           ('ny20180920s0i06c.0492', 2018.72161188, 263.38834,  9.3201, 1.236e+03, 78.923, 11.923, 0.024, 0.034, 79.072, 152.83299, -1.729, 180.00999, 0.0019, 0., 0.00188, 60., 1006.65997,  -0.6, 1006.65997, 64.6, -1., 0.0078, -1., -1., 9.8765e+35, 9.8765e+35, 1.94867e+22, 7.83111e+19, 8.71765e+21, 4.97304e+19, 8.97784e+21, 7.23055e+19)],
          dtype=[('spectrum', '<U21'), ('year', '<f8'), ('day', '<f8'), ('hour', '<f8'), ('run', '<f8'), ('lat', '<f8'), ('long', '<f8'), ('zobs', '<f8'), ('zmin', '<f8'), ('solzen', '<f8'), ('azim', '<f8'), ('osds', '<f8'), ('opd', '<f8'), ('fovi', '<f8'), ('amal', '<f8'), ('graw', '<f8'), ('tins', '<f8'), ('pins', '<f8'), ('tout', '<f8'), ('pout', '<f8'), ('hout', '<f8'), ('sia', '<f8'), ('fvsi', '<f8'), ('wspd', '<f8'), ('wdir', '<f8'), ('luft', '<f8'), ('luft_error', '<f8'), ('h2o', '<f8'), ('h2o_error', '<f8'), ('co2', '<f8'), ('co2_error', '<f8'), ('3co2', '<f8'), ('3co2_error', '<f8')])

vswhr1.shape 
(1236,) 

*irrelevant numbers

I need to delete the last 2 variables: ('3co2', '<f8') and ('3co2_error', '<f8').

Thank you



Solution 1:[1]

If you are loading these arrays from csv files, then using usecols to select which columns you load may be the easiest way to get two arrays that match in dtype.
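As a rough sketch of the usecols route, assuming the data comes from comma-separated text with a header row and is read with np.genfromtxt (the file contents and column names below are made up for illustration, not taken from the question's files):

```python
import io
import numpy as np

# Hypothetical CSV contents: the second file has two extra trailing columns.
csv_a = "spectrum,year,co2\ns1,2011.2,8.4e21\ns2,2011.3,8.5e21\n"
csv_b = ("spectrum,year,co2,extra,extra_error\n"
         "s3,2018.7,8.7e21,9.0e21,7.0e19\n"
         "s4,2018.8,8.8e21,9.1e21,7.1e19\n")

a = np.genfromtxt(io.StringIO(csv_a), delimiter=",", names=True,
                  dtype=None, encoding="utf-8")
# usecols keeps only the first three columns, so the extra fields never load.
b = np.genfromtxt(io.StringIO(csv_b), delimiter=",", names=True,
                  dtype=None, encoding="utf-8", usecols=(0, 1, 2))

print(a.dtype.names)
print(b.dtype.names)
```

With real files, the StringIO objects would be replaced by file names, and usecols would list whichever columns the two files share.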

But it is also possible to select a subset of fields from an existing array.

To illustrate:

In [1]: dt1 = np.dtype('U10,i,f')
In [2]: dt2 = np.dtype('U10,i,f,i,i')
In [3]: x = np.ones(2,dtype=dt1)
In [4]: y = np.zeros(2,dtype=dt2)
In [5]: x
Out[5]: 
array([('1', 1, 1.), ('1', 1, 1.)],
      dtype=[('f0', '<U10'), ('f1', '<i4'), ('f2', '<f4')])
In [6]: y
Out[6]: 
array([('', 0, 0., 0, 0), ('', 0, 0., 0, 0)],
      dtype=[('f0', '<U10'), ('f1', '<i4'), ('f2', '<f4'), ('f3', '<i4'), ('f4', '<i4')])

A subset of the y fields:

In [7]: y[['f0','f1','f2']]
Out[7]: 
array([('', 0, 0.), ('', 0, 0.)],
      dtype={'names': ['f0', 'f1', 'f2'], 'formats': ['<U10', '<i4', '<f4'], 'offsets': [0, 40, 44], 'itemsize': 56})

This result is a view that keeps the memory layout of the original array, which brings some complications, as evidenced by the offsets and itemsize parameters in the new dtype. The structured arrays doc page discusses this. Sometimes it's necessary to make a packed copy using the recfunctions.repack_fields function.
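A minimal sketch of that repacking step, assuming the function meant here is numpy.lib.recfunctions.repack_fields and a NumPy version in which multi-field indexing returns a view:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

y = np.zeros(2, dtype="U10,i,f,i,i")

view = y[["f0", "f1", "f2"]]       # still laid out like the full record
packed = rfn.repack_fields(view)   # copy whose fields are stored contiguously

print(view.dtype.itemsize, packed.dtype.itemsize)
print(packed.dtype)
```

The packed copy no longer carries the gaps left by the dropped fields, which matters mainly when the array is saved, viewed as another dtype, or compared against a dtype built from scratch.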

But it appears that the view is just fine when used in concatenate:

In [8]: np.concatenate((x,y[['f0','f1','f2']]))
Out[8]: 
array([('1', 1, 1.), ('1', 1, 1.), ('', 0, 0.), ('', 0, 0.)],
      dtype={'names': ['f0', 'f1', 'f2'], 'formats': ['<U10', '<i4', '<f4'], 'offsets': [0, 40, 44], 'itemsize': 56})

We could also get the indexing list from the other array's dtype:

In [9]: x.dtype.names
Out[9]: ('f0', 'f1', 'f2')

That's a tuple, which we need to convert to a list:

In [13]: np.concatenate((x,y[list(x.dtype.names)]))
Out[13]: 
array([('1', 1, 1.), ('1', 1, 1.), ('', 0, 0.), ('', 0, 0.)],
      dtype={'names': ['f0', 'f1', 'f2'], 'formats': ['<U10', '<i4', '<f4'], 'offsets': [0, 40, 44], 'itemsize': 56})

(Lists and tuples are often interchangeable in Python, but NumPy indexing interprets them differently, so the distinction matters here.)
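Applied to the array in the question, the same idea might look like the sketch below. The array here is a small stand-in with only a few of the question's field names and zero-filled data; numpy.lib.recfunctions.drop_fields is shown as an alternative that was not part of the original answer.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Small stand-in for vswhr1; only the field names matter here.
dt = [("spectrum", "U21"), ("co2", "f8"), ("co2_error", "f8"),
      ("3co2", "f8"), ("3co2_error", "f8")]
vswhr1 = np.zeros(3, dtype=dt).view(np.recarray)

drop = ("3co2", "3co2_error")
# Build the list of field names to keep, in their original order.
keep = [name for name in vswhr1.dtype.names if name not in drop]

# Select the kept fields by name, then pack them into a fresh array.
trimmed = rfn.repack_fields(vswhr1[keep])

# Alternative: drop_fields returns a new array without the named fields.
dropped = rfn.drop_fields(vswhr1, drop, usemask=False, asrecarray=True)

print(trimmed.dtype.names)
print(dropped.dtype.names)
```

Either result has the same field names as the shorter array and can then be passed to np.concatenate as shown earlier.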

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 hpaulj