Remove variables from a recarray by variable name
I need to concatenate two rec.arrays (the same procedure I use for all the others in my work). The problem is that one of the files I read into an array has 2 extra variables, which I need to remove so that its variables match those of the other array before concatenating. I have tried several things, such as removing them by index, but all of them lead to errors.
This is the array
vswhr1
rec.array([('ny20110325s0a06c.001', 2011.23149798, 84.49677, 11.9223, 1.000e+00, 78.923, 11.923, 0.024, 0.024, 77.286, 189.465 , 1.688, 180. , 0.0019, 0., 0.00167, 60., 1003.84003, -15.7, 1003.84003, 65.8, -1., 0. , -1., -1., 9.8765e+35, 9.8765e+35, 5.96541e+21, 2.60898e+19, 8.45080e+21, 7.92632e+19, 8.74633e+21, 8.68890e+19),
('ny20110325s0a06c.002', 2011.23150704, 84.50007, 12.0017, 2.000e+00, 78.923, 11.923, 0.024, 0.024, 77.325, 190.686 , 1.694, 180. , 0.0019, 0., 0.00167, 60., 1003.83002, -16. , 1003.83002, 68.7, -1., 0. , -1., -1., 9.8765e+35, 9.8765e+35, 5.93553e+21, 2.54199e+19, 8.43518e+21, 7.75936e+19, 8.72990e+21, 8.60191e+19),
('ny20110325s0a06c.003', 2011.23150736, 84.50019, 12.0045, 3.000e+00, 78.923, 11.923, 0.024, 0.024, 77.326, 190.728 , 1.694, 180. , 0.0019, 0., 0.00167, 60., 1003.83002, -16.1, 1003.83002, 68.9, -1., 0. , -1., -1., 9.8765e+35, 9.8765e+35, 5.93643e+21, 2.59443e+19, 8.42675e+21, 8.17653e+19, 8.73537e+21, 8.68880e+19),
...,
('ny20180919s0i06c.0042', 2018.71887239, 262.38843, 9.3221, 1.234e+03, 78.923, 11.923, 0.024, 0.027, 78.69 , 152.737 , -1.722, 180.00999, 0.0019, 0., 0.00188, 60., 1011.84003, -2.2, 1011.84003, 77.6, -1., 0.0125, -1., -1., 9.8765e+35, 9.8765e+35, 2.11077e+22, 8.61874e+19, 8.72151e+21, 5.33405e+19, 9.01945e+21, 7.07619e+19),
('ny20180920s0i06c.0491', 2018.72160282, 263.38504, 9.2407, 1.235e+03, 78.923, 11.923, 0.024, 0.034, 79.177, 151.62399, -1.735, 180.00999, 0.0019, 0., 0.00188, 60., 1006.65997, 0. , 1006.65997, 62.8, -1., 0.0095, -1., -1., 9.8765e+35, 9.8765e+35, 1.96888e+22, 7.48627e+19, 8.70719e+21, 5.40175e+19, 8.97596e+21, 7.49834e+19),
('ny20180920s0i06c.0492', 2018.72161188, 263.38834, 9.3201, 1.236e+03, 78.923, 11.923, 0.024, 0.034, 79.072, 152.83299, -1.729, 180.00999, 0.0019, 0., 0.00188, 60., 1006.65997, -0.6, 1006.65997, 64.6, -1., 0.0078, -1., -1., 9.8765e+35, 9.8765e+35, 1.94867e+22, 7.83111e+19, 8.71765e+21, 4.97304e+19, 8.97784e+21, 7.23055e+19)],
dtype=[('spectrum', '<U21'), ('year', '<f8'), ('day', '<f8'), ('hour', '<f8'), ('run', '<f8'), ('lat', '<f8'), ('long', '<f8'), ('zobs', '<f8'), ('zmin', '<f8'), ('solzen', '<f8'), ('azim', '<f8'), ('osds', '<f8'), ('opd', '<f8'), ('fovi', '<f8'), ('amal', '<f8'), ('graw', '<f8'), ('tins', '<f8'), ('pins', '<f8'), ('tout', '<f8'), ('pout', '<f8'), ('hout', '<f8'), ('sia', '<f8'), ('fvsi', '<f8'), ('wspd', '<f8'), ('wdir', '<f8'), ('luft', '<f8'), ('luft_error', '<f8'), ('h2o', '<f8'), ('h2o_error', '<f8'), ('co2', '<f8'), ('co2_error', '<f8'), ('3co2', '<f8'), ('3co2_error', '<f8')])
vswhr1.shape
(1236,)
(The numeric values themselves are irrelevant.)
I need to delete the last 2 variables: ('3co2', '<f8') and ('3co2_error', '<f8').
Thank you
Solution 1:[1]
If you are loading these arrays from csv files, then using usecols to select which columns you load may be the easiest way to get two arrays that match in dtype.
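As a rough sketch of the usecols route, assuming the data is read with np.genfromtxt from a comma-separated file with a header row: the CSV text and its column names below are made up for illustration and are not the asker's real file.

```python
import io
import numpy as np

# Hypothetical CSV content standing in for the real file; the last two
# columns play the role of the unwanted extra variables.
csv_text = (
    "spectrum,year,co2,extra1,extra2\n"
    "ny01,2011.2,8.4e21,8.7e21,8.6e19\n"
    "ny02,2011.3,8.5e21,8.8e21,8.7e19\n"
)

# usecols keeps only the first three columns, so the extra fields never
# enter the resulting structured array.
arr = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=",",
    names=True,      # take field names from the header row
    dtype=None,      # let genfromtxt infer a type per column
    encoding="utf-8",
    usecols=(0, 1, 2),
)

print(arr.dtype.names)
```

If both files are read this way with the same usecols selection, the two arrays should end up with matching field names.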
But it is also possible to select a subset of fields from an existing array.
To illustrate:
In [1]: dt1 = np.dtype('U10,i,f')
In [2]: dt2 = np.dtype('U10,i,f,i,i')
In [3]: x = np.ones(2,dtype=dt1)
In [4]: y = np.zeros(2,dtype=dt2)
In [5]: x
Out[5]:
array([('1', 1, 1.), ('1', 1, 1.)],
dtype=[('f0', '<U10'), ('f1', '<i4'), ('f2', '<f4')])
In [6]: y
Out[6]:
array([('', 0, 0., 0, 0), ('', 0, 0., 0, 0)],
dtype=[('f0', '<U10'), ('f1', '<i4'), ('f2', '<f4'), ('f3', '<i4'), ('f4', '<i4')])
A subset of the y fields:
In [7]: y[['f0','f1','f2']]
Out[7]:
array([('', 0, 0.), ('', 0, 0.)],
dtype={'names': ['f0', 'f1', 'f2'], 'formats': ['<U10', '<i4', '<f4'], 'offsets': [0, 40, 44], 'itemsize': 56})
There are some complications in this view, as evidenced by the offsets parameter in the new dtype. The structured arrays doc page discusses this. Sometimes it's necessary to make a copy using the recfunctions.repack_fields function.
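A minimal sketch of that repacking step, assuming a NumPy version in which multi-field indexing returns a view and numpy.lib.recfunctions provides repack_fields; the array mirrors the y example above.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Same layout as the y array above: three wanted fields plus two extras.
y = np.zeros(2, dtype='U10,i,f,i,i')

# Multi-field indexing keeps the original memory layout, including the
# bytes that belonged to the dropped fields.
view = y[['f0', 'f1', 'f2']]

# repack_fields copies the data into a dtype without that padding.
packed = rfn.repack_fields(view)

print(view.dtype.itemsize, packed.dtype.itemsize)
```

The packed copy is the safer choice when the result is saved to disk or compared byte for byte against another array's dtype.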
But it appears that the view is just fine when used in concatenate:
In [8]: np.concatenate((x,y[['f0','f1','f2']]))
Out[8]:
array([('1', 1, 1.), ('1', 1, 1.), ('', 0, 0.), ('', 0, 0.)],
dtype={'names': ['f0', 'f1', 'f2'], 'formats': ['<U10', '<i4', '<f4'], 'offsets': [0, 40, 44], 'itemsize': 56})
We could also get the indexing list from the other array's dtype:
In [9]: x.dtype.names
Out[9]: ('f0', 'f1', 'f2')
That's a tuple, which we need to convert to a list:
In [13]: np.concatenate((x,y[list(x.dtype.names)]))
Out[13]:
array([('1', 1, 1.), ('1', 1, 1.), ('', 0, 0.), ('', 0, 0.)],
dtype={'names': ['f0', 'f1', 'f2'], 'formats': ['<U10', '<i4', '<f4'], 'offsets': [0, 40, 44], 'itemsize': 56})
(Often in Python lists and tuples are interchangeable, but in numpy indexing they are interpreted in different ways, so the distinction is important.)
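Applied to field names like those in the question, the same idea might look like the sketch below. The two small arrays are stand-ins with only a few of the original fields, and plain structured arrays are used rather than rec.array objects. drop_fields from numpy.lib.recfunctions is shown as an alternative that removes fields by name and returns a new array.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Stand-in dtypes: the long one has the two extra fields from the question.
short_dt = [('spectrum', 'U21'), ('co2', 'f8'), ('co2_error', 'f8')]
long_dt = short_dt + [('3co2', 'f8'), ('3co2_error', 'f8')]

other = np.array([('a.001', 8.4e21, 7.9e19)], dtype=short_dt)
vswhr1 = np.array([('b.001', 8.7e21, 5.3e19, 9.0e21, 7.0e19)], dtype=long_dt)

# Option 1: index with the list of names taken from the shorter array.
trimmed = vswhr1[list(other.dtype.names)]

# Option 2: drop the unwanted fields by name (returns a new, packed array).
dropped = rfn.drop_fields(vswhr1, ['3co2', '3co2_error'], usemask=False)

combined = np.concatenate((other, trimmed))
combined_alt = np.concatenate((other, dropped))
print(combined.dtype.names)
```

Option 1 avoids hard-coding the names to remove; option 2 is more direct when the unwanted fields are known in advance.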
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | hpaulj |