How to read .csv with a compound header into an xarray DataArray (using pandas)
Given a dataset with the following structure:
time  var1  var2  var2  var1  var3
      loc1  loc1  loc2  loc2  loc1
1     11    12    13    14    15
2     21    22    23          25
3           32    33    34    35
Given as a .csv:
time,var1,var2,var2,var1,var3
,loc1,loc1,loc2,loc2,loc1
1,11,12,13,14,15
2,21,22,23,,25
3,,32,33,34,35
Note: some values are missing, not all variables are available for all locations, timestamps are available for every record, columns may appear out of order, but timestamp is reliably the first column. I'm not sure all these aspects are relevant to an optimal solution, but there they are.
I didn't have too much trouble setting up an xarray three dimensional array that would allow me to access values by timestamp, location, variable name. It was looping through the location names after determining the unique ones, filtering the data by location and adding the results one location at a time. But I am wondering what a pythonic and, for lack of a better word, pandastic solution would look like?
Question: Is there some compact and efficient way, likely using pandas and xarray, to load this dataset or any similar one (with different variable and location names) from .csv into a 3d-array like an xarray DataArray?
Solution 1:[1]
import pandas as pd
df = pd.read_csv('tst.csv', header=[0, 1], index_col=0).sort_index(axis=1)
time  var1        var2        var3
      loc1  loc2  loc1  loc2  loc1
1     11.0  14.0    12    13    15
2     21.0   NaN    22    23    25
3      NaN  34.0    32    33    35
However, to get this into a 3-D array, we must project the columns onto the Cartesian product of the axes available to us, so that every (variable, location) pair has a column, even if it holds only NaN.
cols = pd.MultiIndex.from_product(df.columns.levels)
d1 = df.reindex(columns=cols)
d1
   var1        var2        var3
   loc1  loc2  loc1  loc2  loc1  loc2
1  11.0  14.0    12    13    15   NaN
2  21.0   NaN    22    23    25   NaN
3   NaN  34.0    32    33    35   NaN
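The two steps above can be reproduced without a file on disk. The following is a minimal sketch, assuming the sample data from the question is pasted into a string (the names csv_text and missing are made up for illustration); it shows that the reindex adds exactly one all-NaN column for the pair that the file does not contain.

```python
import io

import pandas as pd

# Sample data from the question, held in a string instead of 'tst.csv'.
csv_text = """time,var1,var2,var2,var1,var3
,loc1,loc1,loc2,loc2,loc1
1,11,12,13,14,15
2,21,22,23,,25
3,,32,33,34,35
"""

# Two header rows become a two-level column MultiIndex; column 0 is the index.
df = pd.read_csv(io.StringIO(csv_text), header=[0, 1], index_col=0).sort_index(axis=1)

# Every (variable, location) combination, whether or not it is in the file.
cols = pd.MultiIndex.from_product(df.columns.levels)
d1 = df.reindex(columns=cols)

# Pairs that exist only after the reindex (filled with NaN).
missing = [c for c in cols if c not in df.columns]

print(df.shape, d1.shape)
print(missing)
```

If the frame had been filtered before this step, df.columns.levels could still list variables or locations that no longer occur; calling df.columns.remove_unused_levels() first is one way to guard against that.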
Then use numpy.reshape and numpy.transpose:
d1.values.reshape(3, 3, 2).transpose(1, 0, 2)
array([[[ 11.,  14.],
        [ 21.,  NaN],
        [ NaN,  34.]],

       [[ 12.,  13.],
        [ 22.,  23.],
        [ 32.,  33.]],

       [[ 15.,  NaN],
        [ 25.,  NaN],
        [ 35.,  NaN]]])
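The answer stops at a plain NumPy array, while the question asks for an xarray DataArray. One possible way to finish is sketched below, assuming xarray is installed; the dimension names "variable", "time" and "location" are my own choice and not part of the original answer. The array sizes are taken from the data instead of being hard-coded as 3, 3, 2, so the same code should carry over to files with other variable and location names.

```python
import io

import pandas as pd
import xarray as xr

# Sample data from the question, held in a string instead of 'tst.csv'.
csv_text = """time,var1,var2,var2,var1,var3
,loc1,loc1,loc2,loc2,loc1
1,11,12,13,14,15
2,21,22,23,,25
3,,32,33,34,35
"""

df = pd.read_csv(io.StringIO(csv_text), header=[0, 1], index_col=0).sort_index(axis=1)
cols = pd.MultiIndex.from_product(df.columns.levels)
d1 = df.reindex(columns=cols)

# Level 0 holds the variable names, level 1 the location names.
variables, locations = cols.levels

# Rows are time, columns are (variable, location) with location varying fastest,
# so reshape gives (time, variable, location); transpose moves variable to the front.
values = (
    d1.to_numpy(dtype=float)
    .reshape(len(d1.index), len(variables), len(locations))
    .transpose(1, 0, 2)
)

# Dimension names below are assumptions chosen for this sketch.
da = xr.DataArray(
    values,
    dims=("variable", "time", "location"),
    coords={
        "variable": list(variables),
        "time": d1.index.to_numpy(),
        "location": list(locations),
    },
)

print(da.sel(variable="var1", time=1, location="loc2").item())  # 14.0
```

Values can then be looked up by label with da.sel(...), and combinations that were absent from the file, such as var3 at loc2, come back as NaN.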
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Grismar |