Compute annual mean using xarray
I have a Python xarray dataset with time, x, y for its dimensions and value1 as its variable. I'm trying to compute the annual mean of value1 for each x, y coordinate pair.
I've run into this function while reading the docs:
ds.groupby('time.year').mean()
This seems to compute a single annual mean for all x, y coordinate pairs in value1 at each given time slice rather than the annual means of individual x, y coordinate pairs at each given time slice.
While the code snippet above produces the wrong output, I'm very interested in its simple form. I would really like to figure out the "xarray trick" for computing the annual mean for a given x, y coordinate pair rather than hacking it together myself.
Can someone point me in the right direction? Should I temporarily turn this into a pandas object?
Solution 1:[1]
To avoid the default of averaging over all dimensions, you simply need to supply the dimension you want to average over explicitly:
ds.groupby('time.year').mean('time')
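As a quick check of the one-liner above, here is a minimal sketch on a made-up dataset. The grid size, date range, and random values are assumptions for illustration only; the variable name value1 and the dimensions time, x, y follow the question.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy dataset: two years of daily values on a 2 x 3 grid.
time = pd.date_range("2000-01-01", "2001-12-31", freq="D")
rng = np.random.default_rng(0)
data = rng.random((time.size, 2, 3))
ds = xr.Dataset(
    {"value1": (("time", "x", "y"), data)},
    coords={"time": time, "x": [0, 1], "y": [0, 1, 2]},
)

# Reduce over "time" only, so every (x, y) cell keeps its own annual mean.
annual = ds.groupby("time.year").mean("time")

# The result has a "year" dimension in place of "time"; x and y are untouched.
print(dict(annual.sizes))
```

Each (x, y) cell in annual["value1"] should then equal the plain mean of that cell's daily values within the year, which is easy to verify by hand for a single cell.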
Solution 2:[2]
Note that calling ds.groupby('time.year').mean('time')
will be incorrect if you are working with monthly rather than daily data. Taking the plain mean places equal weight on months of different length, e.g. February (28 or 29 days) and July (31 days), which is wrong.
Instead, use the function below, from NCAR:
import numpy as np
import xarray as xr

def weighted_temporal_mean(ds, var):
    """
    Weight by days in each month.
    """
    # Determine the month length
    month_length = ds.time.dt.days_in_month
    # Calculate the weights
    wgts = month_length.groupby("time.year") / month_length.groupby("time.year").sum()
    # Make sure the weights in each year add up to 1
    # (dim="time" replaces xr.ALL_DIMS, which newer xarray versions no longer provide)
    np.testing.assert_allclose(wgts.groupby("time.year").sum(dim="time"), 1.0)
    # Subset our dataset for our variable
    obs = ds[var]
    # Set up our masking for NaN values
    cond = obs.isnull()
    ones = xr.where(cond, 0.0, 1.0)
    # Calculate the numerator ("YS" = year start; "AS" is a deprecated alias for it)
    obs_sum = (obs * wgts).resample(time="YS").sum(dim="time")
    # Calculate the denominator
    ones_out = (ones * wgts).resample(time="YS").sum(dim="time")
    # Return the weighted average
    return obs_sum / ones_out

# Example call; ds_first_five_years must already exist and contain a 'TEMP' variable
average_weighted_temp = weighted_temporal_mean(ds_first_five_years, 'TEMP')
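To see how much the weighting matters, here is a small sketch comparing the plain and the day-weighted annual mean on a made-up monthly series. The data (value equal to the month number, one non-leap year) are an assumption for illustration, and the weighted line is a compact variant that ignores missing values, not the NCAR function itself.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical monthly series for 2001 (non-leap year): value = month number.
time = pd.date_range("2001-01-01", periods=12, freq="MS")
obs = xr.DataArray(np.arange(1.0, 13.0), coords={"time": time}, dims="time")

# Number of days in each month, used as weights.
days = obs.time.dt.days_in_month

# Plain mean: every month counts equally.
unweighted = obs.groupby("time.year").mean("time")

# Day-weighted mean: sum(value * days) / sum(days) within each year.
# Assumes no NaNs; the NCAR function above also masks missing values.
weighted = (obs * days).groupby("time.year").sum("time") / days.groupby("time.year").sum("time")

print(float(unweighted.sel(year=2001)))  # 6.5
print(float(weighted.sel(year=2001)))    # 2382 / 365, roughly 6.526
```

The gap is small here, but it is systematic: months with 31 days pull the weighted mean towards their values, while February counts for less.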
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | shoyer |
Solution 2 | Skrt |