Get the mean of a netCDF file using xarray
I have opened a netcdf file in python using xarray, and the dataset summary looks like this.
Dimensions: (latitude: 721, longitude: 1440, time: 41)
Coordinates:
* longitude (longitude) float32 0.0 0.25 0.5 0.75 ... 359.25 359.5 359.75
* latitude (latitude) float32 90.0 89.75 89.5 89.25 ... -89.5 -89.75 -90.0
expver int32 1
* time (time) datetime64[ns] 1979-01-01 1980-01-01 ... 2019-01-01
Data variables:
z (time, latitude, longitude) float32 50517.914 ... 49769.473
Attributes:
Conventions: CF-1.6
history: 2020-03-02 12:47:40 GMT by grib_to_netcdf-2.16.0: /opt/ecmw...
I want to get the mean of the values of z along the latitude and longitude dimensions.
I've tried to use this code:
df.mean(axis = 0)
But it's removing the time coordinate, and returning me something like this.
Dimensions: (latitude: 721, longitude: 1440)
Coordinates:
expver int32 1
Dimensions without coordinates: latitude, longitude
Data variables:
z (latitude, longitude) float32 49742.03 49742.03 ... 50306.242
Am I doing something wrong here? Please help me with this.
Solution 1:[1]
You need to specify the dimension by name (dim) instead of axis.
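Use df.mean(dim='longitude') to average over longitude, or df.mean(dim=['latitude', 'longitude']) to average over both spatial dimensions and keep time.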
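As a minimal sketch of the dim-based reduction, the snippet below builds a small synthetic dataset (an assumption standing in for the real file: a coarser grid, four time steps, random values) and reduces over both spatial dimensions by name. Note that this is a plain arithmetic mean with every grid cell counted equally.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the dataset in the question (hypothetical values,
# much coarser grid than the real 721 x 1440 one).
time = np.array(
    ["1979-01-01", "1980-01-01", "1981-01-01", "1982-01-01"],
    dtype="datetime64[ns]",
)
latitude = np.linspace(90.0, -90.0, 7)
longitude = np.arange(0.0, 360.0, 60.0)
rng = np.random.default_rng(0)
values = rng.normal(50000.0, 500.0, size=(time.size, latitude.size, longitude.size))

z = xr.DataArray(
    values,
    coords={"time": time, "latitude": latitude, "longitude": longitude},
    dims=("time", "latitude", "longitude"),
    name="z",
)
df = z.to_dataset()

# Reduce over the named spatial dimensions; the time dimension is kept.
spatial_mean = df.mean(dim=["latitude", "longitude"])
print(spatial_mean.z.dims)
```

Naming the dimensions avoids having to know their positional order, which is what made axis=0 remove time here: time is the first dimension of z.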
Solution 2:[2]
WARNING: The accepted answer will give you the wrong result if you apply it along latitude (which you need to do to fully answer the question). On a regular lat-lon grid the cells are not all the same size: they get smaller as you move towards the poles, so each cell has to be weighted accordingly.
Xarray solution:
To compute a weighted mean, construct the weights as in the following code:
import numpy as np
# weight each cell by the cosine of its latitude (proportional to cell area)
weights = np.cos(np.deg2rad(df.latitude))
weights.name = "weights"
z_weighted = df.z.weighted(weights)
weighted_mean = z_weighted.mean(("longitude", "latitude"))
See this discussion in the xarray documentation for further details and an example comparison.
The size of the error depends on the region over which you are averaging and on how strong the gradient of the variable is in the latitudinal direction: the larger the latitudinal extent of the region and the stronger the gradient, the worse the error. For a global temperature field, the example in the xarray documentation shows an error of well over 5 °C. The unweighted answer is colder because the poles are counted equally even though the grid cells are much smaller there.
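To illustrate the direction of this bias, here is a small sketch on a synthetic field. The field is an assumption made up for illustration (warm at the equator, cold at the poles, constant in longitude), not the temperature data from the xarray documentation. It compares the plain mean with a cosine-latitude weighted mean.

```python
import numpy as np
import xarray as xr

# Regular 2.5-degree lat-lon grid (hypothetical, for illustration only).
latitude = np.arange(90.0, -90.1, -2.5)
longitude = np.arange(0.0, 360.0, 2.5)

# Synthetic temperature-like profile: warm at the equator, cold at the poles.
profile = 30.0 * np.cos(np.deg2rad(latitude)) - 15.0
data = np.repeat(profile[:, None], longitude.size, axis=1)

da = xr.DataArray(
    data,
    coords={"latitude": latitude, "longitude": longitude},
    dims=("latitude", "longitude"),
    name="t",
)

# Cosine-latitude weights, proportional to grid-cell area.
weights = np.cos(np.deg2rad(da.latitude))
weights.name = "weights"

unweighted = da.mean(("longitude", "latitude"))
weighted = da.weighted(weights).mean(("longitude", "latitude"))

print(float(unweighted), float(weighted))
```

In this setup the unweighted mean comes out lower than the weighted one, because the many small polar cells pull the plain average towards the cold values.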
Alternative CDO solution
As an aside, you can also do this from the command line with cdo:
cdo fldmean in.nc out.nc
cdo accounts for the grid, so you don't need to worry about the weighting issues. cdo can also be called directly from within Python using the CDO package.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | bwc |
Solution 2 |