Get mean of netCDF file using xarray

I have opened a netcdf file in python using xarray, and the dataset summary looks like this.

Dimensions:    (latitude: 721, longitude: 1440, time: 41)
Coordinates:
  * longitude  (longitude) float32 0.0 0.25 0.5 0.75 ... 359.25 359.5 359.75
  * latitude   (latitude) float32 90.0 89.75 89.5 89.25 ... -89.5 -89.75 -90.0
    expver     int32 1
  * time       (time) datetime64[ns] 1979-01-01 1980-01-01 ... 2019-01-01
Data variables:
    z          (time, latitude, longitude) float32 50517.914 ... 49769.473
Attributes:
    Conventions:  CF-1.6
    history:      2020-03-02 12:47:40 GMT by grib_to_netcdf-2.16.0: /opt/ecmw...

I want to get the mean of the values of z along the latitude and longitude dimensions.

I've tried to use this code:

df.mean(axis = 0)

But it removes the time coordinate and returns something like this:

Dimensions:  (latitude: 721, longitude: 1440)
Coordinates:
    expver   int32 1
Dimensions without coordinates: latitude, longitude
Data variables:
    z        (latitude, longitude) float32 49742.03 49742.03 ... 50306.242

Am I doing something wrong here? Please help me with this.



Solution 1:[1]

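You need to specify the dimension by name (dim) instead of by position (axis). Here axis=0 is the first dimension of z, which is time, so df.mean(axis=0) averages over time and leaves latitude and longitude in place.

Use df.mean(dim='longitude'). To average over both spatial dimensions at once, pass both names: df.mean(dim=('latitude', 'longitude')).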
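As a minimal sketch of the difference between dim and axis, the snippet below builds a tiny synthetic dataset with the same dimension names as the one in the question. The values and the names ds, spatial_mean and time_mean are made up for illustration; they are not from the original file.

```python
import numpy as np
import xarray as xr

# Tiny synthetic stand-in for the dataset in the question (values are made up).
time = np.array(["1979-01-01", "1980-01-01", "1981-01-01"], dtype="datetime64[ns]")
lat = np.array([90.0, 0.0, -90.0])
lon = np.array([0.0, 90.0, 180.0, 270.0])
ds = xr.Dataset(
    {"z": (("time", "latitude", "longitude"), np.arange(36.0).reshape(3, 3, 4))},
    coords={"time": time, "latitude": lat, "longitude": lon},
)

# Reducing by dimension name: latitude and longitude are averaged, time is kept.
spatial_mean = ds.mean(dim=("latitude", "longitude"))

# Reducing by position: axis 0 of z is time, so time is the dimension that disappears.
time_mean = ds["z"].mean(axis=0)

print(spatial_mean["z"].dims)
print(time_mean.dims)
```

Note that this is a plain, unweighted mean over the grid points.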

Solution 2:[2]

WARNING!!! The accepted answer will give you the wrong result if you apply it along latitude (which you need to do to fully answer the question). On a regular lat-lon grid the cells are not all the same size: they get smaller as you move towards the poles, so each cell needs to be weighted accordingly.

Xarray solution:

To compute a weighted mean, you need to construct the weights as in the following code:

import numpy as np

# Weight each cell by the cosine of its latitude (not by the data values in z)
weights = np.cos(np.deg2rad(df.latitude))
weights.name = "weights"
z_weighted = df.z.weighted(weights)
weighted_mean = z_weighted.mean(("longitude", "latitude"))

See this discussion in the xarray documentation for further details and an example comparison.

The size of the error depends on the region over which you are averaging and on how strong the gradient of the variable is in the latitudinal direction: the larger the latitudinal extent of the region and the stronger the gradient, the worse it is. For a global field of temperature, the example in the xarray documentation shows an error of well over 5 °C. The unweighted answer is colder, since the poles are counted equally even though the grid cells are much smaller there.
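The effect can be reproduced on a small synthetic field. The sketch below assumes a made-up, zonally uniform "temperature" that is warmest at the equator and coldest at the poles. The grid, the values and the names da, unweighted and weighted are illustrative assumptions, not data from the question.

```python
import numpy as np
import xarray as xr

# Coarse regular lat-lon grid (hypothetical, for illustration only).
lat = np.linspace(90.0, -90.0, 7)   # 90, 60, 30, 0, -30, -60, -90
lon = np.arange(0.0, 360.0, 90.0)   # 0, 90, 180, 270

# Made-up field: 30 at the equator, falling to about 0 at the poles.
temp_by_lat = 30.0 * np.cos(np.deg2rad(lat))
da = xr.DataArray(
    np.repeat(temp_by_lat[:, None], lon.size, axis=1),
    coords={"latitude": lat, "longitude": lon},
    dims=("latitude", "longitude"),
    name="t",
)

# Cosine-of-latitude weights, built from the latitude coordinate.
weights = np.cos(np.deg2rad(da.latitude))
weights.name = "weights"

unweighted = da.mean(("latitude", "longitude"))
weighted = da.weighted(weights).mean(("latitude", "longitude"))

print(float(unweighted), float(weighted))
```

In this toy case the unweighted mean comes out lower than the weighted one, because the cold polar rows count as much as the equatorial rows.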


Alternative CDO solution

As an aside, you can also do this from the command line with cdo:

cdo fldmean in.nc out.nc 

cdo accounts for the grid, so you don't need to worry about the weighting issues. cdo can also be called directly from within Python using the cdo package.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 bwc
Solution 2