'combining 2 netcdf files with xarray along the time dimension

I tried to combine two netcdf files along the time dimension using this command. The first file ran 1 to 10 days and then the next one from 11 to 20 days. Time was recorded as year/month/days in the netcdf files.

ds = xr.open_mfdataset([file1, file2], combine='nested',concat_dim=["time"])

The files have data variables that are 3-d (time, lon, lat) dimensions where lon and lat stay constant in each of the two files. The time array did expand to 20 days but data arrays remained as 10 days in the merged output file after executing the command above.

Using cdo mergetime utility, I was able to overcome the problem by doing

cdo mergetime file1.nc file2.nc mergedfile.nc 

However, I am trying to do this in a script and prefer using xarray over cdo. Any comments would help on why xarray doesn't combine non-time arrays in this scenario?



Solution 1:[1]

You should try different options in mf_dataset. I was able to replicate your problem and solve this by changing some options:

#!/usr/bin/env
import datetime
from netCDF4 import Dataset,date2num,num2date
import numpy as np
import xarray as xr
# ----------------------
nx = ny = 10
ntime = 5;
f_a = 'test_01.nc';
with Dataset(f_a,'w','NETCDF3') as ncout:
    ncout.createDimension('x',nx)
    ncout.createDimension('y',ny)
    ncout.createDimension('time',None)
    # -------------------------------
    xv = ncout.createVariable('x','float32',('x'));xv[:]=np.linspace(0,nx,nx)
    yv = ncout.createVariable('y','float32',('y'));yv[:]=np.linspace(0,ny,ny)
    tv = ncout.createVariable('time','float64',('time'));tv[:] = np.linspace(0,ntime,ntime)*3600;tv.setncattr('units','seconds since 2022-05-12 00:00:00')
    dataout = ncout.createVariable('data_3d','float32',('time','y','x'));dataout[:]= np.random.random((ntime,ny,nx))
    dataout = ncout.createVariable('data_2d','float32',('y','x'));dataout[:]= np.random.random((ny,nx))
# ----------------------------------------------------------------------------------------------------------------------
f_b = 'test_02.nc';
with Dataset(f_b,'w','NETCDF3') as ncout:
    ncout.createDimension('x',nx)
    ncout.createDimension('y',ny)
    ncout.createDimension('time',None)
    # -------------------------------
    xv = ncout.createVariable('x','float32',('x'));xv[:]=np.linspace(0,nx,nx)
    yv = ncout.createVariable('y','float32',('y'));yv[:]=np.linspace(0,ny,ny)
    tv = ncout.createVariable('time','float64',('time'));tv[:] = np.linspace(0,ntime,ntime)*3600;tv.setncattr('units','seconds since 2022-05-13 00:00:00')
    dataout = ncout.createVariable('data_3d','float64',('time','y','x'));dataout[:]= np.random.random((ntime,ny,nx))
    dataout = ncout.createVariable('data_2d','float32',('y','x'));dataout[:]= np.random.random((ny,nx))
# ------------------------------------------------------------------------------------------------------------------------
with xr.open_mfdataset([f_a, f_b], combine='nested',concat_dim=["time"]) as ds:
    ds.to_netcdf('merged_default.nc')
# ------------------------------------------------------------------------------------------------------------------------
with xr.open_mfdataset([f_a, f_b],concat_dim='time', data_vars='minimal',combine='nested',coords='minimal',compat='override') as ds:
    ds.to_netcdf('merged_minimal.nc')

So, the "merged_default.nc" has both the original 2D and 3D variables as 3D variables but in the "merged_minimal.nc" the 2D variable is 2D and 3D variable is 3D.

Solution 2:[2]

You could do this in Python using my package nctoolkit. This uses CDO as a backend, so will do the same thing as your CDO option.

import nctoolkit as nc
ds = nc.open_data([file1, file2])
ds.merge("time")
# convert to xarray object if neede
ds_xr = ds.to_xarray()

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 msi_gerva
Solution 2 Robert Wilson