Selecting an xarray dataset by coordinates instead of dimensions (or switching coordinates to dimensions)

I have an xarray dataset of ocean color for the Atlantic Ocean called olci_ds. This dataset has dimensions of rows and columns and coordinates of lat and lon.

I want to grab a transect across this dataset going from a certain start lat and lon to a certain end lat and lon. For example something like:

lon_list = np.arange(-77.2, -75.5, 0.002)
lat_list = np.arange(34.48, 33.25, -0.002)

My primary challenge is that this dataset has dimensions of (rows, columns) instead of (lat, lon). Ideally I could just make these coordinates into dimensions and my problem would be solved, but I'm not sure how to do that. I have this working, but only through a very slow process: for each point in the transect line, I find the row and column nearest the lat and lon I'm interested in, then read the data at that row and column with this function:

import math

import numpy as np

def grab_transect_data(lon_list, lat_list, olci_ds):
    variable_list = []
    xloc_list = []
    yloc_list = []

    ratio = len(lat_list)/len(lon_list)

    for idx in range(len(lon_list)):
    # for idx in range(300):

        lat = lat_list[math.floor(idx*ratio)]
        lon = lon_list[idx]

        # First, find the index of the grid point nearest a specific lat/lon.   
        abslat = np.abs(olci_ds.lat-lat)
        abslon = np.abs(olci_ds.lon-lon)
        c = np.maximum(abslon, abslat)

        try:
            ([yloc], [xloc]) = np.where(c == np.min(c))

        except ValueError:
            # sometimes there are two equally near and I just grab the first one
            yloc = np.where(c == np.min(c))[0][0]
            xloc = np.where(c == np.min(c))[1][0]

        xloc_list.append(xloc)
        yloc_list.append(yloc)


        if idx % 50 == 0:
            print(idx)

        # Now I can use that index location to get the values at the x/y dimension.
        # xloc/yloc are integer positions, so index positionally with .isel().
        point_ds = olci_ds.isel(columns=xloc, rows=yloc)
        variable_list.append(point_ds['variable I want'].values)

    return(xloc_list, yloc_list, variable_list)

I have to imagine it would be much simpler to change these coordinates to dimensions so that I can use a normal .sel(), something like:

olci_ds.sel(lat=lat_list, lon=lon_list, method="nearest")

But I haven't been able to successfully make this conversion. Any help would be much appreciated!



Solution 1:[1]

I believe this should answer your question: https://gis.stackexchange.com/questions/353698/how-to-clip-an-xarray-to-a-smaller-extent-given-the-lat-lon-coordinates

Basically, you can create a boolean mask based on the (non-dimension) coordinates and then use xarray.Dataset.where() with drop=True to create the subset.
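As a rough illustration of the mask approach, here is a minimal sketch on a synthetic dataset. The names demo_ds and chl and the grid itself are made up for the example and only stand in for olci_ds; the bounding box reuses the transect end points from the question.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for olci_ds (hypothetical): 2-D lat/lon on (rows, columns).
rows, columns = 50, 60
lon_1d = np.linspace(-78.0, -75.0, columns)
lat_1d = np.linspace(35.0, 33.0, rows)
lon2d, lat2d = np.meshgrid(lon_1d, lat_1d)
# Skew the grid slightly so that lat depends on both row and column.
lat2d = lat2d + 0.05 * (lon2d - lon2d.mean())

demo_ds = xr.Dataset(
    {"chl": (("rows", "columns"), np.random.default_rng(0).random((rows, columns)))},
    coords={
        "lat": (("rows", "columns"), lat2d),
        "lon": (("rows", "columns"), lon2d),
    },
)

# Boolean mask built from the non-dimension coordinates.
mask = (
    (demo_ds.lon >= -77.2) & (demo_ds.lon <= -75.5)
    & (demo_ds.lat >= 33.25) & (demo_ds.lat <= 34.48)
)

# drop=True trims rows/columns that fall entirely outside the mask;
# cells inside the trimmed rectangle but outside the mask become NaN.
subset = demo_ds.where(mask, drop=True)
print(subset.sizes)
```

Note that this clips the dataset to a bounding box rather than extracting the transect itself, so the per-point lookup would still have to run on the (much smaller) subset.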

If the rows and columns correspond directly to latitude and longitude, it may be possible to use xarray.Dataset.swap_dims() (https://xarray.pydata.org/en/stable/generated/xarray.Dataset.swap_dims.html). But from what you show here, it looks like the longitude (and latitude) depends on both the row and the column, so that will not work.
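For comparison, here is a small sketch of the case where swap_dims() does apply, assuming lat varies only along rows and lon only along columns. The dataset regular_ds, the variable chl, and the sample points are hypothetical; this does not carry over to 2-D coordinates like those in olci_ds.

```python
import numpy as np
import xarray as xr

# Hypothetical regular grid: lat is 1-D along rows, lon is 1-D along columns.
lat_1d = np.linspace(35.0, 33.0, 5)
lon_1d = np.linspace(-78.0, -75.0, 4)

regular_ds = xr.Dataset(
    {"chl": (("rows", "columns"), np.arange(20.0).reshape(5, 4))},
    coords={"lat": ("rows", lat_1d), "lon": ("columns", lon_1d)},
)

# Promote the 1-D coordinates to dimension (index) coordinates.
by_latlon = regular_ds.swap_dims({"rows": "lat", "columns": "lon"})

# Pointwise selection along a transect: indexers that share a new "points"
# dimension are matched pair by pair instead of forming an outer product.
pts_lat = xr.DataArray([34.9, 34.0, 33.1], dims="points")
pts_lon = xr.DataArray([-77.9, -76.4, -75.1], dims="points")
transect = by_latlon.chl.sel(lat=pts_lat, lon=pts_lon, method="nearest")
print(transect.values)
```

Passing plain lists to .sel() would instead select every combination of the given lats and lons, which is why the indexers are wrapped in DataArrays with a common dimension.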

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Tim Foreman