Efficient way to update values in a GeoDataFrame based on the result of the GeoDataFrame.within method

I have two large GeoDataFrames:

The first comes from a shapefile in which each polygon has a float value called 'asapp'.

The second holds the centroids of a fishnet grid with 3x3 meter cells and an 'asapp' column initialized to zero.

What I need is to fill the 'asapp' column of the second one based on which polygons of the first contain each centroid.

The code below does this, but at a ridiculously low rate of 15 polygons per second (one of the smallest shapefiles has more than 20000 polygons).

import numpy as np
import geopandas as gpd

# fishnet_grid is a dict built with GDAL from a raster with a 3 m pixel size
cells_in_wsg = np.array([(self.__convert_geom_sirgas(geom, ogr_transform), int(fid), 0.0) for fid, geom in fishnet_grid.items()])

# turn the grid cells (square polygons) into a GeoDataFrame of points, using the centroids of the cells
fishnet_base = gpd.GeoDataFrame({'geometry': cells_in_wsg[..., 0], 'id': cells_in_wsg[..., 1], 'asapp': cells_in_wsg[..., 2]})
fishnet = gpd.GeoDataFrame({'geometry': fishnet_base.centroid, 'id': fishnet_base['id'], 'asapp': fishnet_base['asapp']})

# as_applied_data is the polygons GeoDataFrame
# the loop below takes a lot of time to complete: one full within() scan of the grid per polygon
for _, as_applied in as_applied_data.iterrows():
    inside = fishnet.within(as_applied['geometry'])
    fishnet.loc[inside, 'asapp'] += as_applied['asapp']

Is there another way to do this with better performance?

Thanks!



Solution 1:[1]

I solved the problem.

I read about using geopandas.overlay (https://geopandas.org/en/stable/docs/user_guide/set_operations.html), which copes well with a lot of polygons, but the problem was that it only worked with polygons and I had polygons and points.

So my solution was to create very small polygons (squares with 2 cm sides) from the points and then use the overlay.

The final code:

# fishnet is now a GeoDataFrame of little squares
fishnet = gpd.GeoDataFrame({'geometry': cells_in_wsg[..., 0], 'id': cells_in_wsg[..., 1]})

# intersection keeps only the pieces of the little squares that intersect the as_applied_data polygons, along with the values of those polygons
intersection = gpd.overlay(fishnet, as_applied_data, how='intersection')

# now it is as easy as calculating the mean per square and putting it back in the fishnet with a merge
# ('asapp' is selected explicitly so the mean is not attempted on the geometry column)
values = fishnet.merge(intersection.groupby('id', as_index=False)['asapp'].mean())
# values has the little squares, their ids and the mean values of the intersections!
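
The overlay steps above can be tried end to end on toy data. This is only a minimal sketch: the two polygons, the three centroids and the small_square helper are hypothetical stand-ins for as_applied_data and the real fishnet, and the merge uses how='left' so that squares outside every polygon are kept with NaN instead of being dropped.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical stand-in for as_applied_data: two overlapping polygons.
as_applied_data = gpd.GeoDataFrame(
    {"asapp": [10.0, 20.0]},
    geometry=[box(0, 0, 2, 2), box(1, 0, 3, 2)],
)

# Hypothetical grid centroids keyed by id; id 2 lies outside every polygon.
centroids = {0: Point(0.5, 1.0), 1: Point(1.5, 1.0), 2: Point(5.0, 5.0)}


def small_square(point, side=0.02):
    """Return a square polygon with the given side length centred on point."""
    half = side / 2
    return box(point.x - half, point.y - half, point.x + half, point.y + half)


fishnet = gpd.GeoDataFrame(
    {"id": list(centroids)},
    geometry=[small_square(p) for p in centroids.values()],
)

# Each output row is the part of one small square lying inside one polygon.
intersection = gpd.overlay(fishnet, as_applied_data, how="intersection")

# Average the polygon values per square, then attach them to the full grid.
mean_asapp = intersection.groupby("id", as_index=False)["asapp"].mean()
values = fishnet.merge(mean_asapp, on="id", how="left")
```

Here the square with id 1 sits inside both polygons, so its 'asapp' is the mean of 10.0 and 20.0, while id 2 ends up as NaN.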

It worked very well!
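
For comparison, a spatial join can match the centroid points to the polygons directly, without building small squares. This is not what I used above, just a sketch on hypothetical toy data, assuming geopandas 0.10 or newer (where gpd.sjoin takes the predicate argument). It sums the polygon values per point, which mirrors the += in the loop from the question rather than the mean.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical stand-ins for as_applied_data (polygons) and fishnet (centroids).
as_applied_data = gpd.GeoDataFrame(
    {"asapp": [10.0, 20.0]},
    geometry=[box(0, 0, 2, 2), box(1, 0, 3, 2)],
)
fishnet = gpd.GeoDataFrame(
    {"id": [0, 1, 2], "asapp": [0.0, 0.0, 0.0]},
    geometry=[Point(0.5, 1.0), Point(1.5, 1.0), Point(5.0, 5.0)],
)

# One row per (point, containing polygon) pair; the zeroed 'asapp' column is
# left out of the left side so the polygon values keep the name 'asapp'.
pairs = gpd.sjoin(
    fishnet[["id", "geometry"]],
    as_applied_data,
    how="inner",
    predicate="within",
)

# Sum the polygon values for each point.
totals = pairs.groupby("id")["asapp"].sum()

# Write the totals back; points outside every polygon stay at 0.0.
fishnet["asapp"] = fishnet["id"].map(totals).fillna(0.0)
```

Replacing .sum() with .mean() would give the same averaging as the overlay version.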

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Márcio Duarte