Efficient way to update values in a GeoDataFrame based on the result of the GeoDataFrame.within method
I have two large GeoDataFrames:
The first comes from a shapefile in which each polygon has a float value called 'asapp'.
The second holds the centroids of a fishnet grid with 3x3 meter cells and an 'asapp' column initialized to zero.
I need to fill 'asapp' in the second one based on which polygons of the first contain each centroid.
The code below does this, but at a ridiculously low rate of 15 polygons per second (one of the smallest shapefiles has more than 20000 polygons).
# fishnet_grid is a dict created by GDAL with a raster with 3m pixel size
cells_in_wsg = np.array([(self.__convert_geom_sirgas(geom, ogr_transform), int(fid), 0.0) for fid, geom in fishnet_grid.items()])
# turn the grid cells (which are square polygons) into a GeoDataFrame of points using the cell centroids
fishnet_base = gpd.GeoDataFrame({'geometry': cells_in_wsg[..., 0], 'id': cells_in_wsg[..., 1], 'asapp': cells_in_wsg[..., 2]})
fishnet = gpd.GeoDataFrame({'geometry': fishnet_base.centroid, 'id': fishnet_base['id'], 'asapp': fishnet_base['asapp']})
# as_applied_data is the polygons GeoDataFrame
# the loop below takes a long time to complete: one full within() pass over the fishnet per polygon
for _, as_applied in as_applied_data.iterrows():
    fishnet.loc[fishnet.within(as_applied['geometry']), 'asapp'] += as_applied['asapp']
Is there another way to do this with better performance?
Thanks!
Solution 1:[1]
I solved the problem.
I read about using geopandas.overlay (https://geopandas.org/en/stable/docs/user_guide/set_operations.html), which works with a large number of polygons, but the problem is that it works only with polygons, and I had polygons and points.
So my solution was to create very small polygons (squares with 2 cm sides) from the points and then use overlay.
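The answer does not show how the small squares were built, so here is a minimal sketch of one way to do it, assuming the points live in a projected CRS with meter units. The function name points_to_squares and the sample data are hypothetical, not part of the original code.

```python
import geopandas as gpd
from shapely.geometry import Point, box


def points_to_squares(points: gpd.GeoDataFrame, side: float) -> gpd.GeoDataFrame:
    """Return a copy of `points` where each point is replaced by a square centered on it.

    `side` is the side length in CRS units (0.02 would be 2 cm in a meter-based CRS).
    """
    half = side / 2.0
    squares = [
        box(p.x - half, p.y - half, p.x + half, p.y + half)
        for p in points.geometry
    ]
    out = points.copy()
    # Overwrite only the active geometry column; other columns and the CRS are kept.
    out[out.geometry.name] = gpd.GeoSeries(squares, index=points.index, crs=points.crs)
    return out


# Hypothetical sample data: two centroids with ids.
sample_points = gpd.GeoDataFrame(
    {"id": [0, 1]}, geometry=[Point(1.0, 1.0), Point(5.0, 2.0)]
)
sample_squares = points_to_squares(sample_points, side=0.02)
```

The resulting GeoDataFrame contains only polygons, so it can be passed to gpd.overlay in place of the points.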
The final code:
# fishnet is now a GeoDataFrame of little squares
fishnet = gpd.GeoDataFrame({'geometry': cells_in_wsg[..., 0], 'id': cells_in_wsg[..., 1]})
# intersection keeps only the little squares that intersect as_applied_data polygons, along with the values of those polygons
intersection = gpd.overlay(fishnet, as_applied_data, how='intersection')
# now it is easy to calculate the mean per square and put it back in the fishnet using merge
# (only 'asapp' is averaged, so the geometry column is not included in the mean)
values = fishnet.merge(intersection.groupby('id', as_index=False)['asapp'].mean(), on='id')
# values now has the little squares, the id and the mean value of the intersections
It worked very well!
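For comparison, the point-in-polygon update from the question can also be written as a spatial join on the points themselves, without converting them to squares. This is an alternative sketch, not the method used in the answer above. It assumes geopandas 0.10 or later (for the predicate argument of gpd.sjoin) and a geometry column named 'geometry'. The function name sum_polygon_values and the sample data are hypothetical. It sums the polygon values per point, matching the += in the question, whereas the answer's code takes the mean.

```python
import geopandas as gpd
from shapely.geometry import Point, box


def sum_polygon_values(points, polygons, value_col="asapp", id_col="id"):
    """Set value_col on each point to the sum over all polygons containing it."""
    # One output row per (point, containing polygon) pair.
    joined = gpd.sjoin(
        points[[id_col, "geometry"]],
        polygons[[value_col, "geometry"]],
        how="inner",
        predicate="within",
    )
    totals = joined.groupby(id_col)[value_col].sum()
    result = points.copy()
    # Points that fall outside every polygon get 0.0, like the zeroed column in the question.
    result[value_col] = result[id_col].map(totals).fillna(0.0)
    return result


# Hypothetical sample data: two overlapping polygons and three points.
polygons = gpd.GeoDataFrame(
    {"asapp": [1.5, 2.0]},
    geometry=[box(0, 0, 10, 10), box(5, 5, 15, 15)],
)
points = gpd.GeoDataFrame(
    {"id": [0, 1, 2], "asapp": [0.0, 0.0, 0.0]},
    geometry=[Point(2, 2), Point(7, 7), Point(20, 20)],
)
filled = sum_polygon_values(points, polygons)
```

Whether this is faster than the overlay approach on the original data has not been measured here; it mainly avoids the explicit Python loop over polygons.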
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source | 
|---|---|
| Solution 1 | Márcio Duarte | 
