How do I find all the polygons of a GeoDataFrame that contain any point of another GeoDataFrame in GeoPandas?

I have a GeoDataFrame of about 3200 polygons, and another GeoDataFrame of about 26,000 points. I want to get a third GeoDataFrame of only the polygons that contain at least one point. This seems like it should be a simple sjoin, but geopandas.sjoin(polygons, points, predicate='contains') returns a GeoDataFrame with more polygons than I started with (and very near the number of input points). Examining the result shows that some polygons are duplicated, which would explain why there are more rows than I expected. How do I find only the polygons that contain any point, without duplicates?



Solution 1:[1]

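The how argument of sjoin seems to address this problem. It controls which GeoDataFrame's index and geometry the result keeps. Here we want only the polygons, so we keep the index of the polygons GeoDataFrame: geopandas.sjoin(polygons, points, how='left', predicate='contains'). (Older GeoPandas releases called this argument op; it was renamed to predicate in version 0.10.) Note that a left join still returns one row per polygon-point match and also keeps polygons that contain no point, so the result may still need to be filtered and deduplicated on the polygon index. The documentation has more detail: https://geopandas.org/en/stable/docs/user_guide/mergingdata.html#binary-predicate-joins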
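To see what how changes in practice, here is a minimal sketch on made-up data. The three squares, the three points, and the UNIQUE_ID values are assumptions for illustration only, and the code assumes GeoPandas 0.10 or later (for the predicate argument).

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical toy data: polygon "a" holds two points, "b" holds one, "c" holds none.
polygons = gpd.GeoDataFrame(
    {"UNIQUE_ID": ["a", "b", "c"]},
    geometry=[box(0, 0, 1, 1), box(2, 0, 3, 1), box(4, 0, 5, 1)],
)
points = gpd.GeoDataFrame(
    geometry=[Point(0.2, 0.2), Point(0.8, 0.8), Point(2.5, 0.5)]
)

# Inner join (the default): one row per polygon-point match, unmatched polygons dropped.
inner = gpd.sjoin(polygons, points, how="inner", predicate="contains")

# Left join: same matches, plus one row for each polygon that matched nothing.
left = gpd.sjoin(polygons, points, how="left", predicate="contains")

print(len(inner))  # 3 rows: "a" twice, "b" once
print(len(left))   # 4 rows: "a" twice, "b" once, "c" once with no matching point
```

In both cases the result keeps the polygon geometry and the polygon index, so the repeated index values are what identify the duplicates.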

Solution 2:[2]

I found a workaround, although I feel like it's not the best solution. My polygons have a unique ID column on which I was able to remove duplicates:

geopandas.sjoin(polygons, points, predicate='contains').drop_duplicates(subset=['UNIQUE_ID'], keep='first')
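If the polygons have no dedicated ID column, a possible variant is to deduplicate on the index instead, since the joined result carries the polygon index. The sketch below uses made-up data and assumes the polygons GeoDataFrame has a unique index and that GeoPandas 0.10 or later is installed; the variable name polygons_with_points is hypothetical.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical toy data: polygon "a" holds two points, "b" holds one, "c" holds none.
polygons = gpd.GeoDataFrame(
    {"UNIQUE_ID": ["a", "b", "c"]},
    geometry=[box(0, 0, 1, 1), box(2, 0, 3, 1), box(4, 0, 5, 1)],
)
points = gpd.GeoDataFrame(
    geometry=[Point(0.2, 0.2), Point(0.8, 0.8), Point(2.5, 0.5)]
)

# The inner join repeats a polygon's index once per point it contains.
joined = gpd.sjoin(polygons, points, predicate="contains")

# Keep each original polygon row whose index appears in the join result.
# Assumes polygons.index is unique; the original row order is preserved.
polygons_with_points = polygons[polygons.index.isin(joined.index)]

print(polygons_with_points["UNIQUE_ID"].tolist())  # ['a', 'b']
```

Filtering the original polygons this way also leaves out the columns that sjoin adds from the points, which may or may not be what you want.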

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Nick Silvestri