Calculate centroid of entire GeoDataFrame of points
I would like to import some waypoints/markers from a GeoJSON file and then determine the centroid of all of the points. My code calculates the centroid of each individual point rather than the centroid of all points in the series. How do I calculate the centroid of all points in the series?
import geopandas
filepath = r'Shiloh.json'
gdf = geopandas.read_file(filepath)
xyz = gdf['geometry'].to_crs('epsg:3587')
print(type(xyz))
print(xyz)
# xyz is a GeoSeries of POINT Z geometries
c = xyz.centroid
# Instead of calculating the centroid of the collection of points,
# centroid has calculated the centroid of each point,
# i.e. basically the same X and Y data as the POINT Z.
The output from print(type(xyz)) and print(xyz):
<class 'geopandas.geoseries.GeoSeries'>
0 POINT Z (2756810.617 248051.052 0.000)
1 POINT Z (2757659.756 247778.482 0.000)
2 POINT Z (2756907.786 248422.534 0.000)
3 POINT Z (2756265.710 248808.235 0.000)
4 POINT Z (2757719.694 248230.174 0.000)
5 POINT Z (2756260.291 249014.991 0.000)
6 POINT Z (2756274.410 249064.239 0.000)
7 POINT Z (2757586.742 248437.232 0.000)
8 POINT Z (2756404.511 249247.296 0.000)
Name: geometry, dtype: geometry
The variable 'c' reports the centroid of each point, not the centroid of the 9 POINT Z elements:
0 POINT (2756810.617 248051.052)
1 POINT (2757659.756 247778.482)
2 POINT (2756907.786 248422.534)
3 POINT (2756265.710 248808.235)
4 POINT (2757719.694 248230.174)
5 POINT (2756260.291 249014.991)
6 POINT (2756274.410 249064.239)
7 POINT (2757586.742 248437.232)
8 POINT (2756404.511 249247.296)
dtype: geometry
Solution 1:[1]
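First dissolve the GeoDataFrame to get a single shapely.geometry.MultiPoint object, then find the centroid. dissolve() is defined on GeoDataFrame, so this assumes xyz is a GeoDataFrame (e.g. gdf.to_crs('epsg:3587')) and not the bare GeoSeries from the question: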
In [8]: xyz.dissolve().centroid
Out[8]:
0 POINT (2756876.613 248561.582)
dtype: geometry
From the geopandas docs:
dissolve() can be thought of as doing three things:
- it dissolves all the geometries within a given group together into a single geometric feature (using the unary_union method), and
- it aggregates all the rows of data in a group using groupby.aggregate, and
- it combines those two results.
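A minimal, self-contained sketch of the dissolve approach, in case it helps to see it outside the IPython session. The three points and the "name" column are made-up sample data standing in for the waypoints in the question, and no CRS is set:

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical sample data; not the waypoints from the question.
gdf = gpd.GeoDataFrame(
    {"name": ["a", "b", "c"]},
    geometry=[Point(0, 0), Point(2, 0), Point(1, 3)],
)

# dissolve() with no `by` argument merges every row into a single feature
# (a MultiPoint here), so .centroid returns a one-row GeoSeries.
dissolved = gdf.dissolve()
overall = dissolved.centroid.iloc[0]

print(overall.x, overall.y)
```

With these three distinct points the result should sit at the mean of the coordinates, i.e. around (1, 1).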
Note that if you have rows with duplicate geometries, a centroid calculated with this method will not weight the duplicates appropriately, because dissolve de-duplicates the geometries before the centroid is calculated (the example assumes import geopandas as gpd and import shapely.geometry):
In [9]: gdf = gpd.GeoDataFrame({}, geometry=[
...: shapely.geometry.Point(0, 0),
...: shapely.geometry.Point(1, 1),
...: shapely.geometry.Point(1, 1),
...: ])
In [10]: gdf.dissolve().centroid
Out[10]:
0 POINT (0.50000 0.50000)
dtype: geometry
To accurately calculate the centroid of a collection of points including duplicates, create a shapely.geometry.MultiPoint collection directly:
In [11]: mp = shapely.geometry.MultiPoint(gdf.geometry)
In [12]: mp.centroid.xy
Out[12]: (array('d', [0.6666666666666666]), array('d', [0.6666666666666666]))
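The difference between the two results comes down to whether the duplicate point is counted. As far as I understand, the centroid of a MultiPoint is just the arithmetic mean of its member coordinates, so the effect can be sketched in plain Python without geopandas or shapely. The helper name mean_xy is made up for this illustration:

```python
def mean_xy(points):
    """Arithmetic mean of (x, y) pairs; every entry counts once."""
    pts = list(points)
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)


# Same coordinates as the example above: (1, 1) appears twice.
coords = [(0.0, 0.0), (1.0, 1.0), (1.0, 1.0)]

weighted = mean_xy(coords)           # duplicates kept, like MultiPoint
deduplicated = mean_xy(set(coords))  # duplicates collapsed, like a union

print(weighted)
print(deduplicated)
```

The first value matches the MultiPoint centroid (about 0.667 on each axis) and the second matches the dissolve result (0.5 on each axis).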
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |