Calculate centroid of entire GeoDataFrame of points

I would like to import some waypoints/markers from a GeoJSON file and then determine the centroid of all of the points. My code calculates the centroid of each individual point, not the centroid of all points in the series. How do I calculate the centroid of all points in the series?

import geopandas

filepath = r'Shiloh.json'

gdf = geopandas.read_file(filepath)

xyz = gdf['geometry'].to_crs('epsg:3587')

print(type(xyz))
print(xyz)

# xyz is a GeoSeries of POINT Z geometries

c = xyz.centroid


# Instead of calculating the centroid of the whole collection of points,
# centroid has calculated the centroid of each individual point,
# i.e. basically the same X and Y data as the POINT Z.

The output from print(type(xyz)) and print(xyz):

<class 'geopandas.geoseries.GeoSeries'>
0    POINT Z (2756810.617 248051.052 0.000)
1    POINT Z (2757659.756 247778.482 0.000)
2    POINT Z (2756907.786 248422.534 0.000)
3    POINT Z (2756265.710 248808.235 0.000)
4    POINT Z (2757719.694 248230.174 0.000)
5    POINT Z (2756260.291 249014.991 0.000)
6    POINT Z (2756274.410 249064.239 0.000)
7    POINT Z (2757586.742 248437.232 0.000)
8    POINT Z (2756404.511 249247.296 0.000)
Name: geometry, dtype: geometry

The variable c holds the centroid of each point, not the centroid of the 9 POINT Z elements:

0    POINT (2756810.617 248051.052)
1    POINT (2757659.756 247778.482)
2    POINT (2756907.786 248422.534)
3    POINT (2756265.710 248808.235)
4    POINT (2757719.694 248230.174)
5    POINT (2756260.291 249014.991)
6    POINT (2756274.410 249064.239)
7    POINT (2757586.742 248437.232)
8    POINT (2756404.511 249247.296)
dtype: geometry


Solution 1:[1]

First dissolve the GeoDataFrame into a single shapely.geometry.MultiPoint object, then take its centroid:

In [8]: xyz.dissolve().centroid
Out[8]:
0    POINT (2756876.613 248561.582)
dtype: geometry

From the geopandas docs:

dissolve() can be thought of as doing three things:

  • it dissolves all the geometries within a given group together into a single geometric feature (using the unary_union method), and
  • it aggregates all the rows of data in a group using groupby.aggregate, and
  • it combines those two results.
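The union step described in the list can be sketched with shapely alone. This is a minimal illustration, not the geopandas implementation: the four sample points are hypothetical, and shapely.ops.unary_union stands in for the union that dissolve performs on a single group. On a GeoSeries such as xyz, the unary_union property (replaced by union_all() in newer geopandas releases, as far as I know) should produce the same kind of merged geometry.

```python
from shapely.geometry import Point
from shapely.ops import unary_union

# Hypothetical sample points: the corners of a 2 x 2 square.
points = [Point(0, 0), Point(2, 0), Point(2, 2), Point(0, 2)]

# Union all geometries into one feature, as dissolve does for a single group.
merged = unary_union(points)

# The centroid of that single feature is the centroid of the whole collection.
center = merged.centroid
print(merged.geom_type)
print(center.x, center.y)
```

Because all the points end up in one MultiPoint, the centroid call returns a single point rather than one point per row.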

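Note that if you have rows with duplicate geometries, a centroid calculated with this method will not weight the duplicates appropriately, because the union performed by dissolve collapses coincident points into one before the centroid is calculated. In the example below, geopandas is assumed to be imported as gpd and shapely.geometry is assumed to be imported: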


In [9]: gdf = gpd.GeoDataFrame({}, geometry=[
   ...:     shapely.geometry.Point(0, 0),
   ...:     shapely.geometry.Point(1, 1),
   ...:     shapely.geometry.Point(1, 1),
   ...:     ])

In [10]: gdf.dissolve().centroid
Out[10]:
0    POINT (0.50000 0.50000)
dtype: geometry

To accurately calculate the centroid of a collection of points including duplicates, create a shapely.geometry.MultiPoint collection directly:

In [11]: mp = shapely.geometry.MultiPoint(gdf.geometry)

In [12]: mp.centroid.xy
Out[12]: (array('d', [0.6666666666666666]), array('d', [0.6666666666666666]))
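The 2/3 result follows from the fact that the centroid of a set of points is the arithmetic mean of their coordinates, so a repeated point counts once per occurrence. A dependency-free sketch of that calculation is below; mean_center is a hypothetical helper name, and the coordinate list is assumed to have been extracted from the geometries beforehand, for example with [(p.x, p.y) for p in gdf.geometry].

```python
from statistics import fmean


def mean_center(coords):
    """Return the arithmetic mean of an iterable of (x, y) pairs."""
    xs, ys = zip(*coords)
    return fmean(xs), fmean(ys)


# Same three points as in the example above, with (1, 1) repeated.
coords = [(0.0, 0.0), (1.0, 1.0), (1.0, 1.0)]
cx, cy = mean_center(coords)
print(cx, cy)
```

Both coordinates come out at two thirds, matching the MultiPoint centroid rather than the 0.5 produced after dissolving.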

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1