'Spatial database architecture with Apache Parquet, PostgresSQL and PostGIS on on-premises bare-metal S3/MinIo cluster

Designing storage architecture for Petabyte-scale geospatial data; starting from scratch. Creating a MinIo cluster to store the objects in S3 buckets. To store the metadata, I’m considering the Apache Parquet format managed by PostgresSQL, extended by PostGIS. Using Parquet implies resourcing to a PostgresSQL FDW (foreign data wrapper) - ParquetS3. No doubts regarding Parquet’s main advantages when compared to "pure" PostgresSQL: higher query speed, higher compression rate, and the ability to store in S3, which for us is a significant advantage. However, I cannot find documentation or a use case of Parquet with geospatial data.

Should I worry about losing geospatial query features (the PostGIS support) due to the use of Parquet or it’s wrapper for PostgreSQL - ParquetS3?

According to PostgreSQL’s documentation, the wrapper ParquetS3 is valid for PostgreSQL, but I see no mention to the extension PostGIS. On this same PostgreSQL FDW list, there are some geo data wrappers, but these aren’t meant to use use S3/MinIo. According to PostGIS documentation, PostGIS ships with two FWD (Oracle FDW and OGR FDW), but this is a post from 2014.

The code repository for OGR FDW reads:

OGR is the vector half of the GDAL spatial data access library. It allows access to a large number of GIS data formats using a simple C API for data reading and writing. Since OGR exposes a simple table structure and PostgreSQL foreign data wrappers allow access to table structures, the fit seems pretty perfect.

This is followed by a list of limitations of the implementation.

Does the OGR FDW work below the ParquetS3 one, despite the limitations listed? Does anyone know of a FDW for geo data (PostGIS), that also allows sourcing from MinIo S3 buckets? With Parquet or equivalent? Or has anyone tested the configuration above?

I always have the alternative of creating the data store of the PostgreSQL/PostGIS outside of S3, but I’d rather have it in the MinIo cluster.

Thanks.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source