Spatial queries with SparkSQL/Python in a Synapse Spark pool using apache-sedona?
I would like to run spatial queries on large data sets, where GeoPandas would be too slow. I found the inspiration here: https://anant-sharma.medium.com/apache-sedona-geospark-using-pyspark-e60485318fbe
In the Spark pool of Synapse Analytics, I prepared the following (via the Azure Portal):
Apache Spark Pool / Settings / Packages / Requirement files:
requirement.txt:
azure-storage-file-share
geopandas
apache-sedona
Apache Spark Pool / Settings / Packages / Workspace packages:
geotools-wrapper-geotools-24.1.jar
sedona-sql-3.0_2.12-1.2.0-incubating.jar
Apache Spark Pool / Settings / Packages / Spark configuration
config.txt:
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator org.apache.sedona.core.serde.SedonaKryoRegistrator
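To double-check a configuration file like config.txt before uploading it, a small parser helps. The sketch below assumes the format shown above (one `key value` pair per line, separated by whitespace) and compares the result with the two Kryo settings from that file. `parse_spark_conf`, `sedona_conf_problems` and `EXPECTED_SEDONA_CONF` are hypothetical helper names, and skipping `#` comment lines is a convenience of this sketch, not a documented Synapse rule.

```python
"""Sketch: parse a Synapse-style Spark configuration file and check the
two Kryo settings used in config.txt. Helper names are hypothetical."""

# Values copied verbatim from the config.txt in the question.
EXPECTED_SEDONA_CONF = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "org.apache.sedona.core.serde.SedonaKryoRegistrator",
}


def parse_spark_conf(text: str) -> dict:
    """Split each non-empty line into key and value on the first whitespace run."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption of this sketch: blank lines and '#' lines are ignored.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError(f"expected 'key value', got: {raw!r}")
        conf[parts[0]] = parts[1].strip()
    return conf


def sedona_conf_problems(conf: dict) -> list:
    """Return human-readable problems; an empty list means both keys match."""
    problems = []
    for key, expected in EXPECTED_SEDONA_CONF.items():
        actual = conf.get(key)
        if actual is None:
            problems.append(f"missing {key}")
        elif actual != expected:
            problems.append(f"{key} is {actual!r}, expected {expected!r}")
    return problems


if __name__ == "__main__":
    config_txt = """
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator org.apache.sedona.core.serde.SedonaKryoRegistrator
"""
    print(sedona_conf_problems(parse_spark_conf(config_txt)))  # prints []
```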
In a PySpark notebook, I ran:
print(spark.version)
print(spark.conf.get("spark.kryo.registrator"))
print(spark.conf.get("spark.serializer"))
The output was:
3.1.2.5.0-58001107
org.apache.sedona.core.serde.SedonaKryoRegistrator
org.apache.spark.serializer.KryoSerializer
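The printed Spark version (3.1.2.x) can be compared with the Spark/Scala suffix in the uploaded jar name. The sketch below assumes the naming pattern of sedona-sql-3.0_2.12-1.2.0-incubating.jar, i.e. `<artifact>-<spark line>_<scala version>-<sedona version>.jar`, and assumes (not verified here) that a jar built for the `3.0` line is meant for any Spark 3.x runtime, so it only compares major versions. All function and class names are hypothetical.

```python
"""Sketch: check that a Sedona jar's Spark suffix fits the running Spark
version. The naming pattern and the major-version rule are assumptions."""
import re
from typing import NamedTuple, Optional

# Matches e.g. "sedona-sql-3.0_2.12-1.2.0-incubating.jar".
JAR_PATTERN = re.compile(
    r"^(?P<artifact>[a-z-]+?)-(?P<spark>\d+\.\d+)_(?P<scala>\d+\.\d+)-(?P<version>.+)\.jar$"
)


class SedonaJar(NamedTuple):
    artifact: str
    spark_line: str
    scala: str
    version: str


def parse_sedona_jar(filename: str) -> Optional[SedonaJar]:
    """Return the parsed parts, or None if the name has no Spark/Scala suffix."""
    m = JAR_PATTERN.match(filename)
    if m is None:
        return None
    return SedonaJar(m["artifact"], m["spark"], m["scala"], m["version"])


def spark_major_minor(version: str) -> tuple:
    """Extract (major, minor) from strings such as '3.1.2.5.0-58001107'."""
    m = re.match(r"^(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unrecognised Spark version: {version!r}")
    return int(m.group(1)), int(m.group(2))


def jar_matches_spark(jar: SedonaJar, spark_version: str) -> bool:
    # Assumption: only the major Spark version has to agree.
    jar_major = int(jar.spark_line.split(".")[0])
    return jar_major == spark_major_minor(spark_version)[0]


if __name__ == "__main__":
    jar = parse_sedona_jar("sedona-sql-3.0_2.12-1.2.0-incubating.jar")
    print(jar)
    # In the notebook, pass spark.version instead of the literal string.
    print(jar_matches_spark(jar, "3.1.2.5.0-58001107"))
```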
Then I tried:
from pyspark.sql import SparkSession
from sedona.register import SedonaRegistrator
from sedona.utils import SedonaKryoRegistrator, KryoSerializer
spark = (
    SparkSession.builder.master("local[*]")
    .appName("Sedona App")
    .config("spark.serializer", KryoSerializer.getName)
    .config("spark.kryo.registrator", SedonaKryoRegistrator.getName)
    .getOrCreate()
)
SedonaRegistrator.registerAll(spark)
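In a Synapse notebook a SparkSession named `spark` already exists, so `getOrCreate()` returns that session rather than a new local one. Core settings such as `spark.serializer` are read when the SparkContext starts and cannot be changed on a running session, which is why they belong in the pool's Spark configuration. A minimal sketch, assuming the `sedona.register.SedonaRegistrator` import used in the question, registers Sedona on the existing session and only warns if the Kryo settings are not active. `register_sedona` and `missing_kryo_settings` are hypothetical helper names.

```python
"""Sketch: register Sedona on the notebook's existing SparkSession instead of
building a new one. Helper names are hypothetical."""
import warnings

# Kryo settings from the question's config.txt.
REQUIRED = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "org.apache.sedona.core.serde.SedonaKryoRegistrator",
}


def missing_kryo_settings(spark) -> list:
    """List the Kryo settings that are unset or different on this session."""
    return [
        key
        for key, expected in REQUIRED.items()
        # RuntimeConfig.get(key, default) returns the default when the key is unset.
        if spark.conf.get(key, None) != expected
    ]


def register_sedona(spark) -> list:
    """Register Sedona's SQL functions on `spark`; return missing Kryo settings."""
    missing = missing_kryo_settings(spark)
    if missing:
        # These are read at SparkContext start-up, so only warn here;
        # fix them in the Spark pool configuration if needed.
        warnings.warn(f"Kryo settings not active on this session: {missing}")
    # Imported lazily so the helpers above can be loaded without Sedona installed.
    from sedona.register import SedonaRegistrator

    SedonaRegistrator.registerAll(spark)
    return missing
```

In a notebook cell this would be called as `register_sedona(spark)`, after which the `SELECT ST_Point(0,0);` check from the question can be run.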
But it failed with:
Py4JJavaError: An error occurred while calling o636.count.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: org.apache.spark.SparkException: Failed to register classes with Kryo
A simple check that everything is installed correctly would probably be:
%%sql
SELECT ST_Point(0,0);
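The same check can be run from a Python cell so that a failure stops the notebook with a clear message. This sketch keeps the question's `ST_Point(0, 0)` call and wraps it in Sedona's `ST_AsText` to get a WKT string; it assumes both functions have been registered. If they are not, Spark raises an AnalysisException for the undefined function. `sedona_smoke_test` is a hypothetical name, and the `POINT` prefix check is a loose sanity test rather than an exact expected value.

```python
"""Sketch: Python version of the ST_Point smoke test. Assumes Sedona's SQL
functions are registered on the given SparkSession."""


def sedona_smoke_test(spark) -> str:
    """Run the ST_Point(0, 0) check and return the resulting WKT text."""
    # Spark raises AnalysisException here if ST_Point/ST_AsText are not registered.
    row = spark.sql("SELECT ST_AsText(ST_Point(0, 0)) AS wkt").first()
    wkt = row["wkt"]
    # Loose sanity check: we only expect some POINT geometry back.
    if wkt is None or not wkt.upper().startswith("POINT"):
        raise RuntimeError(f"unexpected result from ST_Point: {wkt!r}")
    return wkt
```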
Please help me get the spatial functions registered in PySpark running in a Synapse notebook!
Solution 1:[1]
I reproduced this on my end and was able to run the above commands successfully without any issues.
I only installed a requirement.txt file containing apache-sedona and downloaded the two JAR files below:
Note: the config.txt file is not required.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | CHEEKATLAPRADEEP-MSFT |