Spatial queries with SparkSQL/Python in a Synapse Spark Pool using apache-sedona?

I would like to run spatial queries on large datasets, for which geopandas would be too slow. I found inspiration here: https://anant-sharma.medium.com/apache-sedona-geospark-using-pyspark-e60485318fbe

In the Synapse Analytics Spark Pool, I configured the following via the Azure Portal:

Apache Spark Pool / Settings / Packages / Requirement files:

requirement.txt:

azure-storage-file-share
geopandas
apache-sedona
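Before debugging Spark itself, it can help to confirm that the pool actually installed the packages listed in requirement.txt. The following is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+); `installed_versions` is a hypothetical helper name, and it only covers the Python packages, not the jars.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    # Same package names as in requirement.txt above
    for pkg, ver in installed_versions(
        ["azure-storage-file-share", "geopandas", "apache-sedona"]
    ).items():
        print(f"{pkg}: {ver or 'NOT INSTALLED'}")
```

Run in a notebook cell, a `NOT INSTALLED` line would point at the requirements step rather than at Spark or Kryo.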

Apache Spark Pool / Settings / Packages / Workspace packages:

geotools-wrapper-geotools-24.1.jar
sedona-sql-3.0_2.12-1.2.0-incubating.jar

Apache Spark Pool / Settings / Packages / Spark configuration:

config.txt:

spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator org.apache.sedona.core.serde.SedonaKryoRegistrator
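A quick way to rule out typos in config.txt is to parse it and compare the two Kryo settings against the expected class names. This sketch assumes the file uses the whitespace-separated `key value` layout shown above (as in `spark-defaults.conf`); `parse_spark_conf` and `missing_kryo_conf` are hypothetical helpers, not part of Sedona or Synapse.

```python
# Kryo settings expected by Sedona, copied from config.txt above
REQUIRED_KRYO_CONF = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "org.apache.sedona.core.serde.SedonaKryoRegistrator",
}


def parse_spark_conf(text):
    """Parse whitespace-separated 'key value' lines into a dict (assumed format)."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def missing_kryo_conf(conf):
    """Return required keys that are absent or set to a different value."""
    return [key for key, value in REQUIRED_KRYO_CONF.items() if conf.get(key) != value]


if __name__ == "__main__":
    config_txt = """
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator org.apache.sedona.core.serde.SedonaKryoRegistrator
"""
    print(missing_kryo_conf(parse_spark_conf(config_txt)))  # → []
```

The same check can be pointed at the running session with `missing_kryo_conf({k: spark.conf.get(k, None) for k in REQUIRED_KRYO_CONF})`, since PySpark's `RuntimeConfig.get` accepts a default value.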

In a PySpark notebook, I ran:

print(spark.version)
print(spark.conf.get("spark.kryo.registrator"))
print(spark.conf.get("spark.serializer"))

The output was:

3.1.2.5.0-58001107
org.apache.sedona.core.serde.SedonaKryoRegistrator
org.apache.spark.serializer.KryoSerializer
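The printed version string `3.1.2.5.0-58001107` is a Synapse build of Spark 3.1. Assuming Sedona's artifact naming, where `3.0_2.12` in `sedona-sql-3.0_2.12-1.2.0-incubating.jar` marks the build for the Spark 3.x line and Scala 2.12, a rough sanity check is that the Spark major versions agree. The helpers below are hypothetical, and `spark.version` does not reveal the Scala version, so only the Spark half can be checked this way.

```python
import re


def spark_major_minor(version):
    """Return (major, minor) from a version string such as '3.1.2.5.0-58001107'."""
    m = re.match(r"(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unrecognised Spark version: {version!r}")
    return int(m.group(1)), int(m.group(2))


def sedona_jar_target(jar_name):
    """Return ((spark_major, spark_minor), scala_version) from a jar name like
    'sedona-sql-3.0_2.12-1.2.0-incubating.jar', or None if it does not match."""
    m = re.search(r"-(\d+)\.(\d+)_(\d+\.\d+)-", jar_name)
    if m is None:
        return None
    return (int(m.group(1)), int(m.group(2))), m.group(3)


def spark_major_matches(spark_version, jar_name):
    """True if the jar targets the same Spark major version as the session."""
    target = sedona_jar_target(jar_name)
    if target is None:
        return False  # name does not follow the assumed Sedona pattern
    return target[0][0] == spark_major_minor(spark_version)[0]


if __name__ == "__main__":
    # Values copied from the notebook output and package list above;
    # in the notebook itself, spark.version can be passed instead.
    print(spark_major_matches("3.1.2.5.0-58001107",
                              "sedona-sql-3.0_2.12-1.2.0-incubating.jar"))
```

A `True` here only rules out an obvious Spark 2 vs. Spark 3 mismatch; it says nothing about whether the right Sedona jars were uploaded.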

Then I tried:

from pyspark.sql import SparkSession
from sedona.register import SedonaRegistrator
from sedona.utils import SedonaKryoRegistrator, KryoSerializer

spark = (
    SparkSession.builder
    .master("local[*]")
    .appName("Sedona App")
    .config("spark.serializer", KryoSerializer.getName)
    .config("spark.kryo.registrator", SedonaKryoRegistrator.getName)
    .getOrCreate()
)
SedonaRegistrator.registerAll(spark)

But it failed with:

Py4JJavaError: An error occurred while calling o636.count.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: org.apache.spark.SparkException: Failed to register classes with Kryo
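"Failed to register classes with Kryo" typically comes from Spark failing to load or apply the class named in `spark.kryo.registrator`. One thing worth ruling out is that none of the uploaded jars actually contains `org.apache.sedona.core.serde.SedonaKryoRegistrator`. Since a jar is a zip archive, a minimal stdlib sketch can check this; `jar_contains_class` is a hypothetical helper, and this post does not confirm which Sedona jar is expected to ship the registrator.

```python
import zipfile

# Class named by spark.kryo.registrator in config.txt above
REGISTRATOR = "org.apache.sedona.core.serde.SedonaKryoRegistrator"


def jar_contains_class(jar, class_name):
    """Return True if the jar (path or binary file object) contains class_name."""
    # A fully qualified class a.b.C is stored in a jar as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar) as archive:
        return entry in archive.namelist()
```

For example, call `jar_contains_class("sedona-sql-3.0_2.12-1.2.0-incubating.jar", REGISTRATOR)` on a local copy of each uploaded jar. If every jar returns `False`, the registrator is not on the classpath, which would be consistent with the Kryo error above.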

A simple check that everything is installed correctly would be that this query runs:

%%sql
SELECT ST_Point(0,0);

Please help me get the Sedona spatial functions registered in PySpark running in a Synapse notebook!



Solution 1:[1]

I reproduced this on my end and was able to run the above commands without any issues.

I only installed a requirement.txt file containing apache-sedona and downloaded two jar files.

Note: the config.txt file is not required.


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 CHEEKATLAPRADEEP-MSFT