In Palantir Foundry, how should I get the current SparkSession in a Transform?
I'm writing a Python Transform and need to get the SparkSession so I can construct a DataFrame.
How should I do this?
Solution 1:[1]
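Add a ctx parameter to your compute function. The transform then passes in a TransformContext (not a SparkContext), and its spark_session attribute is the current SparkSession, which you can use to construct DataFrames.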
from pyspark.sql.types import StringType, StructField, StructType
from transforms.api import Output, TransformContext, TransformOutput, transform


@transform(
    output=Output('/path/to/first/output/dataset'),
)
def my_compute_function(ctx, output):
    # type: (TransformContext, TransformOutput) -> None
    # In this example, the Spark session is used to create an empty data frame.
    columns = [
        StructField("col_a", StringType(), True)
    ]
    empty_df = ctx.spark_session.createDataFrame([], schema=StructType(columns))
    output.write_dataframe(empty_df)
This example can also be found in the Foundry documentation here: https://www.palantir.com/docs/foundry/transforms-python/transforms-python-api/#transform
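To make the ctx mechanism concrete, here is a toy, pure-Python sketch of how a decorator could hand a context object to a compute function that declares a parameter named ctx. Everything in it (toy_transform, FakeTransformContext, FakeSparkSession, FakeOutput) is hypothetical: it is not Foundry's implementation and needs no Spark installation. It assumes, for illustration only, that the context is injected by parameter name; check the Foundry documentation linked above for the exact rules.

```python
import inspect
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class FakeSparkSession:
    """Hypothetical stand-in for a SparkSession; it only records calls."""
    calls: List[tuple] = field(default_factory=list)

    def createDataFrame(self, data, schema=None):
        rows = list(data)
        self.calls.append((rows, schema))
        # Return a plain dict in place of a real DataFrame.
        return {"rows": rows, "schema": schema}


@dataclass
class FakeTransformContext:
    """Hypothetical stand-in for a context that exposes spark_session."""
    spark_session: FakeSparkSession


class FakeOutput:
    """Hypothetical stand-in for an output; it keeps the last written frame."""

    def __init__(self) -> None:
        self.written: Any = None

    def write_dataframe(self, df: Any) -> None:
        self.written = df


def toy_transform(**outputs: Any) -> Callable:
    """Toy decorator: passes a context only to functions that declare `ctx`."""
    def decorator(func: Callable) -> Callable:
        params = inspect.signature(func).parameters

        def run(ctx: FakeTransformContext) -> Any:
            kwargs: Dict[str, Any] = dict(outputs)
            if "ctx" in params:
                kwargs["ctx"] = ctx
            return func(**kwargs)

        return run

    return decorator


out = FakeOutput()


@toy_transform(output=out)
def my_compute_function(ctx, output):
    # Same shape as the real example: get the session from ctx, build, write.
    df = ctx.spark_session.createDataFrame([], schema="col_a string")
    output.write_dataframe(df)


no_ctx_out = FakeOutput()


@toy_transform(output=no_ctx_out)
def no_ctx_function(output):
    # This function never declares ctx, so it never sees the session.
    output.write_dataframe("no session needed")


session = FakeSparkSession()
context = FakeTransformContext(spark_session=session)
my_compute_function(context)
no_ctx_function(context)
print(out.written)  # {'rows': [], 'schema': 'col_a string'}
```

The sketch shows why this pattern is convenient: the session is opt-in, so a compute function that does not declare ctx never receives it, while one that does gets it without any global lookup.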
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | hjones |