In Palantir Foundry, how should I get the current SparkSession in a Transform?

I'm writing a Python Transform and need to get the SparkSession so I can construct a DataFrame.

How should I do this?



Solution 1:[1]

Add a parameter named ctx to the compute function. Foundry passes in the TransformContext (not a SparkContext), and its spark_session attribute gives you the current SparkSession.

from pyspark.sql.types import StringType, StructField, StructType
from transforms.api import Output, transform


@transform(
    output=Output('/path/to/first/output/dataset'),
)
def my_compute_function(ctx, output):
    # type: (TransformContext, TransformOutput) -> None

    # In this example, the Spark session is used to create an empty data frame
    # with a single nullable string column.
    columns = [
        StructField("col_a", StringType(), True)
    ]
    empty_df = ctx.spark_session.createDataFrame([], schema=StructType(columns))

    output.write_dataframe(empty_df)

This example can also be found in the Foundry documentation here: https://www.palantir.com/docs/foundry/transforms-python/transforms-python-api/#transform
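
If the DataFrame construction grows beyond a few lines, one option is to move it into a plain helper that takes the session as an argument, so the logic can be exercised outside Foundry. The following is only a sketch: build_example_df, EXAMPLE_ROWS and EXAMPLE_SCHEMA are hypothetical names that are not part of the Foundry API, and the sample rows are made up for illustration. It assumes the session object exposes the standard PySpark createDataFrame method, which accepts a DDL-style schema string.

```python
# Hypothetical helper, not part of the Foundry API. It only needs an object
# that exposes createDataFrame, such as ctx.spark_session.
EXAMPLE_ROWS = [("a",), ("b",)]   # made-up sample data
EXAMPLE_SCHEMA = "col_a string"   # DDL-style schema: one string column


def build_example_df(spark_session):
    """Build a small one-column DataFrame using the given session."""
    return spark_session.createDataFrame(EXAMPLE_ROWS, schema=EXAMPLE_SCHEMA)


# Inside the compute function from the answer, the call would look like:
#     output.write_dataframe(build_example_df(ctx.spark_session))
```

Because the helper does not import anything from transforms.api, it could also be called with a locally created SparkSession in a unit test.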

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 hjones