How to create a new column with a null value using a PySpark DataFrame?

I'm having trouble with PySpark DataFrames. I have a column called eventkey, which is a concatenation of three elements: account_type, counter_type and billable_item_sid. I have a function called apply_event_key_transform in which I want to split the concatenated eventkey and create a new column for each element.

def apply_event_key_transform(data_frame: DataFrame):
    output_df = data_frame.withColumn("account_type", getAccountTypeUDF(data_frame.eventkey)) \
        .withColumn("counter_type", getCounterTypeUDF(data_frame.eventkey)) \
        .withColumn("billable_item_sid", getBiSidUDF(data_frame.eventkey))
    # drop() returns a new DataFrame; it does not modify output_df in place
    return output_df.drop("eventkey")

I've created UDF functions to retrieve the account_type, counter_type and billable_item_sid from a given eventkey value. I have a class called EventKey that takes the full eventkey string as a constructor param, and creates an object with data members to access the account_type, counter_type and billable_item_sid.

getAccountTypeUDF = udf(lambda x: get_account_type(x))
getCounterTypeUDF = udf(lambda x: get_counter_type(x))
getBiSidUDF = udf(lambda x: get_billable_item_sid(x))


def get_account_type(event_key: str):
    event_key_obj = EventKey(event_key)
    return event_key_obj.account_type.name


def get_counter_type(event_key: str):
    event_key_obj = EventKey(event_key)
    return event_key_obj.counter_type


def get_billable_item_sid(event_key: str):
    event_key_obj = EventKey(event_key)
    return event_key_obj.billable_item_sid
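
For reference, the helpers above could be backed by a class along these lines. This is only a sketch: the real EventKey class is not shown, so the "|" delimiter, the AccountType members and the rule that an empty or missing third field means "no billable_item_sid" are all assumptions made for illustration.

```python
from enum import Enum
from typing import Optional


class AccountType(Enum):
    # Hypothetical members; the real enum is not shown in the question.
    PREPAID = 1
    POSTPAID = 2


class EventKey:
    """Hypothetical stand-in, assuming 'account_type|counter_type|billable_item_sid'."""

    def __init__(self, event_key: str):
        parts = event_key.split("|")
        self.account_type = AccountType[parts[0]]
        self.counter_type = parts[1]
        # An empty or missing third field is treated as "no billable item".
        has_sid = len(parts) > 2 and parts[2] != ""
        self.billable_item_sid: Optional[str] = parts[2] if has_sid else None


def get_account_type(event_key: str) -> str:
    return EventKey(event_key).account_type.name


def get_counter_type(event_key: str) -> str:
    return EventKey(event_key).counter_type


def get_billable_item_sid(event_key: str) -> Optional[str]:
    # May return None, which is the case the question is about.
    return EventKey(event_key).billable_item_sid
```

With this stand-in, get_billable_item_sid returns None whenever the key carries no sid, so any UDF wrapping it has to be able to produce a null.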

The issue I'm running into is that billable_item_sid can be null, but when I call withColumn with None, the column disappears from the DataFrame when I aggregate the data later. Is there a way to create a new column with a null value using withColumn and a UDF?

Things I've tried (for testing purposes):

  1. .withColumn("billable_item_sid", lit(getBiSidUDF(data_frame.eventkey)))
  2. .withColumn("billable_item_sid", lit(None).cast("string"))
  3. Tried a when/otherwise condition for billable_item_sid for null checking


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
