How to create a new column with a null value using a PySpark DataFrame?
I'm having trouble with PySpark DataFrames. I have a column called eventkey, which is a concatenation of the following elements: account_type, counter_type, and billable_item_sid. I have a function called apply_event_key_transform in which I want to split the concatenated eventkey and create a new column for each element.
from pyspark.sql import DataFrame

def apply_event_key_transform(data_frame: DataFrame):
    output_df = data_frame.withColumn("account_type", getAccountTypeUDF(data_frame.eventkey)) \
        .withColumn("counter_type", getCounterTypeUDF(data_frame.eventkey)) \
        .withColumn("billable_item_sid", getBiSidUDF(data_frame.eventkey))
    # drop() returns a new DataFrame rather than modifying output_df in place
    return output_df.drop("eventkey")
I've created UDF functions to retrieve the account_type, counter_type and billable_item_sid from a given eventkey value. I have a class called EventKey that takes the full eventkey string as a constructor param, and creates an object with data members to access the account_type, counter_type and billable_item_sid.
from pyspark.sql.functions import udf

# udf() defaults to a string return type when none is given
getAccountTypeUDF = udf(lambda x: get_account_type(x))
getCounterTypeUDF = udf(lambda x: get_counter_type(x))
getBiSidUDF = udf(lambda x: get_billable_item_sid(x))
def get_account_type(event_key: str):
    event_key_obj = EventKey(event_key)
    return event_key_obj.account_type.name

def get_counter_type(event_key: str):
    event_key_obj = EventKey(event_key)
    return event_key_obj.counter_type

def get_billable_item_sid(event_key: str):
    event_key_obj = EventKey(event_key)
    return event_key_obj.billable_item_sid
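The EventKey class itself is not shown in the question, so the following is only a rough sketch of what such a class might look like. The ":" separator, the AccountType members, and the choice to expose a missing billable_item_sid as None are all assumptions made for illustration.

```python
from enum import Enum
from typing import Optional


class AccountType(Enum):
    # Hypothetical members; the real values are not given in the question.
    MASTER = "master"
    SUB = "sub"


class EventKey:
    """Hypothetical stand-in for the EventKey class described in the question."""

    SEPARATOR = ":"  # assumed delimiter, not confirmed by the question

    def __init__(self, event_key: str):
        parts = event_key.split(self.SEPARATOR)
        self.account_type = AccountType(parts[0])
        self.counter_type = parts[1]
        # A missing or empty third element is exposed as None rather than "".
        sid = parts[2] if len(parts) > 2 else ""
        self.billable_item_sid: Optional[str] = sid or None
```

With a class shaped like this, get_billable_item_sid would return None for keys that carry no billable item, which is the case the question is about.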
The issue I'm running into is that billable_item_sid can be null, but when I call withColumn with a None value, the column is dropped from the frame when I aggregate the data later. Is there a way to create a new column with a null value using withColumn and a UDF?
Things I've tried (for testing purposes):
- .withColumn("billable_item_sid", lit(getBiSidUDF(data_frame.eventkey)))
- .withColumn("billable_item_sid", lit(None).castString())
- Tried a when/otherwise condition for billable_item_sid for null checking
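For reference, here is a minimal sketch of two ways a nullable string column is commonly created in PySpark: a UDF declared with an explicit StringType return type, and a typed null literal via lit(None).cast("string"). The helper names to_nullable_str, add_extracted_column, and add_null_string_column are hypothetical, and this is not confirmed to resolve the aggregation problem described above. The Spark imports are placed inside the functions only so that the pure-Python helper can be tried without a Spark installation.

```python
from typing import Callable, Optional


def to_nullable_str(value: object) -> Optional[str]:
    """Return None unchanged; otherwise return the value as a string.

    Calling str() directly on None would produce the text "None", which Spark
    would store as an ordinary non-null string.
    """
    return None if value is None else str(value)


def add_extracted_column(data_frame, name: str, extractor: Callable,
                         source_col: str = "eventkey"):
    """Add column `name` by applying `extractor` to `source_col`.

    A Python None returned by the UDF becomes a SQL NULL in the new column.
    """
    # Imported lazily so to_nullable_str stays usable without Spark installed.
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    extract_udf = F.udf(lambda key: to_nullable_str(extractor(key)), StringType())
    return data_frame.withColumn(name, extract_udf(F.col(source_col)))


def add_null_string_column(data_frame, name: str):
    """Add a column that is NULL in every row but still has a string type."""
    from pyspark.sql import functions as F

    return data_frame.withColumn(name, F.lit(None).cast("string"))
```

Assuming the functions from the question are in scope, a call might look like add_extracted_column(df, "billable_item_sid", get_billable_item_sid). Casting the null literal gives the column a concrete string type instead of Spark's untyped null type.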
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow