PySpark - Convert a heterogeneous JSON array to a Spark dataframe and flatten it

I have streaming data coming in as a JSON array, and I want to flatten it out into a single row in a Spark dataframe using Python.

Here is what the JSON data looks like:

{
  "event": [
    {
      "name": "QuizAnswer",
      "count": 1
    }
  ],
  "custom": {
    "dimensions": [
      { "title": "Are you:" },
      { "question_id": "5965" },
      { "option_id": "19029" },
      { "option_title": "Non-binary" },
      { "item": "Non-binary" },
      { "tab_index": "3" },
      { "tab_count": "4" },
      { "tab_initial_index": "4" },
      { "page": "home" },
      { "environment": "testing" },
      { "page_count": "0" },
      { "widget_version": "2.2.44" },
      { "session_count": "1" },
      { "quiz_settings_id": "1020" },
      { "quiz_session": "6e5a3b5c-9961-4c1b-a2af-3374bbeccede" },
      { "shopify_customer_id": "noid" },
      { "cart_token": "" },
      { "app_version": "2.2.44" },
      { "shop_name": "safety-valve.myshopify.com" }
    ],
    "metrics": []
  }
}
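
One possible way to get the single-row shape is to merge the single-key objects under custom.dimensions into one flat dict in plain Python and then hand that dict to Spark. The sketch below is only an illustration, not a confirmed solution: the helper names flatten_record and to_single_row_dataframe are made up for this example, the "event_" prefix is an arbitrary choice to keep event fields from colliding with dimension keys, and it assumes each record carries exactly one event. The sample is a trimmed-down version of the record above.

```python
import json


def flatten_record(raw_json):
    """Flatten one JSON record into a single flat dict (one future row)."""
    record = json.loads(raw_json)
    flat = {}

    # Assumption: each record carries exactly one event; only the first is used.
    events = record.get("event") or [{}]
    for key, value in events[0].items():
        # Hypothetical naming choice: prefix event fields to avoid key clashes.
        flat["event_" + key] = value

    # Each element of custom.dimensions is a single-key object; merge them all.
    # If the same key appears twice, the later value wins.
    for dimension in record.get("custom", {}).get("dimensions", []):
        flat.update(dimension)

    return flat


def to_single_row_dataframe(spark, raw_json):
    """Build a one-row Spark dataframe from one JSON record."""
    # Imported lazily so flatten_record can be used and tested without Spark.
    from pyspark.sql import Row

    return spark.createDataFrame([Row(**flatten_record(raw_json))])


# Trimmed-down sample with the same structure as the record in the question.
sample = json.dumps({
    "event": [{"name": "QuizAnswer", "count": 1}],
    "custom": {
        "dimensions": [
            {"title": "Are you:"},
            {"question_id": "5965"},
            {"cart_token": ""},
        ],
        "metrics": [],
    },
})

flat = flatten_record(sample)
print(flat)
```

With an existing SparkSession (assumed here to be named spark), to_single_row_dataframe(spark, sample).show() should then display one row with a column per key. For a streaming dataframe the same helper could probably be applied per micro-batch, for example inside foreachBatch, but that part is untested here.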


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source