'Extract a value inside a json column in pandas

I have a json column in a pandas dataframe and I need to create a new column based on a value in the json column.

case#             json_col

123     [{'priority_change': None, 'category_change': None, 'custom_field_change': None, 'timestamp': '', 'due_date_change': None, 'by': {'email': 'test.com', 'type': 'staff', 'id': 23, 'name': 'test.com'}, 'time_spent': None, 'update_id': 234, 'message': None, 'assignee_change': None, 'status_change': None, 'satisfaction_survey': None},{'priority_change': None, 'category_change': None, 'custom_field_change': None, 'timestamp': '2022-04-18 21:36:00', 'due_date_change': None, 'by': {'type': 'priority', 'name': 'Medium', 'id': 32}, 'time_spent': None, 'update_id': 234, 'message': None, 'assignee_change': None, 'status_change': None, 'satisfaction_survey': None}]
321     [{'priority_change': None, 'category_change': None, 'custom_field_change': None, 'timestamp': '', 'due_date_change': None, 'by': {'email': 'test.com', 'type': 'staff', 'id': 23, 'name': 'test.com'}, 'time_spent': None, 'update_id': 234, 'message': None, 'assignee_change': None, 'status_change': None, 'satisfaction_survey': None},{'priority_change': None, 'category_change': None, 'custom_field_change': None, 'timestamp': '2022-04-18 21:36:00', 'due_date_change': None, 'by': {'type': 'priority', 'name': 'High', 'id': 12}, 'time_spent': None, 'update_id': 234, 'message': None, 'assignee_change': None, 'status_change': None, 'satisfaction_survey': None}]

I need to extract the value from the priority as

case#  Priority
 123    Medium
 321    High

Edit: As there are multiple name field in the json, What is the best way to extract name that has type 'priority'



Solution 1:[1]

You can use pandas build in pd.json_normalize() and limit the column only to the ones that you need

import pandas as pd
j = [{'priority_change': None, 'category_change': None, 'custom_field_change': None, 'timestamp': '', 'due_date_change': None, 'by': {'email': 'test.com', 'type': 'staff', 'id': 23, 'name': 'test.com'}, 'time_spent': None, 'update_id': 234, 'message': None, 'assignee_change': None, 'status_change': None, 'satisfaction_survey': None},{'priority_change': None, 'category_change': None, 'custom_field_change': None, 'timestamp': '2022-04-18 21:36:00', 'due_date_change': None, 'by': {'type': 'priority', 'name': 'Medium', 'id': 32}, 'time_spent': None, 'update_id': 234, 'message': None, 'assignee_change': None, 'status_change': None, 'satisfaction_survey': None}]
df = pd.json_normalize(j)
df[['by.name']]

Solution 2:[2]

You can try loop the list and extract name whose type is priority then get first item from the output.

df['Priority'] = df['json_col'].apply(lambda lst: next(iter([d['by']['name'] for d in lst if d['by']['type'] == 'priority'])), None)
print(df)

   case#                                           json_col Priority
0    123  [{'priority_change': None, 'category_change': ...   Medium
1    321  [{'priority_change': None, 'category_change': ...     High

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 ArchAngelPwn
Solution 2 Ynjxsjmh