'How to convert a nested dict, to a pandas dataframe
I'm trying to convert a dataframe that has inside other dataframe like:
{
'id': 3241234,
'data': {
'name':'carol',
'lastname': 'netflik',
'office': {
'num': 3543,
'department': 'trigy'
}
}
}
I tried to use:
pd.DataFrame.from_dict(data)
But the result dataframe looks like:
id data
lastname 3241234 netflik
name 3241234 carol
office 3241234 {'num': 3543, 'department': 'trigy'}
Any idea?
Solution 1:[1]
Loading a JSON/dict:
- Using
.json_normalized
to expand thedict
.
import pandas as pd
data = {'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}
df = pd.json_normalize(data)
# display(df)
id data.name data.lastname data.office.num data.office.department
0 3241234 carol netflik 3543 trigy
If the dataframe has column of dicts
- Also see this answer, to this SO: Split / Explode a column of dictionaries into separate columns with pandas
# dataframe with column of dicts
df = pd.DataFrame({'col2': [1, 2, 3], 'col': [data, data, data]})
# display(df)
col2 col
0 1 {'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}
1 2 {'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}
2 3 {'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}
# normalize the column of dicts
normalized = pd.json_normalize(df['col'])
# join the normalized column to df
df = df.join(normalized).drop(columns=['col'])
# display(df)
col2 id data.name data.lastname data.office.num data.office.department
0 1 3241234 carol netflik 3543 trigy
1 2 3241234 carol netflik 3543 trigy
2 3 3241234 carol netflik 3543 trigy
If the dataframe has a column of lists
with dicts
- The
dicts
need to be removed from thelists
with.explode
data = [{'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}]
df = pd.DataFrame({'col2': [1, 2, 3], 'col': [data, data, data]})
# display(df)
col2 col
0 1 [{'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}]
1 2 [{'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}]
2 3 [{'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}]
# explode the lists
df = df.explode('col').reset_index(drop=True)
# normalize the column of dicts
normalized = pd.json_normalize(df['col'])
# join the normalized column to df
df = df.join(normalized).drop(columns=['col'])
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |