Python pandas dataframe: populate hierarchical levels from parent-child relations

I have the following dataframe, which contains parent-child relations:

import pandas as pd

data = pd.DataFrame({'Parent':['a','a','b','c','c','f','q','z','k'],
                     'Child':['b','c','d','f','g','h','k','q','w']})
a
├── b
│   └── d
└── c
    ├── f
    │   └── h
    └── g
z
└── q
    └── k
        └── w

I would like to get a new dataframe which contains e.g. all children of parent a:

child  level1  level2  level x
d      a       b       -
b      a       -       -
c      a       -       -
f      a       c       -
h      a       c       f
g      a       c       -

I do not know upfront how many levels there are, so I have used 'level x'.

I guess I need some kind of recursive pattern to iterate over the dataframe.



Solution 1:[1]

I'd suggest:

  • building the list of parents for each child
  • building the DataFrame from those lists, giving each parent a level name

import pandas as pd

values = {'Parent': ['a', 'a', 'b', 'c', 'c', 'f', 'q', 'z', 'k'],
          'Child': ['b', 'c', 'd', 'f', 'g', 'h', 'k', 'q', 'w']}

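# Map each child to its parent (each child has exactly one parent here)
relations = dict(zip(values['Child'], values['Parent']))

def get_parent_list(element):
    # Walk up recursively; the result is ordered from the root down to the direct parent
    parent = relations.get(element)
    return get_parent_list(parent) + [parent] if parent else []

# One row per child; set() drops duplicates but does not guarantee row order
all_relations = {
    child: {f'level_{idx}': value for idx, value in enumerate(get_parent_list(child))}
    for child in set(values['Child'])
}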

df = pd.DataFrame.from_dict(all_relations, orient='index')
print(df)

Output (row order may vary between runs, because the children are iterated via a set):


  level_0 level_1 level_2
b       a     NaN     NaN
f       a       c     NaN
d       a       b     NaN
g       a       c     NaN
h       a       c       f
q       z     NaN     NaN
k       z       q     NaN
w       z       q       k
c       a     NaN     NaN
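
The question asks only for the descendants of one parent (e.g. a), with '-' in the unused levels. A minimal sketch of that follows, assuming every child has exactly one parent. The names ancestors and descendants_table are made up for this illustration; they are not part of pandas or of the answer above. The parent lookup is done with a loop rather than recursion, which is a swapped-in variation that also lets the sketch stop on cyclic data.

```python
import pandas as pd

data = pd.DataFrame({'Parent': ['a', 'a', 'b', 'c', 'c', 'f', 'q', 'z', 'k'],
                     'Child': ['b', 'c', 'd', 'f', 'g', 'h', 'k', 'q', 'w']})

# Map each child to its parent (assumption: exactly one parent per child).
relations = dict(zip(data['Child'], data['Parent']))


def ancestors(element):
    """Return the ancestors of element, ordered from the root downwards."""
    chain = []
    seen = {element}
    parent = relations.get(element)
    while parent is not None:
        if parent in seen:
            # Guard against cyclic input such as a -> b -> a.
            raise ValueError(f'cycle detected at {parent!r}')
        seen.add(parent)
        chain.append(parent)
        parent = relations.get(parent)
    return chain[::-1]


def descendants_table(root):
    """Build one row per descendant of root; levels start at root."""
    records = []
    for child in data['Child']:
        chain = ancestors(child)
        if root not in chain:
            continue  # child is not below root
        # Drop anything above root so that level1 is always root itself.
        chain = chain[chain.index(root):]
        record = {'child': child}
        record.update({f'level{i}': value
                       for i, value in enumerate(chain, start=1)})
        records.append(record)
    # Rows with fewer levels get NaN, which is replaced by '-'.
    return pd.DataFrame(records).fillna('-')


result = descendants_table('a')
print(result)
```

Calling descendants_table('z') gives the second tree in the same layout. Rows come out in the order in which the children appear in data, so the result does not depend on set ordering.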

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution    Source
Solution 1  azro