'Python sum of values in dataset
I have this dataframe (ID is a string and Value a float):
ID Value
1 0.0
1.1 0.0
1.2 0.0
1.2.1 27508.42
1.2.2 25861.82
1.3 0.0
1.3.1 0.0
1.3.1.1 0.0
1.3.1.2 0.0
1.3.1.3 30396.25
Whose structure works like this:
1
├── 1.1
├── 1.2
│ ├── 1.2.1
│ └── 1.2.2
└── 1.3
└── 1.3.1
├── 1.3.1.1
├── 1.3.1.2
└── 1.3.1.3
And need for the value of the 'parent' node to be the sum of the leaves. So:
ID Value
1 83766.489 (1.1 + 1.2 + 1.3)
1.1 0.0
1.2 53370.24 (1.2.1 + 1.2.2)
1.2.1 27508.42
1.2.2 25861.82
1.3 30396.25 (1.3.1)
1.3.1 30396.25 (1.3.1.1 + 1.3.1.2 + 1.3.1.3)
1.3.1.1 0.0
1.3.1.2 0.0
1.3.1.3 30396.25
How can I group the IDs? Using groupby wont work since all the IDs are unique. Should I change the structure of the dataframe to better reflect the logic of the schema?
Solution 1:[1]
Another solution (assuming column ID
is sorted):
def counter(x):
out = []
for id_, v in zip(x.index, x):
s = sum(
v
for a, v in out
if a.startswith(id_) and id_.count(".") == a.count(".") - 1
)
out.append((id_, s + v))
return [v for _, v in out]
print(df.set_index("ID")[::-1].apply(counter)[::-1].reset_index())
Prints:
ID Value
0 1 83766.49
1 1.1 0.00
2 1.2 53370.24
3 1.2.1 27508.42
4 1.2.2 25861.82
5 1.3 30396.25
6 1.3.1 30396.25
7 1.3.1.1 0.00
8 1.3.1.2 0.00
9 1.3.1.3 30396.25
Solution 2:[2]
You could find which IDs make up the "child ID" of each ID and then sum across these "child ID"s
from itertools import tee
from collections import defaultdict
d = defaultdict(list)
a, b = tee(df['ID'].values)
b = list(b)
for a_val in a:
for b_val in b:
if b_val.startswith(a_val):
d[a_val].append(b_val)
d
for b_val in b:
df.loc[df['ID'] == b_val, 'total'] = sum(df.loc[df['ID'].isin(d[b_val]), 'Value'])
print(df)
ID Value total
0 1 0.00 83766.49
1 1.1 0.00 0.00
2 1.2 0.00 53370.24
3 1.2.1 27508.42 27508.42
4 1.2.2 25861.82 25861.82
5 1.3 0.00 30396.25
6 1.3.1 0.00 30396.25
7 1.3.1.1 0.00 0.00
8 1.3.1.2 0.00 0.00
9 1.3.1.3 30396.25 30396.25
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Andrej Kesely |
Solution 2 | Mortz |