'Pandas: Calculate Difference between a row and all other rows and create column with the name
We have data as below
Name value1 Value2 finallist
0 cosmos 10 20 [10,20]
1 network 30 40 [30,40]
2 unab 20 40 [20,40]
is there any way to do difference between all the rows
Something final output like
Name value1 Value2 finallist cosmos network unab
0 cosmos 10 20 [10,20] 0 40 30
1 network 30 40 [30,40] 40 0 10
2 unab 20 40 [20,40] 30 10 0
Data has different types of names and each name should be a column
Solution 1:[1]
You want the pairwise absolute difference of the sum of the values for each row. The easiest might be to use the underlying numpy array.
absolute difference of the sum of the "value" columns
# get sum of values per row and convert to numpy array
a = df['value1'].filter(regex='(?i)value').sum(1).to_numpy()
# compute the pairwise difference, create a DataFrame and join
df2 = df.join(pd.DataFrame(abs(a-a[:,None]), columns=df['Name'], index=df.index))
output:
Name value1 Value2 finallist cosmos network unab
0 cosmos 10 20 [10, 20] 0 40 30
1 network 30 40 [30, 40] 40 0 10
2 unab 20 40 [20, 40] 30 10 0
Solution 2:[2]
I'm not quite sure how you wish to calculate the values in the columns so I just took the difference of value1, but it's irrelevant for the problem in this case. I'd do it the following way:
for name in df['name']:
vals = []
for idx, row in df.iterrows():
vals.append(int(df.loc[df['name'] == name].value1) - row.value1)
df[name] = vals
So for each name, for which you want to make a column, you go through and calculate the difference, add it to a list and then use that list as the column values for the new named column.
It's a quick fix, possible that there's a better way, but I hope this helps!
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | ozmanda |