pandas, access a series of lists as a set and take the set difference of 2 set series
Given 2 pandas series, both consisting of lists (i.e. each row in the series is a list), I want to take the row-wise set difference of the 2 columns.
For example, given the dataframe:
import pandas as pd

df = pd.DataFrame({
    'A': [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    'B': [[1, 2], [5, 6], [7, 8, 9]]
})
I want to create a new column C that is set(A) - set(B) for each row:
pd.DataFrame({
    'C': [[3], [4], []]
})
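To make the requested operation concrete: set(A) - set(B) is a one-way difference, which is not the same as a symmetric difference. Here is a minimal sketch with plain Python sets. The values of `a` and `b` are hypothetical and are not taken from the dataframe above; they are chosen so that `b` contains an element that `a` lacks, which is the case where the two operations disagree.

```python
# Hypothetical values, chosen so that b has an element missing from a.
a = [1, 2, 3]
b = [2, 3, 4]

one_way = sorted(set(a) - set(b))     # elements of a that are not in b
symmetric = sorted(set(a) ^ set(b))   # elements in exactly one of a and b

print(one_way)    # [1]
print(symmetric)  # [1, 4]
```

In the example dataframe every list in B is a subset of the matching list in A, so both operations would happen to give the same C there.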
Solution 1:[1]
Thanks to: https://www.geeksforgeeks.org/python-difference-two-lists/
def Diff(li1, li2):
    # Elements of li1 that are not in li2, i.e. set(li1) - set(li2)
    return list(set(li1) - set(li2))

df['C'] = df.apply(lambda x: Diff(x['A'], x['B']), axis=1)
Output
           A          B    C
0  [1, 2, 3]     [1, 2]  [3]
1  [4, 5, 6]     [5, 6]  [4]
2  [7, 8, 9]  [7, 8, 9]   []
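As a possible alternative to a row-wise apply, the same column can be built with a list comprehension over the two columns zipped together. This is only a sketch, not part of the original answer. It assumes every cell in A and B is a list of hashable values, and it uses sorted() so that the order of elements in each result is deterministic.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    'B': [[1, 2], [5, 6], [7, 8, 9]],
})

# Pair up the rows of A and B and take the one-way set difference of each pair.
# sorted() returns a list and fixes the order of the remaining elements.
df['C'] = [sorted(set(a) - set(b)) for a, b in zip(df['A'], df['B'])]

print(df['C'].tolist())  # [[3], [4], []]
```

Iterating over the columns directly often avoids some of the per-row overhead of apply(axis=1), though the difference is only likely to matter on large frames.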
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Jonathan Leon |