Comparing two pandas DataFrames of different sizes
I want to compare two DataFrames that contain only 1s and 0s. I run for loops to check every element of both DataFrames. In the DataFrame out, I want to replace the 1 values that are also 1 in the DataFrame df with the letter d, and the values that differ between the two DataFrames with the letter i. This code is too slow and I need some input on making it faster and more efficient; does anyone have an idea? Note that df is 420x420 and out is 410x410.

    a1 = out.columns.values
    a2 = df.columns.values
    b1 = out.index.values
    b2 = df.index.values
    for a in a1:
        for b in b1:
            for c in a2:
                for d in b2:
                    if a == c and b == d:
                        if out.loc[b, a] == 1 and df.loc[d, c] == 1:
                            out.loc[b, a] = "d"
                        elif out.loc[b, a] != df.loc[d, c]:
                            out.loc[d, c] = "i"
                        else:
                            pass

A small example for better understanding:
Dataframe df
1 | 2 | 3 | 4 |
---|---|---|---|
1 | 0 | 1 | 1 |
2 | 1 | 0 | 0 |
3 | 1 | 0 | 0 |
4 | 0 | 0 | 0 |
Dataframe out
1 | 2 | 3 | 4 |
---|---|---|---|
1 | 0 | 1 | 1 |
2 | 1 | 0 | 1 |
3 | 1 | 1 | 0 |
4 | 0 | 0 | 0 |
The resulting DataFrame out should look like this:
1 | 2 | 3 | 4 |
---|---|---|---|
1 | 0 | d | d |
2 | d | 0 | i |
3 | d | i | 0 |
4 | 0 | 0 | 0 |
Solution 1:[1]
I created your DataFrames like this:

    import numpy as np
    import pandas as pd

    # df creation (the first value in each row is the row label)
    data1 = [
        [1, 0, 1, 1],
        [2, 1, 0, 0],
        [3, 1, 0, 0],
        [4, 0, 0, 0]
    ]
    df = pd.DataFrame(data1, columns=[1, 2, 3, 4])

1 | 2 | 3 | 4 |
---|---|---|---|
1 | 0 | 1 | 1 |
2 | 1 | 0 | 0 |
3 | 1 | 0 | 0 |
4 | 0 | 0 | 0 |

    # df_out creation
    data2 = [
        [1, 0, 1, 1],
        [2, 1, 0, 1],
        [3, 1, 1, 0],
        [4, 0, 0, 0]
    ]
    df_out = pd.DataFrame(data2, columns=[1, 2, 3, 4])

1 | 2 | 3 | 4 |
---|---|---|---|
1 | 0 | 1 | 1 |
2 | 1 | 0 | 1 |
3 | 1 | 1 | 0 |
4 | 0 | 0 | 0 |

    # Then I used the np.where function on all intersected columns.
    intersected_columns = set(df.columns).intersection(df_out.columns)
    for col in intersected_columns:
        if col != 1:  # I think the first column is the index
            df_out[col] = np.where(
                # First condition
                (df[col] == 1) & (df_out[col] == 1),
                "d",  # If the first condition is true
                np.where(  # If the first condition is false, apply the second condition
                    df[col] != df_out[col],
                    "i",
                    df_out[col],
                ),
            )

The output looks like this:
| 1 | 2 | 3 | 4 |
|----:|:----|:----|:----|
| 1 | 0 | d | d |
| 2 | d | 0 | i |
| 3 | d | i | 0 |
| 4 | 0 | 0 | 0 |
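
Since the question mentions that df and out have different shapes (420x420 and 410x410), it may help to restrict the comparison to the row and column labels both DataFrames share before comparing. The following is only a sketch under assumptions: the helper name mark_matches is made up for illustration, the first column of the example tables is treated as the row index, and the extra row and column labelled 5 in df are invented to mimic the size mismatch. It compares the shared block in one vectorized step instead of looping over columns.

```python
import numpy as np
import pandas as pd


def mark_matches(out: pd.DataFrame, df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `out` with 'd'/'i' marks on the labels shared with `df`.

    Hypothetical helper, not part of pandas.
    """
    # Only labels present in both frames can be compared.
    rows = out.index.intersection(df.index)
    cols = out.columns.intersection(df.columns)

    a = out.loc[rows, cols].to_numpy()
    b = df.loc[rows, cols].to_numpy()

    # Object dtype lets integers and letters live in the same block.
    block = a.astype(object)
    block[(a == 1) & (b == 1)] = "d"  # both frames hold a 1
    block[a != b] = "i"               # the frames disagree

    result = out.astype(object)       # astype returns a copy; `out` is untouched
    result.loc[rows, cols] = block
    return result


# Example data from the question; row and column 5 in `df` are invented
# here to make the two frames differ in size.
out = pd.DataFrame(
    [[0, 1, 1], [1, 0, 1], [1, 1, 0], [0, 0, 0]],
    index=[1, 2, 3, 4],
    columns=[2, 3, 4],
)
df = pd.DataFrame(
    [[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
    index=[1, 2, 3, 4, 5],
    columns=[2, 3, 4, 5],
)

result = mark_matches(out, df)
print(result)
```

Cells of out whose labels do not appear in df are left unchanged, which seems to match what the original loop would do, since it only acts when both the row and column labels are equal.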
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | EmreAydin |