Compare two Excel files with multiple tabs for differences using pandas
I found this nice script online that does a great job of showing the differences between two Excel sheets, but there's an issue: it doesn't work if the Excel files have multiple sheets in a given .xlsx:
import pandas as pd
import numpy as np
df1 = pd.read_excel('file1.xlsx')
df2 = pd.read_excel('file2.xlsx')
df1.equals(df2)
comparison_values = df1.values == df2.values
print(comparison_values)
rows, cols = np.where(comparison_values == False)
for item in zip(rows, cols):
    df1.iloc[item[0], item[1]] = '{} --> {}'.format(df1.iloc[item[0], item[1]], df2.iloc[item[0], item[1]])
df1.to_excel('./Excel_diff.xlsx', index = False, header = True)
It works really well for what I need, except that when each .xlsx has multiple sheets it only compares the first sheet of each file. Any ideas on how to alter the script above so that it compares all sheets in the files? Thanks
Solution 1:[1]
As @StevenS said in the comment section, you can pass the sheet_name=None option to pd.read_excel to get a dictionary that maps each sheet name to its dataframe for the input file. Then you need to decide how you want to distinguish each sheet in your output file. In the example below I put one sheet in the output diff file for each sheet found in the file1.xlsx input.
import pandas as pd
import numpy as np

# sheet_name=None returns a dict of {sheet name: dataframe} covering every sheet
xl_1 = pd.read_excel('file1.xlsx', sheet_name=None)
xl_2 = pd.read_excel('file2.xlsx', sheet_name=None)

with pd.ExcelWriter('./Excel_diff.xlsx') as writer:
    for sheet, df1 in xl_1.items():
        # check if sheet is in the other Excel file
        if sheet in xl_2:
            df2 = xl_2[sheet]
            # element-wise comparison by position; assumes both sheets have the same shape
            comparison_values = df1.values == df2.values
            print(comparison_values)
            rows, cols = np.where(comparison_values == False)
            # mark each differing cell as 'old --> new'
            for item in zip(rows, cols):
                df1.iloc[item[0], item[1]] = '{} --> {}'.format(df1.iloc[item[0], item[1]], df2.iloc[item[0], item[1]])
        # one output sheet per sheet in file1.xlsx (unchanged if file2.xlsx lacks it)
        df1.to_excel(writer, sheet_name=sheet, index=False, header=True)
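If you want to try the per-sheet comparison without touching any files, the same idea can be pulled into two small functions. This is only a sketch: diff_frames and diff_sheets are names I made up here, not part of pandas. It assumes missing cells are NaN (the read_excel default), treats two empty cells as equal, and raises an error when two sheets have different shapes rather than guessing how to align them.

```python
import numpy as np
import pandas as pd


def diff_frames(df1, df2):
    """Return a copy of df1 with changed cells shown as 'old --> new'."""
    if df1.shape != df2.shape:
        raise ValueError('shapes differ: {} vs {}'.format(df1.shape, df2.shape))
    # Object dtype lets us write marker strings into numeric columns.
    out = df1.astype(object)
    a = df1.to_numpy(dtype=object)
    b = df2.to_numpy(dtype=object)
    # NaN != NaN, so two empty cells would otherwise be reported as a change.
    both_missing = df1.isna().to_numpy() & df2.isna().to_numpy()
    changed = (a != b) & ~both_missing
    for r, c in zip(*np.where(changed)):
        out.iat[r, c] = '{} --> {}'.format(a[r, c], b[r, c])
    return out


def diff_sheets(xl_1, xl_2):
    """Diff each sheet in xl_1 against the same-named sheet in xl_2."""
    result = {}
    for sheet, df1 in xl_1.items():
        if sheet in xl_2:
            result[sheet] = diff_frames(df1, xl_2[sheet])
        else:
            # Nothing to compare against: keep the sheet unchanged.
            result[sheet] = df1.copy()
    return result


# In-memory stand-ins for what pd.read_excel(..., sheet_name=None) returns.
xl_1 = {
    'Sheet1': pd.DataFrame({'id': [1, 2], 'name': ['a', 'b']}),
    'Only1': pd.DataFrame({'x': [np.nan]}),
}
xl_2 = {'Sheet1': pd.DataFrame({'id': [1, 3], 'name': ['a', 'b']})}

diffs = diff_sheets(xl_1, xl_2)
print(diffs['Sheet1'])
```

With real files you would build xl_1 and xl_2 with pd.read_excel(..., sheet_name=None) and then write each entry of the returned dictionary with to_excel inside a pd.ExcelWriter block, as in the answer above.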
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |