'Comparing 2 columns with different rows in different csv files, and output status to another csv file
I have 2 csv files as shown below. They contain different numbers of rows and the columns are not aligned/sorted along a common index. I need to compare the column fname
on both files, and if they are same, have to check if place
is also the same or not, and output status true
or false
to another file3.csv
with columns fname, place, status
.
file1.csv
:
fname. age. place
Andy 16 a
kate. 15. k
brian. 17. b
file2.csv
:
fname. age. place
brian. 17. c
kate. 15. k
andy. 16. a
laura. 18. l
output file3.csv
should be like:
Andy. a. true
Kate. k. true
brian. b. false
I have seen quite number of queries similar to this, but they either have the same number of rows, or they are sorted and aligned along a common index.
I have tried using pandas, but that did not work, as the rows are different.
Solution 1:[1]
Here is an approach that will work using pandas:
import pandas as pd
Read the files into panda dataframes
df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv)
Create a common key for each dataframe
df1['lk'] = df1.apply(lambda row: row['fname'].lower(), axis= 1)
df2['rk'] = df2.apply(lambda row: row['fname'].lower(), axis= 1)
Merge the frames into a third df, and add flag column
df3 = df1.merge(df2, left_on='lk', right_on='rk')
df3['flag'] = df3.apply(lambda row: row['place_x'] == row['place_y'], axis = 1)
Rename the fname_x and place_X columns
df3.rename(columns={'fname_x':'fname', 'place_x':'place'}, inplace=True)
Create the output csv file
df3[['fname', 'place', 'flag']].to_csv('file3', index=False)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | itprorh66 |