'Filter rows in csv file based on another csv file and save the filtered data in a new file
Good day all
so I was trying to filter file2 based on file1, where file1 is a subset from file2. But file2 has a description column that I need to be able to analyse the data in file1. What I'm trying to do is to filter file2, and get only the titles that are in file1 with their description. I tried this, but I'm not quite sure if it is totally right, plus it is working but I don't get any file saved on my computer.
import re
import mmap
from pandas import DataFrame
output = []
with open('file2.csv', 'r') as f2:
mm = mmap.mmap(f2.fileno(), 0, access=mmap.ACCESS_READ)
for line in open('file1.csv', 'r'):
Title = bytes("")
nameMatch = re.search(Title, mm)
if nameMatch:
# output.append(str(""))
fulltypes = [ 'O*NET-SOC Code', 'Title' , 'Discription' ]
final = DataFrame(columns=fulltypes)
final.to_csv(output.append(str("")))
mm.close()
Any idea?
Solution 1:[1]
Assuming your csv files aren't too huge, you can do this by reading both into pandas
and using the join
method. Take the following example:
import pandas as pd
file1 = pd.DataFrame({'Title': ['file1.csv', 'file2.csv', 'file3.csv']})
file2 = pd.DataFrame({'Title': ['file1.csv', 'file2.csv', 'file4.csv'],
'Description': ['List of files', 'List of descriptions', 'Something unrelated']})
joined = pd.merge(file1, file2, left_on='Title', right_on='Title')
print joined
This prints:
Title Description
0 file1.csv List of files
1 file2.csv List of descriptions
i.e. just the files that exist in both.
As pandas
can natively read a csv into a dataframe, in your case you could do:
import pandas as pd
file1 = pd.DataFrame.from_csv('file1.csv')
file2 = pd.DataFrame.from_csv('file2.csv')
joined = pd.merge(file1, file2, left_on='Title', right_on='Title')
joined.to_csv('Output.csv', index=False)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | asongtoruin |