How to keep the top 500 rows of each CSV in a loop (Python) and overwrite each file
I am trying to read more than 100 CSV files in Python and keep only the top 500 rows of each (each file has more than 55,0000 rows). I know how to read them, but I need to save each modified file inside the loop under its own filename, in CSV format. Normally I would output the concatenated dataframe to one big CSV file, but this time I need to truncate each CSV file to its top 500 rows and save each one separately.
This is the code I have so far:
import pandas as pd
import glob
FolderName = str(input("What's the name of the folder are you comparing? "))
path = str(input('Enter full path of the folder: '))
#r'C:\Users\si\Documents\UST\AST' # use your path
all_files = glob.glob(path + "/*.csv")
#list1 = []
d = {}
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0, nrows=500)
    #list1.append(df)
    d[filename] = df.columns
#frame = pd.concat(list1, axis=0, ignore_index=True)
frame = pd.DataFrame.from_dict(d, orient='index')
output_path = r'C:\Users\si\Downloads\New\{}_header.xlsx'.format(FolderName)
frame.to_excel(output_path)
Solution 1:[1]
DataFrames can write CSVs as well as read them. So just read each file with nrows=500 and call to_csv
with the same filename.
import pandas as pd
import glob
FolderName = str(input("What's the name of the folder are you comparing? "))
path = input('Enter full path of the folder: ')
all_files = glob.glob(path + "/*.csv")
for filename in all_files:
    # index=False stops pandas from writing its row index as an extra column
    pd.read_csv(filename, index_col=None, header=0, nrows=500).to_csv(filename, index=False)
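The same idea can be wrapped in a small function, which makes it easier to try on a copy of the folder first. This is only a sketch: the name `truncate_csvs` and its parameters are made up for illustration, and it assumes every file has a single header row and that overwriting in place is acceptable.

```python
from pathlib import Path

import pandas as pd


def truncate_csvs(folder, n_rows=500):
    """Overwrite every CSV in `folder` with only its first `n_rows` data rows.

    Hypothetical helper, not part of pandas. Returns the paths it rewrote.
    """
    written = []
    for csv_path in sorted(Path(folder).glob("*.csv")):
        # nrows limits parsing to the first n_rows data rows (header not counted)
        df = pd.read_csv(csv_path, nrows=n_rows)
        # index=False keeps pandas from adding its row index as an extra column
        df.to_csv(csv_path, index=False)
        written.append(csv_path)
    return written
```

Calling `truncate_csvs(path)` with the folder path entered above would rewrite each file under its own name. Since the originals are overwritten, it is probably worth running it on a backup copy first.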
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | tdelaney |