'Pandas - split large excel file
I have an excel file with about 500,000 rows and I want to split it to several excel file, each with 50,000 rows.
I want to do it with pandas so it will be the quickest and easiest.
any ideas how to make it?
thank you for your help
Solution 1:[1]
Assuming that your Excel file has only one (first) sheet containing data, I'd make use of chunksize
parameter:
import pandas as pd
import numpy as np
i=0
for df in pd.read_excel(file_name, chunksize=50000):
df.to_excel('/path/to/file_{:02d}.xlsx'.format(i), index=False)
i += 1
UPDATE:
chunksize = 50000
df = pd.read_excel(file_name)
for chunk in np.split(df, len(df) // chunksize):
chunk.to_excel('/path/to/file_{:02d}.xlsx'.format(i), index=False)
Solution 2:[2]
use np.split_array as per this answer https://stackoverflow.com/a/17315875/1394890 if you get
array split does not result in an equal division
Solution 3:[3]
As explained by MaxU, I will also make use of a variable chunksize and divide the total number of rows in large file into required number of rows.
import pandas as pd
import numpy as np
chunksize = 50000
i=0
df = pd.read_excel("path/to/file.xlsx")
for chunk in np.split(df, len(df) // chunksize):
chunk.to_excel('path/to/destination/folder/file_{:02d}.xlsx'.format(i), index=True)
i += 1
Hope this would help you.
Solution 4:[4]
import pandas as pd
l = pd.read_excel("inputfilename.xlsx")
total_size = 500,000
chunk_size = 50000
for i in range(0, total_size, chunk_size):
df = l[i:i+chunk_size]
df.to_excel(str(i)+"outputfilename.xlsx")
Solution 5:[5]
I wrote a function for this:
import numpy as np
import pandas as pd
def split_excel(file_name, n): # n: number of chunks or parts (number of outputed excel files)
df = pd.read_excel(file_name)
l = len(df)
c = l // n # c: number of rows
r = l % c
if r != 0: # if it is not divisible
df[-r:].to_excel(f'part_{l//c+1}.xlsx', index=False)
df = df[:-r]
i = 0
for part in np.split(df, l//c):
part.to_excel(f'part_{i}.xlsx', index=False)
i += 1
split_excel('my_file.xlsx')
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | mohammadreza berneti |
Solution 2 | wild |
Solution 3 | Tarun Balani |
Solution 4 | Adeel Afzal |
Solution 5 | Behrooz Ostadaghaee |