Large Zip Files with Zipfile Module Python
I have never used the zipfile module before. I have a directory that contains thousands of zip files I need to process. These files can be up to 6 GB each. I have looked through some documentation, but much of it is not clear on the best way to read large zip files without extracting them.
I stumbled upon this: Read a large zipped text file line by line in python
So in my solution I tried to emulate it and use it the way I would read a normal text file with the built-in open function:
with open(odfslogp_obj, 'rb', buffering=102400) as odfslog:
So I wrote the following based off the answer from that link:
for odfslogp_obj in odfslogs_plist:
    with zipfile.ZipFile(odfslogp_obj, mode='r') as z:
        with z.open(buffering=102400) as f:
            for line in f:
                print(line)
But this gives me an "unexpected keyword" error for z.open().
The question is: is there documentation that explains which keywords the z.open() function takes? I only found documentation for the ZipFile() constructor.
I wanna make sure my code isn't using up too much memory while processing these files line by line.
odfslogp_obj is a Path object btw
When I take off the buffering and just have z.open(), I get an error saying: TypeError: open() missing 1 required positional argument: 'name'
Solution 1:[1]
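Once you've opened the zip file, you still need to open the individual files it contains. That is the second step, the z.open call you had problems with. It's not the built-in Python open, and it doesn't have a "buffering" parameter. Its one required argument is the name of the member to open, which is why calling z.open() with no arguments fails with the missing 'name' error. See ZipFile.open in the zipfile documentation.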
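If you want to check the accepted keywords on your own Python version, one option is to inspect the method's signature directly. This is just a small sketch using the standard inspect module; the exact parameter list may differ between Python versions, so no output is shown here.

```python
import inspect
import zipfile

# Collect the parameter names that ZipFile.open accepts on this Python version.
params = list(inspect.signature(zipfile.ZipFile.open).parameters)
print(params)

# "name" is among them; "buffering" is not.
print("buffering" in params)
```

Running help(zipfile.ZipFile.open) in an interpreter gives the same information together with the docstring.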
Once the zip file is opened you can enumerate its members and open them in turn. ZipFile.open always opens in binary mode, so each line you read is a bytes object rather than a str. That may be a separate problem, depending on what you want to do with the contents.
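import zipfile

# odfslogs_plist is the collection of zip file paths from the question
for odfslogp_obj in odfslogs_plist:
    with zipfile.ZipFile(odfslogp_obj, mode='r') as z:
        # Open each member of the archive in turn
        for name in z.namelist():
            with z.open(name) as f:
                # Binary mode: each line is a bytes object
                for line in f:
                    print(line)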
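If you need str lines instead of bytes, one possible approach is to wrap the member's file object in io.TextIOWrapper. The sketch below is not from the original answer: the helper name iter_zip_lines, the UTF-8 encoding, and the small demo archive are all assumptions made for illustration. The object returned by ZipFile.open decompresses the data in chunks as it is read, so iterating line by line should not load a whole member into memory.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def iter_zip_lines(zip_path, encoding="utf-8"):
    """Yield (member_name, line) pairs, decoding each member as text.

    Hypothetical helper; the encoding is an assumption about the data.
    """
    with zipfile.ZipFile(zip_path, mode="r") as z:
        for info in z.infolist():
            if info.is_dir():
                continue  # skip directory entries
            with z.open(info) as raw:
                # TextIOWrapper decodes the binary stream incrementally.
                with io.TextIOWrapper(raw, encoding=encoding) as text:
                    for line in text:
                        yield info.filename, line.rstrip("\n")


# Demo on a tiny archive created in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "demo.zip"
    with zipfile.ZipFile(demo_zip, mode="w") as z:
        z.writestr("a.log", "first\nsecond\n")
    lines = list(iter_zip_lines(demo_zip))

print(lines)  # [('a.log', 'first'), ('a.log', 'second')]
```

In the real loop you would iterate over iter_zip_lines(odfslogp_obj) directly rather than building a list, so that only one line is held at a time.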
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |