Parsing in-memory CSV files from zip archives

I'm working on a new library which will allow the user to parse any file (xlsx, csv, json, tar, zip, txt) into generators.

Now I'm stuck on zip archives: when I try to parse a CSV file from one, I get io.UnsupportedOperation: seek immediately after elem.seek(0). The CSV file is a simple one, 4 rows by 4 columns. If I parse the CSV directly with the csv_parser I get what I want, but parsing it from a zip archive raises the error.

# Zip archives are binary, so the file should be opened in 'rb' mode
with open("/Users/ro/Downloads/archive_file/csv.zip", 'rb') as my_file_file:
    asd = parse_zip(my_file_file)
    print asd

Where parse_zip is:

def parse_zip(element):
    """Function for manipulating zip files"""
    try:
        my_zip = zipfile.ZipFile(element, 'r')
    except zipfile.BadZipfile:
        raise err.NestedArchives(element)
    else:
        my_file = my_zip.open('corect_csv.csv')
        # print my_file
        my_mime = csv_tsv_parser.parse_csv_tsv(my_file)
        print list(my_mime)

And parse_csv_tsv is:

def _csv_tsv_parser(element):
    """Helper function for csv and tsv files that returns a generator"""

    for row in element:
        # Skip rows in which every field is empty
        if any(s for s in row):
            yield row

def parse_csv_tsv(elem):
    """Function for manipulating all the csv files"""

    dialect = csv.Sniffer().sniff(elem.readline())
    elem.seek(0)  # this is the call that raises io.UnsupportedOperation for a zip member

    data_file = csv.reader(elem, dialect)

    read_data = _csv_tsv_parser(data_file)

    yield '', read_data

Where am I wrong? Is the way I'm opening the file OK or...?



Solution 1:[1]

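ZipFile.open returns a file-like ZipExtFile object that inherits from io.BufferedIOBase. In this version of Python, ZipExtFile does not implement seek, so the call falls through to the base class, which raises io.UnsupportedOperation. Seekability is not limited to text streams (io.BytesIO and io.BufferedReader both support seek); it is this particular class that lacks it. As far as I know, ZipExtFile became seekable in later Python 3 releases (3.7 onward).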
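If the original readline-then-seek logic has to stay as it is, one possible workaround is to copy the member into a seekable in-memory buffer first. The following is only a minimal Python 3 sketch under the assumption that the member fits comfortably in memory; the helper name open_member_seekable and the file name sample.csv are made up for illustration and are not part of the question's library.

```python
import csv
import io
import zipfile


def open_member_seekable(archive, member_name, encoding="utf-8"):
    """Return a seekable text buffer holding one archive member (hypothetical helper)."""
    # ZipFile.read() decompresses the whole member into memory,
    # so this only suits files that comfortably fit in RAM.
    raw = archive.read(member_name)
    return io.StringIO(raw.decode(encoding), newline="")


# Build a small archive in memory so the sketch does not depend on a file on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("sample.csv", "a;b;c\n1;2;3\n4;5;6\n")

with zipfile.ZipFile(buffer) as archive:
    text = open_member_seekable(archive, "sample.csv")
    dialect = csv.Sniffer().sniff(text.readline())
    text.seek(0)  # works here because io.StringIO is seekable
    rows = list(csv.reader(text, dialect))
```

The trade-off is memory: the whole member is decompressed up front, which may defeat the purpose of a generator-based parser for large files.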

However, ZipExtFile does provide a peek method, which will return a number of bytes without moving the file pointer. So changing

dialect = csv.Sniffer().sniff(elem.readline())
elem.seek(0)

to

num_bytes = 128  # size of the sample to sniff; peek may return fewer or more bytes than this
dialect = csv.Sniffer().sniff(elem.peek(n=num_bytes))

solves the problem.
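For readers on Python 3, the same peek-based idea needs two extra steps, because peek returns bytes and csv.reader expects text. The sketch below is one way it might look, not the asker's actual library code: the helper names sniff_zip_member and read_csv_from_zip, the UTF-8 default, and the sample archive are all assumptions made for illustration.

```python
import csv
import io
import zipfile


def sniff_zip_member(member, sample_size=1024, encoding="utf-8"):
    """Guess the CSV dialect of an open zip member without consuming it (hypothetical helper)."""
    # peek() returns bytes and may hand back fewer or more than requested.
    sample = member.peek(sample_size).decode(encoding, errors="ignore")
    # Drop a trailing partial line, if any, so the sniffer only sees complete rows.
    if "\n" in sample:
        sample = sample[: sample.rfind("\n") + 1]
    return csv.Sniffer().sniff(sample)


def read_csv_from_zip(archive, member_name, encoding="utf-8"):
    """Yield the non-empty rows of a CSV file stored in an open ZipFile."""
    with archive.open(member_name) as member:
        dialect = sniff_zip_member(member, encoding=encoding)
        # csv.reader needs text in Python 3, so wrap the binary stream.
        text = io.TextIOWrapper(member, encoding=encoding, newline="")
        for row in csv.reader(text, dialect):
            if any(row):
                yield row


# Build a small archive in memory so the sketch does not depend on a file on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("sample.csv", "id;name;qty\n1;apple;3\n2;pear;5\n")

with zipfile.ZipFile(buffer) as archive:
    rows = list(read_csv_from_zip(archive, "sample.csv"))
```

Because nothing is read ahead of the reader, the rows are still produced lazily, which fits a generator-based design.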

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 snakecharmerb