How do I request a zipfile, extract it, then create pandas dataframes from the csv files?

Load these CSV files from Sean Lahman's Baseball Database. For this assignment, we will use the 'Salaries.csv' and 'Teams.csv' tables. Read each table into a pandas DataFrame and show the head of each.

 #Here's the code I have so far:
 import requests
 import io
 import zipfile
 import pandas as pd
 url = 'http://seanlahman.com/files/database/lahman-csv_2014-02-14.zip'
 r = requests.get(url,auth=('user','pass'))

 #These are lines of code I looked up but am not sure how to use:
 #with zipfile.ZipFile('/path/to/file', 'r') as z:
 #    f = z.open('member.csv')
 #    table = pd.io.parsers.read_table(f, ...)
 #salariesData = pd.read_csv('Salaries.csv')
 #teamsData = pd.read_csv('Teams.csv')


Solution 1:[1]

requests gives you the response body as bytes in r.content, so first wrap those bytes in an in-memory file object and open it as a zip file:

mlz = zipfile.ZipFile(io.BytesIO(r.content))
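
If the download fails (for example, the server returns an HTML error page instead of the archive), zipfile.ZipFile raises a BadZipFile error that can be confusing at first. A minimal sketch of a guard, assuming you want to fail early with a clearer message; the helper name open_zip_from_bytes is hypothetical and not part of any library:

```python
import io
import zipfile


def open_zip_from_bytes(data):
    """Wrap raw bytes in a ZipFile, failing early if they are not a zip archive."""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        raise ValueError("downloaded content is not a zip archive")
    # is_zipfile may have moved the read position; rewind to be safe
    buffer.seek(0)
    return zipfile.ZipFile(buffer)
```

With the response from the question, this would be called as mlz = open_zip_from_bytes(r.content).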

To see what's in the zipfile, type:

mlz.namelist()
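
Member names in the list can carry a folder prefix, and their order is not something to rely on. A small sketch of looking a member up by its file name instead, assuming exactly one member has that name; find_member is a hypothetical helper, not a zipfile function:

```python
import posixpath
import zipfile


def find_member(archive: zipfile.ZipFile, filename: str) -> str:
    """Return the full member name whose final path component equals filename."""
    # Zip member names always use forward slashes, so posixpath is appropriate
    matches = [name for name in archive.namelist()
               if posixpath.basename(name) == filename]
    if len(matches) != 1:
        raise KeyError(f"expected exactly one {filename!r}, found {matches}")
    return matches[0]
```

For example, find_member(mlz, 'Salaries.csv') returns the name to pass to mlz.open(), whether or not the archive stores the file inside a folder.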

Then you can open a member by its position in the list returned by mlz.namelist() and read it as a CSV:

df1 = pd.read_csv(mlz.open(mlz.namelist()[0]))
df2 = pd.read_csv(mlz.open(mlz.namelist()[1]))

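In your specific case, provided the names match what mlz.namelist() shows exactly (include any folder prefix that appears there), this will likely be: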

salariesData = pd.read_csv(mlz.open('Salaries.csv'))
teamsData = pd.read_csv(mlz.open('Teams.csv'))
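
Putting the steps together, here is a minimal sketch that reads several CSV members from downloaded bytes in one call. The function name read_csv_members is hypothetical, and the member names you pass must match mlz.namelist() exactly:

```python
import io
import zipfile

import pandas as pd


def read_csv_members(zip_bytes, member_names):
    """Return {member_name: DataFrame} for the named CSV members of a zip archive."""
    frames = {}
    # Open the archive directly from memory; nothing is written to disk
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in member_names:
            # archive.open returns a binary file-like object that read_csv accepts
            with archive.open(name) as member:
                frames[name] = pd.read_csv(member)
    return frames
```

With the response from the question, this would be called as tables = read_csv_members(r.content, ['Salaries.csv', 'Teams.csv']), after which tables['Salaries.csv'].head() and tables['Teams.csv'].head() show the first rows of each table.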

(All of the above assumes you're using Python 3.x.)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1