Scrape information off a complicated table
I need to scrape data off the season stats table of this website: https://fantasy.espn.com/basketball/league/standings?leagueId=1878319
I need a table with all 10 rows and 18 columns, but I am unable to identify the containers for the various columns. Please help me write code that will print out a data frame with all the rankings and all adjacent columns.
import pandas as pd
from bs4 import BeautifulSoup, Tag
import requests
import re
data=[]
res=requests.get("https://fantasy.espn.com/basketball/league/standings?leagueId=1878319")
soup=BeautifulSoup(res.text,'lxml')
listings=soup.findAll(class_='class="jsx-1423235964 season--stats--table')
for listing in listings:
    listing_rank=listing.find('div',{'class':'jsx-2810852873 table--cell rank tar'})
    listing_name=listing.find('td',{'class':'Table2__td'}).attrs['title']
    full_dict={'rank':listing_rank, 'name':listing_name}
    data.append(full_dict)
df=pd.DataFrame(data)
print(df)
Output:
Empty DataFrame
Columns: []
Index: []
Solution 1:[1]
All the data is available from the API; it's just a matter of parsing the returned JSON to get what you want. I did my best to reproduce what's in the table you were looking at. I couldn't find the endpoint that stores the IDs for the stats column names, so I had to hard-code them, but that data is presumably available somewhere, in which case you could pull it and use it instead:
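To show the hard-coded mapping idea in isolation, here is a minimal sketch that renames the keys of one team's `valuesByStat` entry. The sample values and the `rename_stats` helper are hypothetical and only for illustration; the three IDs are taken from the mapping used in this answer.

```python
# Hypothetical sample of one team's `valuesByStat` entry; the numbers are made up.
sample_values = {"0": 1580.0, "3": 276.0, "6": 473.0, "99": 1.5}

# A subset of the hard-coded ID-to-name mapping from this answer.
stat_names = {"0": "PTS", "3": "AST", "6": "REB"}


def rename_stats(values, names):
    """Return a copy of `values` keyed by stat name; unknown IDs keep their raw key."""
    return {names.get(stat_id, stat_id): value for stat_id, value in values.items()}


renamed = rename_stats(sample_values, stat_names)
print(renamed)
```

IDs that are missing from the mapping (here the made-up `"99"`) are passed through unchanged, so an unmapped stat shows up under its raw ID rather than disappearing.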
Code:
import requests
import pandas as pd
# This is the request url to API endpoint
url = 'https://fantasy.espn.com/apis/v3/games/fba/seasons/2020/segments/0/leagues/1878319'
# The parameter needed to get the data you want
payload = {
    'view': 'mTeam',
}
# Return the data (which is in json format) and load into python
jsonData = requests.get(url, params=payload).json()
stats_cols = {
    '0': 'PTS',
    '1': 'BLK',
    '2': 'STL',
    '3': 'AST',
    '4': 'OREB',
    '6': 'REB',
    '11': 'TO',
    '13': 'FGM',
    '14': 'FGA',
    '15': 'FTM',
    '16': 'FTA',
    '17': '3PM',
    '24': 'FTMI'}
# Build one single-row frame per team, then combine them at the end
team_frames = []
for each in jsonData['teams']:
    # Pull the user data from the response
    user = pd.DataFrame([[each['abbrev'], each['location'], each['nickname'], each['rankFinal'], each['waiverRank']]],
                        columns=['abbrev', 'location', 'nickname', 'rank', 'waiver'])
    # Get the record data from the json response
    record_df = pd.DataFrame([each['record']['overall']])
    # Get the stats data from the json response
    temp_stat_df = pd.DataFrame([each['valuesByStat']])
    # Merge/join those 3 tables together and rename the stats columns
    temp_df = pd.concat([user, record_df, temp_stat_df], sort=True, axis=1)
    temp_df = temp_df.rename(columns=stats_cols)
    team_frames.append(temp_df)
# DataFrame.append was removed in pandas 2.0, so combine the rows with pd.concat;
# sort_index(axis=1) keeps the alphabetical column order shown in the output
stats_df = pd.concat(team_frames, ignore_index=True).sort_index(axis=1)
print(stats_df)
Output:
print (stats_df.to_string())
3PM AST BLK FGA FGM FTA FTM FTMI OREB PTS REB STL TO abbrev gamesBack location losses nickname percentage pointsAgainst pointsFor rank streakLength streakType ties waiver wins
0 172.0 276.0 79.0 1206.0 568.0 330.0 272.0 58.0 113.0 1580.0 473.0 99.0 199.0 BBC 1.0 BasketBall 1 Chimps 0.5 2754.0 2624.5 0 1 LOSS 0 9 1
1 83.0 230.0 66.0 908.0 444.0 341.0 275.0 66.0 119.0 1246.0 567.0 74.0 161.0 Hou 1.0 Htown 1 ?? Dal 0.5 1998.0 2228.5 0 1 WIN 0 7 1
2 108.0 297.0 54.0 928.0 428.0 344.0 246.0 98.0 98.0 1210.0 520.0 102.0 154.0 SLNG 2.0 Yogurt 2 Slingers 0.0 2243.5 2249.5 0 2 LOSS 0 8 0
3 128.0 379.0 48.0 1226.0 570.0 432.0 323.0 109.0 102.0 1591.0 512.0 69.0 259.0 BAQU 1.0 TAMU 1 Shauced Shnacks 0.5 2113.0 2451.0 0 1 LOSS 0 6 1
4 177.0 290.0 90.0 1337.0 574.0 408.0 327.0 81.0 117.0 1652.0 578.0 82.0 215.0 Capt 1.0 Mr.Clean 1 ICE 0.5 2609.5 2698.5 0 1 WIN 0 5 1
5 124.0 245.0 49.0 953.0 475.0 267.0 208.0 59.0 99.0 1282.0 436.0 63.0 157.0 TRAP 1.0 Original 1 Gayngster 0.5 2309.0 2110.5 0 1 LOSS 0 1 1
6 105.0 436.0 76.0 1244.0 588.0 439.0 322.0 117.0 153.0 1603.0 741.0 98.0 244.0 PRAG 0.0 los angeles 0 lebrons 1.0 2720.0 2954.0 0 2 WIN 0 4 2
7 157.0 389.0 36.0 1318.0 588.0 394.0 309.0 85.0 112.0 1642.0 543.0 89.0 234.0 KMS 1.0 Kevin 1 Manning Show 0.5 2550.5 2630.0 0 1 WIN 0 3 1
8 128.0 313.0 37.0 963.0 417.0 240.0 177.0 63.0 85.0 1139.0 504.0 103.0 179.0 YANK 0.0 Yonkers 0 Yoinkers 1.0 2411.5 2156.5 0 2 WIN 0 10 2
9 88.0 243.0 64.0 913.0 441.0 272.0 222.0 50.0 119.0 1192.0 509.0 58.0 164.0 PRAD 2.0 Musty Burger FC 2 Juan Prado 0.0 2498.0 2104.0 0 2 LOSS 0 2 0
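The full frame above has more columns than the season stats table needs. As a rough sketch of trimming it down, the example below flattens a payload into one dictionary per team and then keeps only a chosen set of columns. The `teams_to_frame` helper and the sample payload are hypothetical; the sketch only assumes the same keys the answer's loop reads (`teams`, `abbrev`, `record`, `overall`, `valuesByStat`).

```python
import pandas as pd


def teams_to_frame(json_data, stat_names):
    """Flatten the 'teams' list of an mTeam-style payload into one row per team."""
    rows = []
    for team in json_data["teams"]:
        row = {"abbrev": team["abbrev"]}
        # Record fields such as wins and losses
        row.update(team["record"]["overall"])
        # Stat values, renamed from numeric IDs where a name is known
        row.update({stat_names.get(k, k): v for k, v in team["valuesByStat"].items()})
        rows.append(row)
    return pd.DataFrame(rows)


# Made-up payload with two teams, shaped like the keys used in this answer.
sample = {
    "teams": [
        {"abbrev": "AAA",
         "record": {"overall": {"wins": 1, "losses": 1}},
         "valuesByStat": {"0": 100.0, "3": 20.0}},
        {"abbrev": "BBB",
         "record": {"overall": {"wins": 2, "losses": 0}},
         "valuesByStat": {"0": 120.0, "3": 25.0}},
    ]
}

frame = teams_to_frame(sample, {"0": "PTS", "3": "AST"})

# Keep only the columns of interest and order the teams by wins.
wanted = ["abbrev", "wins", "losses", "PTS", "AST"]
season_stats = (
    frame[wanted]
    .sort_values("wins", ascending=False)
    .reset_index(drop=True)
)
print(season_stats.to_string())
```

With the real response you would pass `jsonData` and `stats_cols` instead of the sample values, and extend `wanted` with whichever columns you need. Sorting by `wins` is just one choice; the `rank` column in the output above is 0 for every team, so it may not be useful for ordering.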
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Ann Zen |