Scrape information off a complicated table

I need to scrape the season stats table from this website: https://fantasy.espn.com/basketball/league/standings?leagueId=1878319

I need a table with all 10 rows and 18 columns, but I am unable to identify the containers for the various columns. Please help me write code that prints a data frame with all the rankings and all adjacent columns.

    import pandas as pd
    from bs4 import BeautifulSoup, Tag
    import requests
    import re

    data = []
    res = requests.get("https://fantasy.espn.com/basketball/league/standings?leagueId=1878319")
    soup = BeautifulSoup(res.text, 'lxml')
    listings = soup.findAll(class_='class="jsx-1423235964 season--stats--table')
    for listing in listings:
        listing_rank = listing.find('div', {'class': 'jsx-2810852873 table--cell rank tar'})
        listing_name = listing.find('td', {'class': 'Table2__td'}).attrs['title']
        full_dict = {'rank': listing_rank, 'name': listing_name}
        data.append(full_dict)

    df = pd.DataFrame(data)
    print(df)


Output:

    Empty DataFrame
    Columns: []
    Index: []




Solution 1:[1]

The page appears to build that table with JavaScript after it loads, so the HTML that requests receives does not contain it. All the data is there in the API, though, so it's just a matter of parsing the returned JSON to get what you want. I did my best to reproduce what's in the table you were looking at. I couldn't find the endpoint that stores the IDs for the stats column names, so I had to hard-code them, but that data is probably somewhere, and you could pull it and use it instead.
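As a small illustration of the hard-coded mapping, here is a sketch that renames the stat IDs of one team and flags any ID missing from the mapping instead of leaving it as a bare number. The helper name `name_stats`, the `id_` prefix, and the ID '99' are made up for this example; the three real values are copied from the first row of the output further down.

```python
# Subset of the hard-coded ID-to-name mapping used in this answer
STAT_NAMES = {'0': 'PTS', '1': 'BLK', '2': 'STL'}


def name_stats(values_by_stat, stat_names):
    """Rename stat IDs to readable names; unknown IDs become 'id_<n>'."""
    return {stat_names.get(stat_id, f'id_{stat_id}'): value
            for stat_id, value in values_by_stat.items()}


# Trimmed stand-in for one team's 'valuesByStat' dict.
# '99' is a hypothetical unmapped ID, not taken from the real response.
sample = {'0': 1580.0, '1': 79.0, '2': 99.0, '99': 0.0}

print(name_stats(sample, STAT_NAMES))
```

Any column that shows up with an `id_` prefix is a stat whose name still has to be looked up.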

Code:

import requests
import pandas as pd

# This is the request url to API endpoint
url = 'https://fantasy.espn.com/apis/v3/games/fba/seasons/2020/segments/0/leagues/1878319'

# The parameter needed to get the data you want
payload = {'view': 'mTeam'}


# Return the data (which is in json format) and load into python
jsonData = requests.get(url, params=payload).json()

stats_cols = {
    '0': 'PTS',
    '1': 'BLK',
    '2': 'STL',
    '3': 'AST',
    '4': 'OREB',
    '6': 'REB',
    '11': 'TO',
    '13': 'FGM',
    '14': 'FGA',
    '15': 'FTM',
    '16': 'FTA',
    '17': '3PM',
    '24': 'FTMI',
}

# Build one single-row frame per team, then combine them at the end
rows = []
for each in jsonData['teams']:

    # Pull user data from the response
    user = pd.DataFrame([[each['abbrev'], each['location'], each['nickname'], each['rankFinal'], each['waiverRank']]],
                        columns=['abbrev', 'location', 'nickname', 'rank', 'waiver'])

    # Get the record data from the json response
    record_df = pd.DataFrame([each['record']['overall']])

    # Get the stats data from the json response
    temp_stat_df = pd.DataFrame([each['valuesByStat']])

    # Join those 3 single-row tables side by side and rename the stats columns
    temp_df = pd.concat([user, record_df, temp_stat_df], sort=True, axis=1)
    temp_df = temp_df.rename(columns=stats_cols)

    rows.append(temp_df)

# DataFrame.append is not available in pandas 2.0+, so stack the rows with pd.concat
stats_df = pd.concat(rows, sort=True).reset_index(drop=True)

print(stats_df)

Output:

print (stats_df.to_string())
     3PM    AST   BLK     FGA    FGM    FTA    FTM   FTMI   OREB     PTS    REB    STL     TO abbrev  gamesBack         location  losses         nickname  percentage  pointsAgainst  pointsFor  rank  streakLength streakType  ties  waiver  wins
0  172.0  276.0  79.0  1206.0  568.0  330.0  272.0   58.0  113.0  1580.0  473.0   99.0  199.0    BBC        1.0       BasketBall       1           Chimps         0.5         2754.0     2624.5     0             1       LOSS     0       9     1
1   83.0  230.0  66.0   908.0  444.0  341.0  275.0   66.0  119.0  1246.0  567.0   74.0  161.0    Hou        1.0            Htown       1           ?? Dal         0.5         1998.0     2228.5     0             1        WIN     0       7     1
2  108.0  297.0  54.0   928.0  428.0  344.0  246.0   98.0   98.0  1210.0  520.0  102.0  154.0   SLNG        2.0          Yogurt        2         Slingers         0.0         2243.5     2249.5     0             2       LOSS     0       8     0
3  128.0  379.0  48.0  1226.0  570.0  432.0  323.0  109.0  102.0  1591.0  512.0   69.0  259.0   BAQU        1.0             TAMU       1  Shauced Shnacks         0.5         2113.0     2451.0     0             1       LOSS     0       6     1
4  177.0  290.0  90.0  1337.0  574.0  408.0  327.0   81.0  117.0  1652.0  578.0   82.0  215.0   Capt        1.0         Mr.Clean       1              ICE         0.5         2609.5     2698.5     0             1        WIN     0       5     1
5  124.0  245.0  49.0   953.0  475.0  267.0  208.0   59.0   99.0  1282.0  436.0   63.0  157.0   TRAP        1.0         Original       1        Gayngster         0.5         2309.0     2110.5     0             1       LOSS     0       1     1
6  105.0  436.0  76.0  1244.0  588.0  439.0  322.0  117.0  153.0  1603.0  741.0   98.0  244.0   PRAG        0.0      los angeles       0          lebrons         1.0         2720.0     2954.0     0             2        WIN     0       4     2
7  157.0  389.0  36.0  1318.0  588.0  394.0  309.0   85.0  112.0  1642.0  543.0   89.0  234.0    KMS        1.0            Kevin       1     Manning Show         0.5         2550.5     2630.0     0             1        WIN     0       3     1
8  128.0  313.0  37.0   963.0  417.0  240.0  177.0   63.0   85.0  1139.0  504.0  103.0  179.0   YANK        0.0          Yonkers       0         Yoinkers         1.0         2411.5     2156.5     0             2        WIN     0      10     2
9   88.0  243.0  64.0   913.0  441.0  272.0  222.0   50.0  119.0  1192.0  509.0   58.0  164.0   PRAD        2.0  Musty Burger FC       2       Juan Prado         0.0         2498.0     2104.0     0             2       LOSS     0       2     0
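
If building three frames per team feels heavy, `pd.json_normalize` might do the flattening in one call. This is only a sketch, assuming the team records have the shape used in the code above. The `teams` list is a trimmed stand-in with values copied from the first two output rows, and `flatten_teams` is a name made up here.

```python
import pandas as pd

# Subset of the hard-coded ID-to-name mapping from the answer
STAT_NAMES = {'0': 'PTS', '1': 'BLK', '2': 'STL'}

# Trimmed stand-in for jsonData['teams']; the real records carry more keys
teams = [
    {'abbrev': 'BBC', 'rankFinal': 0, 'waiverRank': 9,
     'record': {'overall': {'wins': 1, 'losses': 1}},
     'valuesByStat': {'0': 1580.0, '1': 79.0, '2': 99.0}},
    {'abbrev': 'Hou', 'rankFinal': 0, 'waiverRank': 7,
     'record': {'overall': {'wins': 1, 'losses': 1}},
     'valuesByStat': {'0': 1246.0, '1': 66.0, '2': 74.0}},
]


def flatten_teams(teams, stat_names):
    """Flatten nested team dicts into one row per team with short column names."""
    flat = pd.json_normalize(teams)  # nested keys become dotted column names

    def short(col):
        # 'valuesByStat.0' -> 'PTS'; unmapped stat IDs keep their dotted name
        if col.startswith('valuesByStat.'):
            return stat_names.get(col[len('valuesByStat.'):], col)
        # 'record.overall.wins' -> 'wins'
        if col.startswith('record.overall.'):
            return col[len('record.overall.'):]
        return col

    return flat.rename(columns=short)


df = flatten_teams(teams, STAT_NAMES)
print(df.to_string())
```

Only the `record.overall.` and `valuesByStat.` prefixes are stripped, so any other nested field keeps its full dotted name and cannot collide with these columns.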

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Ann Zen