Why can't I scrape table data in order?
I'm trying to scrape table data off of this website:
https://www.nfl.com/standings/league/2019/REG
I have working code (below); however, the table data does not come back in the order that I see on the website.
On the website I see (top-down):
Baltimore Ravens, Green Bay Packers, ..., Cincinnati Bengals
But in my code results, I see (top-down): Bengals, Lions, ..., Ravens
Why is soup returning the tags out of order? Does anyone know why this is happening? Thanks!
import requests
import urllib.request
from bs4 import BeautifulSoup
import pandas as pd
import lxml

url = 'https://www.nfl.com/standings/league/2019/REG'
soup = BeautifulSoup(requests.get(url).text, 'lxml')
print(soup)  # not sure why soup isn't returning tags in the order I see on website

table = soup.table
headers = []
for th in table.select('th'):
    headers.append(th.text)
print(headers)

df = pd.DataFrame(columns=headers)
for sup in table.select('sup'):
    sup.decompose()  # Removes sup tag from the table tree so x, xz* in nfl_team_name will not show up
for tr in table.select('tr')[1:]:
    td_list = tr.select('td')
    td_str_list = [td_list[0].select('.d3-o-club-shortname')[0].text]
    td_str_list = td_str_list + [td.text for td in td_list[1:]]
    df.loc[len(df)] = td_str_list
print(df.to_string())
Solution 1:[1]
After the initial load, the page dynamically sorts the table by the PCT column, so the HTML that requests receives is in the unsorted order. To get the order you see on the website, do the same with your DataFrame using sort_values():
pd.read_html('https://www.nfl.com/standings/league/2019/REG')[0].sort_values(by='PCT',ascending=False)
Or based on your example:
df.sort_values(by='PCT',ascending=False)
Output:
NFL Team | W | L | T | PCT | PF | PA | Net Pts | Home | Road | Div | Pct | Conf | Pct | Non-Conf | Strk | Last 5 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Ravens | 14 | 2 | 0 | 0.875 | 531 | 282 | 249 | 7 - 1 - 0 | 7 - 1 - 0 | 5 - 1 - 0 | 0.833 | 10 - 2 - 0 | 0.833 | 4 - 0 - 0 | 12W | 5 - 0 - 0 |
49ers | 13 | 3 | 0 | 0.813 | 479 | 310 | 169 | 6 - 2 - 0 | 7 - 1 - 0 | 5 - 1 - 0 | 0.833 | 10 - 2 - 0 | 0.833 | 3 - 1 - 0 | 2W | 3 - 2 - 0 |
Saints | 13 | 3 | 0 | 0.813 | 458 | 341 | 117 | 6 - 2 - 0 | 7 - 1 - 0 | 5 - 1 - 0 | 0.833 | 9 - 3 - 0 | 0.75 | 4 - 0 - 0 | 3W | 4 - 1 - 0 |
Packers | 13 | 3 | 0 | 0.813 | 376 | 313 | 63 | 7 - 1 - 0 | 6 - 2 - 0 | 6 - 0 - 0 | 1 | 10 - 2 - 0 | 0.833 | 3 - 1 - 0 | 5W | 5 - 0 - 0 |
...
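One thing to watch with the DataFrame built in the question: every cell comes from td.text, so PCT holds strings, and sort_values() also returns a new DataFrame rather than sorting in place. Below is a minimal sketch of handling both, using a few rows copied from the output above in a deliberately shuffled order instead of a live request. The choice of kind="mergesort" is my own assumption to keep tied teams in their scraped order; the website may break ties differently.

```python
import pandas as pd

# Rows taken from the output table above, deliberately out of order.
# Values are strings, as they would be after collecting td.text.
df = pd.DataFrame(
    {
        "NFL Team": ["Packers", "Ravens", "Saints", "49ers"],
        "W": ["13", "14", "13", "13"],
        "PCT": ["0.813", "0.875", "0.813", "0.813"],
    }
)

# Convert the text column to numbers so the sort is numeric, not lexicographic.
df["PCT"] = pd.to_numeric(df["PCT"])

# sort_values returns a new DataFrame, so assign the result back.
# mergesort is a stable sort algorithm (an assumption about what you want for ties).
df = df.sort_values(by="PCT", ascending=False, kind="mergesort").reset_index(drop=True)

print(df.to_string())
```

If the three teams tied at 0.813 need a specific order, passing a list of columns to by= (for example PCT plus a tie-breaker column of your choosing) is one option.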
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow