Pandas' read_html not reading HTML tables
I am trying to see if I can use, and only use, Pandas' read_html function to scrape HTML tables from the following website: https://www.baseball-reference.com/teams/ATL/2021.shtml
I can fulfil my needs using selenium/bs but want to see if I can scrape this site's tables with just pd.read_html alone.
Currently, pd.read_html returns the first two tables, but is not able to access tables past the second table.
Here is an example of a table 'id' that I am trying to access: 'the40man'
And my code, which returns 'ValueError: No tables found':
pd.read_html("https://www.baseball-reference.com/teams/ATL/2021.shtml", attrs = {'id': 'the40man'})
The following code returns the first two tables, {'id': ['team_batting', 'team_pitching']}, but nothing more:
pd.read_html("https://www.baseball-reference.com/teams/ATL/2021.shtml")
I am asking this question out of curiosity in case I'm missing something on my end. If not, this issue is likely due to pd.read_html's limitations.
Thank you in advance for any input/pd.read_html tips!
Solution 1:[1]
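The Sports Reference sites, including baseball-reference.com, place some of their tables inside HTML comments. pd.read_html does not parse commented-out markup, which is why it only returns the first two tables. To pull the other tables out, first extract the comments, then iterate through them to find the table you want: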
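from io import StringIO

import requests
from bs4 import BeautifulSoup, Comment
import pandas as pd

url = 'https://www.baseball-reference.com/teams/ATL/2021.shtml'
result = requests.get(url).text
data = BeautifulSoup(result, 'html.parser')

# Collect every HTML comment node on the page
comments = data.find_all(string=lambda text: isinstance(text, Comment))

tables = []
for each in comments:
    # Only try comments that could contain a table
    if 'table' in str(each):
        try:
            # StringIO avoids the literal-HTML deprecation warning in newer pandas
            tables.append(pd.read_html(StringIO(str(each)), attrs={'id': 'the40man'})[0])
            break
        except ValueError:
            # read_html raises ValueError when no table with this id is found
            continue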
Output:
print(tables[0])
Rk Uni Name Unnamed: 3 ... Ht Wt DoB 1stYr
0 1 30 Kyle Wright us US ... 6' 4" 215 Oct 2, 1995 2015
1 2 0 William Woods us US ... 6' 3" 190 Dec 29, 1998 2018
2 3 51 Will Smith us US ... 6' 5" 255 Jul 10, 1989 2008
3 4 68 Tyler Matzek us US ... 6' 3" 230 Oct 19, 1990 2010
4 5 64 Tucker Davidson us US ... 6' 2" 215 Mar 25, 1996 2016
5 6 62 Touki Toussaint us US ... 6' 3" 215 Jun 20, 1996 2014
6 7 65 Spencer Strider us US ... 6' 0" 195 Oct 28, 1998 2018
7 8 15 Sean Newcomb us US ... 6' 5" 255 Jun 12, 1993 2012
8 9 40 Mike Soroka ca CA ... 6' 5" 225 Aug 4, 1997 2015
9 10 54 Max Fried us US ... 6' 4" 190 Jan 18, 1994 2012
10 11 77 Luke Jackson us US ... 6' 2" 210 Aug 24, 1991 2011
11 12 33 A.J. Minter us US ... 6' 0" 215 Sep 2, 1993 2013
12 13 0 Kirby Yates us US ... 5' 10" 205 Mar 25, 1987 2009
13 14 0 Jay Jackson us US ... 6' 1" 195 Oct 27, 1987 2008
14 15 71 Jacob Webb us US ... 6' 2" 210 Aug 15, 1993 2014
15 16 19 Huascar Ynoa do DO ... 6' 2" 220 May 28, 1998 2015
16 17 36 Ian Anderson us US ... 6' 3" 170 May 2, 1998 2016
17 18 0 Freddy Tarnok us US ... 6' 3" 185 Nov 24, 1998 2017
18 19 74 Dylan Lee us US ... 6' 3" 214 Aug 1, 1994 2015
19 20 0 Alan Rangel mx MX ... 6' 2" 170 Aug 21, 1997 2015
20 21 0 Brooks Wilson us US ... 6' 2" 205 Mar 15, 1996 2015
21 22 50 Charlie Morton us US ... 6' 5" 215 Nov 12, 1983 2002
22 23 14 Adam Duvall us US ... 6' 1" 215 Sep 4, 1988 2010
23 24 24 William Contreras ve VE ... 6' 0" 180 Dec 24, 1997 2015
24 25 27 Austin Riley us US ... 6' 3" 240 Apr 2, 1997 2015
25 26 16 Travis d'Arnaud us US ... 6' 2" 210 Feb 10, 1989 2007
26 27 0 Travis Demeritte us US ... 6' 0" 180 Sep 30, 1994 2013
27 28 0 Chadwick Tromp aw AW ... 5' 8" 221 Mar 21, 1995 2013
28 29 25 Cristian Pache do DO ... 6' 2" 215 Nov 19, 1998 2016
29 30 13 Ronald Acuna Jr. ve VE ... 6' 0" 205 Dec 18, 1997 2015
30 31 1 Ozzie Albies cw CW ... 5' 8" 165 Jan 7, 1997 2014
31 32 9 Orlando Arcia ve VE ... 6' 0" 187 Aug 4, 1994 2011
32 33 7 Dansby Swanson us US ... 6' 1" 190 Feb 11, 1994 2013
33 34 0 Drew Waters us US ... 6' 2" 185 Dec 30, 1998 2017
34 35 20 Marcell Ozuna do DO ... 6' 1" 225 Nov 12, 1990 2008
35 36 0 Manny Pina ve VE ... 6' 0" 222 Jun 5, 1987 2005
36 37 38 Guillermo Heredia cu CU ... 5' 10" 195 Jan 31, 1991 2009
37 38 66 Kyle Muller us US ... 6' 7" 250 Oct 7, 1997 2016
38 Rk Uni Name NaN ... Ht Wt DoB 1stYr
[39 rows x 14 columns]
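If you want to see the comment-extraction step in isolation, here is a minimal sketch that uses only the standard library's html.parser instead of BeautifulSoup. The names CommentCollector, commented_table_html, and sample_page are made up for illustration, and the sample page is a hypothetical miniature, not the real markup of the site. It assumes the table id is written as id="..." with double quotes.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collect the body of every HTML comment in a document."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # Called once per <!-- ... --> block with the text between the markers
        self.comments.append(data)


def commented_table_html(page_html, table_id):
    """Return the first comment body containing a table with this id, else None."""
    collector = CommentCollector()
    collector.feed(page_html)
    collector.close()
    needle = f'id="{table_id}"'  # assumption: the id attribute uses double quotes
    for comment in collector.comments:
        if "<table" in comment and needle in comment:
            return comment
    return None


# Hypothetical miniature page: one live table, one table hidden in a comment
sample_page = """
<html><body>
<table id="team_batting"><tr><th>Name</th></tr><tr><td>A</td></tr></table>
<!--
<table id="the40man"><tr><th>Name</th></tr><tr><td>Kyle Wright</td></tr></table>
-->
</body></html>
"""

fragment = commented_table_html(sample_page, "the40man")
missing = commented_table_html(sample_page, "no_such_table")
```

The returned fragment is ordinary HTML, so it should be possible to hand it to pd.read_html (wrapped in io.StringIO on recent pandas versions) with the same attrs filter as above.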
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | chitown88 |