Exception Handling Python TypeError: 'NoneType' object is not callable
I am scraping a list of URLs whose pages all share the same HTML format. Here is my scraper:
import requests, bs4, csv, json
from pprint import pprint
with open('playerslist.json') as data_file:
    data = json.load(data_file)
for i in data['player']:
    name = i['player_name']
    url = 'https://www.capfriendly.com/players/'+name
    r = requests.get(url)
    soup = bs4.BeautifulSoup(r.text, 'lxml')
    table = soup.find(id="cont_x")
    with open(name+".csv", "w", newline='') as team_data:
        def parse_td(td):
            filtered_data = [tag.text for tag in td.find_all('span', recursive=False)
                             if 'q' not in tag.attrs['class']]
            return filtered_data[0] if filtered_data else td.text;
        for tr in table('tr', class_=['column_head', 'odd', 'even']):
            row = [parse_td(td) for td in tr('td')]
            writer = csv.writer(team_data)
            writer.writerow(row)
The problem is that some of the pages ('https://www.capfriendly.com/players/'+name) no longer exist, which means that when I try to scrape them, I get the following error:
for tr in table('tr', class_=['column_head', 'odd', 'even']):
TypeError: 'NoneType' object is not callable
I could make sure that every URL in my list is valid, but I have 10,000 URLs to go through. That is why I am looking for a way to handle the exception, so that if a page no longer exists, it is skipped and the next URL gets scraped.
Also, if there is a more efficient way to store the data, please let me know.
Solution 1:[1]
Exception handling in Python is done with the try...except statement.
try:
    for tr in table('tr', class_=['column_head', 'odd', 'even']):
        row = [parse_td(td) for td in tr('td')]
        writer = csv.writer(team_data)
        writer.writerow(row)
except TypeError:
    pass
This will ignore the TypeError exception raised when table is None, which happens when the page doesn't exist anymore and soup.find(id="cont_x") finds nothing.
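One thing to watch with the block above: because the whole loop sits inside the try, a TypeError raised while parsing or writing a row would be silently ignored as well. A minimal sketch of a narrower variant, where only the call on table is guarded. The names rows_from and ROW_CLASSES are hypothetical and not part of the original answer:

```python
ROW_CLASSES = ['column_head', 'odd', 'even']

def rows_from(table):
    """Return the matching <tr> elements, or None when table is None."""
    try:
        # Only this call is guarded, so a TypeError raised later while
        # parsing or writing a row is not hidden by the except clause.
        return table('tr', class_=ROW_CLASSES)
    except TypeError:
        # table was None (not callable): treat the page as missing.
        return None
```

In the scraper's loop, a None result from rows_from(table) would be the signal to continue with the next player.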
You can also check table before the loop, but the try...except approach is considered more 'pythonic', since exception handling isn't that much slower than checks in most cases (and for other subjective reasons, of course).
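For comparison, the explicit check could look roughly like the sketch below. It assumes table comes from soup.find(id="cont_x") as in the question. The helper names extract_rows and save_player are hypothetical, and tag.get('class', []) is a small assumed change to parse_td so that a span without a class attribute does not raise KeyError. Checking before opening the file also avoids leaving an empty CSV behind for every missing page.

```python
import csv

ROW_CLASSES = ['column_head', 'odd', 'even']

def parse_td(td):
    # Same filtering as the question's parse_td, except that
    # tag.get('class', []) tolerates spans with no class attribute.
    filtered = [tag.text for tag in td.find_all('span', recursive=False)
                if 'q' not in tag.get('class', [])]
    return filtered[0] if filtered else td.text

def extract_rows(table):
    """Return a list of rows, or None when the table was not found."""
    if table is None:  # soup.find() returns None when nothing matches
        return None
    return [[parse_td(td) for td in tr('td')]
            for tr in table('tr', class_=ROW_CLASSES)]

def save_player(name, table):
    """Write name.csv and return True, or return False for a missing page."""
    rows = extract_rows(table)
    if rows is None:
        return False  # skip: the file is never opened, so no empty CSV
    with open(name + '.csv', 'w', newline='') as team_data:
        csv.writer(team_data).writerows(rows)
    return True
```

Inside the question's loop, the with open(...) block would then become a single call such as save_player(name, table), moving on to the next URL whenever it returns False.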
Solution 2:[2]
I used the following successfully (Python v. 3.9):
try:
    <my_code>
except AttributeError:
    <my_code>
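Which exception applies depends on how the None value is used. The question calls table(...) directly, which raises TypeError when table is None. Code that goes through a method such as table.find_all(...) raises AttributeError instead, which is presumably the situation this answer covers. A small sketch of the difference, with error_name as a hypothetical helper:

```python
table = None  # what soup.find(...) returns when nothing matches

def error_name(action):
    """Run action() and return the name of the exception it raised, if any."""
    try:
        action()
    except (TypeError, AttributeError) as exc:
        return type(exc).__name__
    return None

# Calling None directly, as the question's loop does.
called = error_name(lambda: table('tr'))
# Looking up a method on None, as code using find_all would do.
accessed = error_name(lambda: table.find_all('tr'))

print(called, accessed)
```

If it is unclear which form the code uses, catching both with except (TypeError, AttributeError) covers either case.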
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Unatiel |
Solution 2 | Ulrik Larsen |