Python web scraping error: 'NoneType' object is not callable after using split function
I'm a beginner writing my first scraping script trying to extract company name, phone number, and email from the following page.
So far my script successfully pulls out the name and phone number, but I am getting stuck on pulling out the email, which is nested within a script object. My latest two attempts involved using regex, and when that failed, a split function, which is returning the error mentioned in the title.
Script:
import re
import requests
from urllib.request import urlopen
from bs4 import BeautifulSoup
url1 = "http://pcoc.officialbuyersguide.net/Listing?MDSID=CPC-1210"
html = urlopen(url1)
soup = BeautifulSoup(html,'html.parser')
for company_name in soup.find_all(class_='ListingPageNameAddress NONE'):
    print(company_name.find('h1').text)
for phone in soup.find_all(class_='ListingPageNameAddress NONE'):
    print(phone.find(class_='Disappear').text)
for email in soup.findAll(class_='ListingPageNameAddress NONE'):
    print(email.find('script').text)
    print(email.split('LinkValue: "')[1].split('"')[0])
    print(re.findall(r"([\w\._]+\@([\w_]+\\.)+[a-zA-Z]+)", soup))
Error:
TypeError                                 Traceback (most recent call last)
<ipython-input-20-ace5e5106ea7> in <module>
      1 for email in soup.findAll(class_='ListingPageNameAddress NONE'):
      2     print(email.find('script').text)
----> 3     print(email.split('LinkValue: "')[1].split('"')[0])
      4     print(re.findall(r"([\w\._]+\@([\w_]+\\.)+[a-zA-Z]+)", soup))
TypeError: 'NoneType' object is not callable
HTML within "script" that I'm trying to pull from:
EMLink('com','aol','mikemhnam','<div class="emailgraphic"><img style="position: relative; top: 3px;" src="https://www.naylornetwork.com/EMailProtector/text-gif.aspx?sx=com&nx=mikemhnam&dx=aol&size=9&color=034af3&underline=yes" border=0></div>','pcoc.officialbuyersguide.net Inquiry','onClick=\'$.get("TrackLinkClick", { LinkType: "Email", LinkValue: "[email protected]", MDSID: "CPC-1210", AdListingID: "" });\'')
Solution 1:[1]
As far as I'm aware, BeautifulSoup doesn't expose a split function on elements.
BeautifulSoup elements let you access any attribute name, though: if the name isn't a property or method of the element, BeautifulSoup looks for a descendant tag with that name. For instance, element.div finds the first descendant of element that is a div. So you can even write element.nonsense, and since nonsense is not a method or property of the element object, BeautifulSoup searches the document tree for a descendant tag named nonsense. Since none exists, it simply returns None.
So when you call email.split(...), there is no method or property called split on the email object, so BeautifulSoup searches the HTML tree for a tag named split. It can't find one, so it returns None, and you then try to call that None as a function, which produces the error you are getting.
Is it possible you meant to split the text of the element, as in email.text.split(...)?
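The lookup behaviour described above can be imitated with a toy class. This is only a sketch of the mechanism, not BeautifulSoup's actual implementation; FakeTag, its children dict, and the address someone@example.com are hypothetical names invented for the illustration.

```python
class FakeTag:
    """Toy stand-in for a parsed element (hypothetical, not bs4 code)."""

    def __init__(self, text, children=None):
        self.text = text
        self.children = children or {}

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so unknown
        # names are treated as child-tag lookups that may return None.
        return self.children.get(name)


email = FakeTag('LinkValue: "someone@example.com"')

# No attribute and no child tag is called "split", so the lookup gives None.
print(email.split)

try:
    email.split('"')  # equivalent to calling None('"')
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Splitting the text, which is a real str, works as expected.
address = email.text.split('LinkValue: "')[1].split('"')[0]
print(address)
```

The failure comes from calling the result of the attribute lookup, not from the lookup itself, which is why the message mentions 'NoneType' rather than a missing attribute.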
Solution 2:[2]
Try this; it might solve your problem.
import re
import requests
from urllib.request import urlopen
from bs4 import BeautifulSoup
url1 = "http://pcoc.officialbuyersguide.net/Listing?MDSID=CPC-1210"
html = urlopen(url1)
soup = BeautifulSoup(html,'html.parser')
for company_name in soup.find_all(class_='ListingPageNameAddress NONE'):
    print(company_name.find('h1').text)
for phone in soup.find_all(class_='ListingPageNameAddress NONE'):
    print(phone.find(class_='Disappear').text)
for email in soup.findAll(class_='ListingPageNameAddress NONE'):
    print(email.find('script').text)
    # Work on the script's text (a str), not on the element itself.
    a=email.find('script').text
    # print(email.split('LinkValue: "')[1].split('"')[0])
    # Find tokens containing "@", then take what sits between the first pair of double quotes.
    print(str(re.findall(r"\S+@\S+", a)).split('"')[1])
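A more targeted alternative is to anchor the pattern on the LinkValue key instead of matching any token that contains @. The sketch below runs on a shortened stand-in for the script text; script_text and the address someone@example.com are placeholders, and extract_link_value is a hypothetical helper name, not part of any library.

```python
import re

# Shortened stand-in for email.find('script').text (placeholder address).
script_text = (
    '$.get("TrackLinkClick", { LinkType: "Email", '
    'LinkValue: "someone@example.com", MDSID: "CPC-1210", AdListingID: "" });'
)

# Capture whatever sits between the quotes that follow "LinkValue:".
LINK_VALUE = re.compile(r'LinkValue:\s*"([^"]+)"')


def extract_link_value(text):
    """Return the quoted LinkValue found in text, or None if it is absent."""
    match = LINK_VALUE.search(text)
    return match.group(1) if match else None


print(extract_link_value(script_text))
print(extract_link_value("no address here"))
```

Returning None when nothing matches avoids an IndexError on listings that have no email link.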
Solution 3:[3]
Did you try str(email) before you split it? It worked for me!
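This works because str(email) turns the element into an ordinary string, and a string does have a split method. A minimal sketch, using a plain string as a stand-in for str(email) (the markup and address are placeholders), with a guard for the case where the marker is missing:

```python
# Stand-in for str(email); with BeautifulSoup this would be the element's markup.
markup = '<script>$.get("TrackLinkClick", { LinkValue: "someone@example.com" });</script>'

marker = 'LinkValue: "'

# partition returns (before, separator, after); separator is "" if not found.
_, found, rest = markup.partition(marker)
address = rest.split('"')[0] if found else None

print(address)
```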
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | Parul Garg |
Solution 3 | Helpful |