Python web scraping error: 'NoneType' object is not callable after using split function

I'm a beginner writing my first scraping script, trying to extract the company name, phone number, and email from the page below.

So far my script successfully pulls the name and phone number, but I'm stuck on the email, which is nested inside a script tag. My last two attempts used a regex and, when that failed, a split call, which raises the error mentioned in the title.

Script:

import re
import requests

from urllib.request import urlopen
from bs4 import BeautifulSoup

url1 = "http://pcoc.officialbuyersguide.net/Listing?MDSID=CPC-1210"
html = urlopen(url1)
soup = BeautifulSoup(html,'html.parser')

for company_name in soup.find_all(class_='ListingPageNameAddress NONE'):
    print(company_name.find('h1').text)

for phone in soup.find_all(class_='ListingPageNameAddress NONE'):
    print(phone.find(class_='Disappear').text)

for email in soup.findAll(class_='ListingPageNameAddress NONE'):
    print(email.find('script').text)
    print(email.split('LinkValue: "')[1].split('"')[0])
    print(re.findall(r"([\w\._]+\@([\w_]+\\.)+[a-zA-Z]+)", soup))

Error:

TypeError                                 Traceback (most recent call last)
<ipython-input-20-ace5e5106ea7> in <module>
      1 for email in soup.findAll(class_='ListingPageNameAddress NONE'):
      2     print(email.find('script').text)
----> 3     print(email.split('LinkValue: "')[1].split('"')[0])
      4     print(re.findall(r"([\w\._]+\@([\w_]+\\.)+[a-zA-Z]+)", soup))

TypeError: 'NoneType' object is not callable

HTML within "script" that I'm trying to pull from:

EMLink('com','aol','mikemhnam','<div class="emailgraphic"><img style="position: relative; top: 3px;" src="https://www.naylornetwork.com/EMailProtector/text-gif.aspx?sx=com&nx=mikemhnam&dx=aol&size=9&color=034af3&underline=yes" border=0></div>','pcoc.officialbuyersguide.net Inquiry','onClick=\'$.get("TrackLinkClick", { LinkType: "Email", LinkValue: "mikemhnam@aol.com", MDSID: "CPC-1210", AdListingID: "" });\'')



Solution 1:[1]

As far as I'm aware, BeautifulSoup doesn't expose a split method on elements.

BeautifulSoup elements do let you access arbitrary attributes, though: if the name isn't a real property or method of the element, BeautifulSoup searches the element's descendants for a tag with that name. For instance, element.div would find the first descendant of element that is a div. You can even write element.nonsense; since nonsense is neither a method nor a property of the element object, BeautifulSoup searches the document tree for a descendant tag named nonsense, and since one doesn't exist, it simply returns None.

So when you call email.split(...), Python finds no method or property called split on the email object, BeautifulSoup searches the HTML tree for a <split> tag, finds none, and returns None. You then try to call that None as a function, which produces exactly the error you are getting.
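Here's a minimal, self-contained demonstration of that fall-through behavior (a throwaway snippet, not the questioner's page):

from bs4 import BeautifulSoup

soup = BeautifulSoup("<div><p>hi</p></div>", "html.parser")
tag = soup.div
print(tag.p)      # <p>hi</p> -- attribute lookup falls back to tag.find('p')
print(tag.split)  # None -- there is no <split> descendant anywhere
tag.split(",")    # TypeError: 'NoneType' object is not callable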

Is it possible you meant to split the element's text, i.e. email.text.split(...)?
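Applied to your loop, that would look something like this (a sketch that assumes LinkValue: " appears exactly once in each listing's script text):

for email in soup.find_all(class_='ListingPageNameAddress NONE'):
    script_text = email.find('script').text  # the JS snippet, as a plain str
    print(script_text.split('LinkValue: "')[1].split('"')[0])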

Solution 2:[2]

Try this; it might solve your problem.

import re

from urllib.request import urlopen
from bs4 import BeautifulSoup

url1 = "http://pcoc.officialbuyersguide.net/Listing?MDSID=CPC-1210"
html = urlopen(url1)
soup = BeautifulSoup(html, 'html.parser')

for company_name in soup.find_all(class_='ListingPageNameAddress NONE'):
    print(company_name.find('h1').text)

for phone in soup.find_all(class_='ListingPageNameAddress NONE'):
    print(phone.find(class_='Disappear').text)

for email in soup.find_all(class_='ListingPageNameAddress NONE'):
    # Work with the script tag's text (a plain str), not the Tag itself
    a = email.find('script').text
    print(a)
#    print(email.split('LinkValue: "')[1].split('"')[0])  # old line: Tag has no split()
    # findall grabs e.g. ['"mikemhnam@aol.com",']; str() + split('"')
    # strips the surrounding quote characters from the first match
    print(str(re.findall(r"\S+@\S+", a)).split('"')[1])
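The str()/split('"') round-trip works, but it is fragile. If you'd rather capture the address directly, a more targeted regex is possible; this is a sketch that assumes the script text keeps the LinkValue: "..." format shown in the question, reusing a from the loop above:

match = re.search(r'LinkValue:\s*"([^"]+)"', a)
if match:
    print(match.group(1))  # the bare address, no quote trimming needed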

Solution 3:[3]

Did you try str(email) before you split it? It worked for me!
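In other words, something like this (a sketch of that suggestion; str(email) renders the Tag back to markup, giving a real string with a real split method):

for email in soup.find_all(class_='ListingPageNameAddress NONE'):
    html_str = str(email)  # now a plain str, so str.split works
    print(html_str.split('LinkValue: "')[1].split('"')[0])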

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1: (no author listed)
Solution 2: Parul Garg
Solution 3: Helpful