How would I go about incorporating an if statement in an item list?
I need to scrape the phone numbers from this website. I have come to the conclusion that I need an if statement, but I'm not sure how to write one, since the values are built inside a dictionary within the loop "for item in questions:". Am I on the right track here? How would I go about incorporating the check into the question list?
Traceback (most recent call last):
  File "c:\Users\Alexandar\Downloads\Scraper.py", line 37, in <module>
    getQuestions('Industri', x)
  File "c:\Users\Alexandar\Downloads\Scraper.py", line 28, in getQuestions
    'Nummer': item.find('a', {'class': 'link-body'}).text,
AttributeError: 'NoneType' object has no attribute 'text'
I have no clue what could be causing this. I implemented a sanity check:
def get_href_item(src_item, tag, class_name):
    href_item = src_item.find(tag, {"class": f"{class_name}"})
    if href_item is not None:
        href = href_item['href']
        if href is not None:
            return href
    else:
        return "HREF_NOT_FOUND"
The sanity check was meant to at least get the linked version of the number, but it also raises an error. I am at my wits' end here and I really need this to work. Any help would be appreciated; please see the full code below and see if you can figure anything out:
from gettext import find
import requests
from bs4 import BeautifulSoup
import pandas as pd

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36'}

questionlist = []

def get_href_item(src_item, tag, class_name):
    href_item = src_item.find(tag, {"class": f"{class_name}"})
    if href_item is not None:
        href = href_item['href']
        if href is not None:
            return href
    else:
        return "HREF_NOT_FOUND"

def getQuestions(tag, page):
    url = f'https://www.merinfo.se/search?d=c&who={tag}&where=stockholm&emp=0%3A100&rev=50%3A100{page}'
    r = requests.get(url, headers=headers)
    soup = BeautifulSoup(r.text, 'html.parser')
    questions = soup.find_all('div', {'class': 'box-white p-0 mb-4'})
    for item in questions:
        question = {
            'Branch': tag,
            'Namn': item.find('a', {'class': 'link-primary'}).text,
            'Nummer': item.find('a', {'class': 'link-body'}).text,
            #'Nummer2': item.find('p', {'class': 'phonenumber mb-1 mb-md-2'}).text,
            'RegÅr': item.find('div', {'class': 'col text-center'}).text,
            'Address': item.find('address', {'class': 'mt-2 mb-0'}).text,
        }
        questionlist.append(question)
    return

for x in range(1,9):
    getQuestions('Industri', x)
    getQuestions('Advokat', x)
    getQuestions('Konsult', x)

df = pd.DataFrame(questionlist)
df.to_excel('Lista Henrik Nyström.xlsx')
print('Cha Ching')
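One possible way to put the check in, as a minimal sketch rather than a confirmed fix: item.find() returns None when a result box has no matching element, and calling .text on that None is what raises the AttributeError in the traceback. Moving each lookup into a small helper lets the if statement live outside the dictionary literal. The helper names safe_text and safe_href and the default strings are made up for illustration; the only BeautifulSoup calls relied on are find(), get_text() and get().

```python
def safe_text(item, tag, class_name, default="NOT_FOUND"):
    """Return the text of the first matching child of `item`, or `default`.

    `item` is assumed to be a BeautifulSoup Tag (anything with a compatible
    find() method works). The name and default value are hypothetical.
    """
    found = item.find(tag, {"class": class_name})
    # find() gives back None when nothing matches, so test before using it.
    if found is None:
        return default
    # get_text(strip=True) is like .text but trims surrounding whitespace.
    return found.get_text(strip=True)


def safe_href(item, tag, class_name, default="HREF_NOT_FOUND"):
    """Return the href of the first matching child of `item`, or `default`."""
    found = item.find(tag, {"class": class_name})
    if found is None:
        return default
    # Tag.get() returns `default` for a missing attribute, whereas
    # found['href'] would raise KeyError in that case.
    return found.get("href", default)
```

With helpers like these, the entry in the loop could read 'Nummer': safe_text(item, 'a', 'link-body'), and likewise for 'Namn', 'RegÅr' and 'Address', so a box without a phone number yields the placeholder instead of stopping the whole run.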
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow