Extract business hours from Google using only Beautiful Soup
Goal
Extract a business's hours and its open/closed status from the Google Search results.
Screenshot with the highlighted working hours and closed status (example URL):
Screenshot with the highlighted working hours in the popup (example URL):
Problem
soup.find() with the specific selector returns None.
Description
I am trying to create a voice-activated AI similar to Google Home or Alexa that I can pair up with something cool. Currently, I'm trying to use data from the Google knowledge panel for specific search queries.
Code
import bs4
import requests


def service(self, business):
    url = requests.get("https://www.google.com/search?q={}+hours".format(business))
    outputs = []
    if url.status_code == 200:
        soup = bs4.BeautifulSoup(url.text, "lxml")
        # The span class below should contain the hours shown for that day, or just "Closed"
        string = soup.find("span", attrs={"class": "TLou0b JjSWRd"})
        print(string)
        # returns None
    if url.status_code == 404:
        print("Error")
        return "Error 404"
How can I extract the working hours and the closed status of the business?
PS. I'm on a Raspberry Pi 4. I don't want to use Selenium and its drivers. But I'm open to suggestions.
Solution 1:[1]
The selector for the business hours is [data-attrid='kc:/location/location:hours'] table tr.
The .TLou0b.JjSWRd selector targets the Google Answer Box instead.
From what I understand, you're looking for the business hours from the Google Knowledge Panel.
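As a rough illustration of what the hours selector matches, here is a minimal sketch run against a hand-written HTML fragment. The fragment (SAMPLE_HTML) is a hypothetical, heavily simplified stand-in for the Knowledge Panel markup, not what Google actually returns, and "html.parser" is used only to keep the sketch free of the lxml dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily simplified stand-in for the Knowledge Panel markup.
SAMPLE_HTML = """
<div data-attrid="kc:/location/location:hours">
  <table>
    <tr><td>Monday</td><td>Closed</td></tr>
    <tr><td>Tuesday</td><td>5:30–10PM</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Attribute selector from the answer: every row of the hours table.
rows = soup.select("[data-attrid='kc:/location/location:hours'] table tr")
row_texts = [[td.get_text(strip=True) for td in row.select("td")] for row in rows]
print(row_texts)

# The Answer Box classes do not appear in this fragment, so the lookup yields None.
answer_box = soup.select_one(".TLou0b.JjSWRd")
print(answer_box)
```

When a selector does not match anything, select_one() returns None rather than raising, which is why the result should be checked before reading .text from it.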
Code to extract business hours:
hours_wrapper_node = soup.select_one("[data-attrid='kc:/location/location:hours']")

if hours_wrapper_node is None:
    logger.info("Business hours node is not found")
    return

business_hours = {"open_closed_state": "", "hours": []}

business_hours["open_closed_state"] = hours_wrapper_node.select_one(
    ".JjSWRd span span span"
).text.strip()

location_hours_rows_nodes = hours_wrapper_node.select("table tr")
for location_hours_rows_node in location_hours_rows_nodes:
    [day_of_week, hours] = [
        td.text.strip() for td in location_hours_rows_node.select("td")
    ]
    business_hours["hours"].append(
        {"day_of_week": day_of_week, "business_hours": hours}
    )
Output:
{
    "hours": [
        {"business_hours": "5:30–10PM", "day_of_week": "Wednesday"},
        {"business_hours": "5:30–10PM", "day_of_week": "Thursday"},
        {"business_hours": "5:30–11PM", "day_of_week": "Friday"},
        {"business_hours": "5:30–11PM", "day_of_week": "Saturday"},
        {"business_hours": "5:30–10PM", "day_of_week": "Sunday"},
        {"business_hours": "Closed", "day_of_week": "Monday"},
        {"business_hours": "5:30–10PM", "day_of_week": "Tuesday"},
    ],
    "open_closed_state": "Closed",
}
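For a voice assistant, the next step is usually to look up a single day in this structure. The following is a minimal sketch of one way to do that; the helper name hours_for_day is hypothetical and not part of the original answer, and the data is copied from the output shown here.

```python
# Sample data copied from the output of the extraction code.
business_hours = {
    "hours": [
        {"business_hours": "5:30–10PM", "day_of_week": "Wednesday"},
        {"business_hours": "5:30–10PM", "day_of_week": "Thursday"},
        {"business_hours": "5:30–11PM", "day_of_week": "Friday"},
        {"business_hours": "5:30–11PM", "day_of_week": "Saturday"},
        {"business_hours": "5:30–10PM", "day_of_week": "Sunday"},
        {"business_hours": "Closed", "day_of_week": "Monday"},
        {"business_hours": "5:30–10PM", "day_of_week": "Tuesday"},
    ],
    "open_closed_state": "Closed",
}


def hours_for_day(data, day_of_week):
    """Return the hours string for a weekday name, or None if it is not listed.

    Hypothetical helper; the comparison ignores letter case.
    """
    wanted = day_of_week.strip().lower()
    for entry in data["hours"]:
        if entry["day_of_week"].lower() == wanted:
            return entry["business_hours"]
    return None


monday = hours_for_day(business_hours, "Monday")
friday = hours_for_day(business_hours, "friday")
print(monday)
print(friday)
```

Returning None for an unknown day lets the caller decide what the assistant should say when the panel does not list that day.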
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Illia Zub |