How to scrape data from a dynamic website containing JavaScript using Python?

I am trying to scrape data from https://www.doordash.com/food-delivery/chicago-il-restaurants/

The idea is to scrape all the data regarding the different restaurant listings on the website. The site is divided into different cities, but I only require restaurant data for Chicago.

All restaurant listings for the city have to be scraped, along with any other relevant data about each restaurant (e.g. reviews, rating, cuisine, address, state, etc.). I need to capture all of those details (currently 4,326 listings) for the city in an Excel file.

I have tried to extract the restaurant name, cuisine, ratings and reviews inside the class named "StoreCard_root___1p3uN", but no data is displayed. The output is blank.


from selenium import webdriver

chrome_path = r"D:\python project\chromedriver.exe"

driver = webdriver.Chrome(chrome_path)

driver.get("https://www.doordash.com/food-delivery/chicago-il-restaurants/")

driver.find_element_by_xpath("""//*[@id="SeoApp"]/div/div[1]/div/div[2]/div/div[2]/div/div[2]/div[1]/div[3]""").click()

posts = driver.find_elements_by_class_name("StoreCard_root___1p3uN")

for post in posts:
    print(post.text)




Solution 1:[1]

You can use the API URL directly, since the page's data is actually rendered from it via an XHR request.

Iterate over the API link below and scrape whatever you want:

https://api.doordash.com/v2/seo_city_stores/?delivery_city_slug=chicago-il-restaurants&store_only=true&limit=50&offset=0

You just loop over the offset parameter, increasing it by 50 each time (each page shows 50 items) until you reach 4300, which is the last page. Simply use range(0, 4350, 50).

import requests
import pandas as pd

data = []
for offset in range(0, 4350, 50):
    print(f"Extracting offset# {offset}")
    r = requests.get(
        f"https://api.doordash.com/v2/seo_city_stores/?delivery_city_slug=chicago-il-restaurants&store_only=true&limit=50&offset={offset}").json()
    for store in r['store_data']:
        data.append((store['name'], store['city'], store['category'],
                     store['num_ratings'], store['average_rating'], store['average_cost']))

df = pd.DataFrame(
    data, columns=['Name', 'City', 'Category', 'Num Ratings', 'Average Rating', 'Average Cost'])
df.to_csv('output.csv', index=False)
print("done")
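The script above writes a CSV, but the question asked for Excel. pandas can write .xlsx directly via DataFrame.to_excel (this requires the openpyxl package). A minimal sketch, with a dummy row standing in for the scraped tuples (the values are illustrative only):

```python
import pandas as pd

# Dummy row in place of the scraped data (illustrative values only)
data = [("Demo Diner", "Chicago", "Pizza", 120, 4.5, "$$")]

df = pd.DataFrame(
    data, columns=['Name', 'City', 'Category', 'Num Ratings',
                   'Average Rating', 'Average Cost'])

# Same DataFrame as before, Excel output instead of CSV
df.to_excel('output.xlsx', index=False)
```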


Solution 2:[2]

I was faced with this issue too, but I solved it using selenium and BeautifulSoup by doing the following:

  1. Make sure the algorithm clicks the button to reveal the menu and prices, if necessary.
  2. The menu and prices have to be processed after extraction, because they may come off as nested lists after parsing, so the get_text() function won't work on them right away. The code and explanation can be found in this Medium article:

Tackling empty list web scraping with selenium
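The nested-list problem in step 2 can be sketched independently of the scraping itself. The structure and values below are made up for illustration; the idea is to flatten the nesting first, then work with plain strings:

```python
def flatten(items):
    """Recursively flatten nested lists of scraped text fragments."""
    flat = []
    for item in items:
        if isinstance(item, list):
            flat.extend(flatten(item))  # descend into nested lists
        else:
            flat.append(item.strip())   # plain string: keep it, trimmed
    return flat

# A menu extracted from nested tags might parse into something like:
scraped = [["Margherita", ["$9.99"]], ["Pepperoni", ["$11.49"]]]
print(flatten(scraped))  # ['Margherita', '$9.99', 'Pepperoni', '$11.49']
```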

Solution 3:[3]

I have checked out the API that Solution 1 mentions. They also had an endpoint for restaurant info.

URL https://api.doordash.com/v2/restaurant/[restaurantId]/

It was working until recently when it started returning {"detail":"Request was throttled."}

Has anyone had the same issue / found a workaround?
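One common workaround for throttled endpoints is to retry with exponential backoff. The sketch below assumes the throttled response looks exactly like the {"detail":"Request was throttled."} payload quoted above; fetch stands in for whatever requests call you use (both the helper name and that assumption are mine, not from the API's documentation):

```python
import random
import time


def fetch_with_backoff(fetch, max_retries=5, base_delay=1.0):
    """Retry a throttled request with exponential backoff and jitter.

    `fetch` is any zero-argument callable returning a parsed JSON dict;
    a {"detail": "Request was throttled."} payload is treated as a soft
    failure worth retrying.
    """
    for attempt in range(max_retries):
        result = fetch()
        if result.get("detail") != "Request was throttled.":
            return result
        # Back off: base_delay, 2x, 4x, ... plus a little random jitter
        time.sleep(base_delay * 2 ** attempt + random.random() * base_delay)
    raise RuntimeError("still throttled after retries")
```

Slowing the loop down this way (rather than hammering the endpoint at full speed) is usually enough to stay under a rate limit, though it won't help if the throttling is per-IP and permanent.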

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1:
Solution 2:
Solution 3: Stuart Murless