Web Scraping AirBnB Price Data with Python

I have been trying to scrape an Airbnb listing page to get the price, without much luck. I have successfully pulled in the other fields of interest (home description, location, reviews, etc.). Below is what I've tried. I suspect my issue is that the price on the page sits in a span element ('span class'), while the other fields are in div elements ('div class'), but I'm speculating.

The URL I'm using is: https://www.airbnb.com/rooms/52361296?category_tag=Tag%3A8173&adults=4&children=0&infants=0&check_in=2022-12-11&check_out=2022-12-18&federated_search_id=6174a078-a823-4fad-827a-7ca652b5e786&source_impression_id=p3_1645454076_foOVSAshSYvdbpbS

This URL can be entered at the prompt in the code below.

Any assistance would be greatly appreciated.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn
from bs4 import BeautifulSoup
import requests
from IPython.display import IFrame

input_string = input("""Enter URLs for AirBnB sites that you want webscraped AND separate by a ',' : """)
# split on commas, strip stray spaces and drop empty entries (e.g. from a trailing comma)
airbnb_list = [u.strip() for u in input_string.split(",") if u.strip()]
for i, listing_url in enumerate(airbnb_list, start=1):
    print(i, '.) ', listing_url)

a = pd.DataFrame([{"Title":'', "Stars": '', "Size":'', "Check In":'', "Check Out":'', "Rules":'',
               "Location":'', "Home Type":'', "House Desc":''}])

for url in airbnb_list:
        soup = BeautifulSoup(requests.get(url).content, 'html.parser')
        stars = soup.find(class_='_c7v1se').get_text()
        desc = soup.find(class_='_12nksyy').get_text()
        size = soup.find(class_='_jro6t0').get_text()
        #checkIn = soup.find(class_='_1acx77b').get_text()
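        # NOTE: checkIn and checkOut use the same class, and find() returns only the
        # first matching element, so both variables end up holding the same text
        checkIn = soup.find(class_='_12aeg4v').get_text()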
        #checkOut = soup.find(class_='_14tl4ml5').get_text()
        checkOut = soup.find(class_='_12aeg4v').get_text()
        Rules = soup.find(class_='cihcm8w dir dir-ltr').get_text()
        #location = soup.find(class_='_9ns6hl').get_text()
        location = soup.find(class_='_152qbzi').get_text()
        HomeType = soup.find(class_='_b8stb0').get_text()
        title = soup.title.string

        print('Stars: ', stars)
        print('')
        #Home Type
        print('Home Type: ', HomeType)
        print('')
        #Space Description
        print('Description: ', desc)
        print('')
        print('Rental size: ',size)
        print('')
        #CheckIn
        print('Check In: ', checkIn)
        print('')
        #CheckOut
        print('Check Out: ', checkOut)
        print('')
        #House Rules
        print('House Rules: ',Rules)
        print('')
        #print(soup.find("button", {"id":"#Id name of the button"}))
        #Home Location
        print('Home location: ', location)
        #Dates available
        #print('Dates available: ', soup.find(class_='_1yhfti2').get_text())
        print('===================================================================================')

        df = pd.DataFrame([{"Title":title, "Stars": stars, "Size":size, "Check In":checkIn, "Check Out":checkOut, "Rules":Rules,
                       "Location":location, "Home Type":HomeType, "House Desc":desc}])
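        # DataFrame.append was removed in pandas 2.0; pd.concat works in all versions
        a = pd.concat([a, df], ignore_index=True)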

        # Attempting to print the price tag on the website
        print(soup.find_all('span', {'class': '_tyxjp1'}))
        print(soup.find(class_='_tyxjp1').get_text())


---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-10-2d9689dbc836> in <module>
      1 #print(soup.find_all('span', {'class': '_tyxjp1'}))
----> 2 print(soup.find(class_='_tyxjp1').get_text())

AttributeError: 'NoneType' object has no attribute 'get_text'

Solution 1 [1]

I see you are using the requests module to scrape Airbnb. requests is very versatile and works well on websites with static content. However, it has one major drawback: it only downloads the HTML the server sends and does not run JavaScript. That is a problem, because many websites today add HTML elements with JavaScript after the page has loaded in the browser.

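The Airbnb price block is created exactly that way, by JavaScript, so it is not part of the HTML that requests downloads. That is why soup.find(class_='_tyxjp1') returns None, and calling .get_text() on None raises the AttributeError: 'NoneType' object has no attribute 'get_text' in your traceback.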
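One way to confirm this is to check whether the price class appears in the raw HTML at all. The sketch below uses the standard library's html.parser in place of BeautifulSoup (so it runs with no extra packages) to list the tags that carry a given class. find_class is a hypothetical helper, and both HTML strings are made-up stand-ins for the server response and the JavaScript-rendered page. Because it matches on the class attribute only, it also shows that a span is found just as readily as a div.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Collect the tag names of every element whose class attribute contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        for name, value in attrs:
            if name == "class" and value and self.target_class in value.split():
                self.matches.append(tag)


def find_class(html, target_class):
    """Return the tag names of all elements with target_class (empty list if none)."""
    finder = ClassFinder(target_class)
    finder.feed(html)
    finder.close()
    return finder.matches


# Hypothetical HTML: the "server response" has the rating but no price element,
# while the "rendered page" also contains the price span added by JavaScript.
static_html = '<div class="_c7v1se">4.95</div>'
rendered_html = static_html + '<span class="_tyxjp1">$100</span>'

print(find_class(static_html, "_c7v1se"))    # ['div']
print(find_class(static_html, "_tyxjp1"))    # []
print(find_class(rendered_html, "_tyxjp1"))  # ['span']
```

To test the real page, you could pass requests.get(url).text as the html argument and look for '_tyxjp1'; an empty list means the element is not in the HTML the server returns.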

There are many ways to scrape that kind of content. My favourite is Selenium, a library that launches a real browser and lets you control it from your programming language of choice.

Here's how you can easily use selenium.

First, set it up. Note the headless option, which you can switch off if you want to watch the browser load the page.

# setup selenium (I am using chrome here, so chrome has to be installed on your system)
import chromedriver_autoinstaller
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chromedriver_autoinstaller.install()
options = Options()
# run Chrome without a visible window; remove this line to watch Chrome load airbnb (useful for debugging)
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)

Then, navigate to the website

# navigate to airbnb
driver.get(url)

Next, wait until the price block loads. It may look instantaneous, but depending on the speed of your internet connection it can take a few seconds.

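from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# wait up to 10 seconds for the price block to load (raises TimeoutException otherwise)
timeout = 10
expectation = EC.presence_of_element_located((By.CSS_SELECTOR, '._tyxjp1'))
price_element = WebDriverWait(driver, timeout).until(expectation)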
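WebDriverWait(...).until(...) keeps re-checking its condition until the condition returns something truthy or the timeout runs out, and then returns that value. The sketch below illustrates that polling idea in plain Python; it is not Selenium's code, wait_until and price_loaded are hypothetical names, and the fake condition simply pretends the price appears on the third check. Selenium's own WebDriverWait polls every 0.5 seconds by default and raises TimeoutException rather than the built-in TimeoutError used here.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Hypothetical condition: the "price" only becomes available on the third check,
# mimicking an element that JavaScript adds shortly after the page loads.
checks = {"count": 0}


def price_loaded():
    checks["count"] += 1
    return "$100" if checks["count"] >= 3 else None


print(wait_until(price_loaded, timeout=5, poll_interval=0.01))  # $100
```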

And finally, print the price

# print the price
print(price_element.get_attribute('innerHTML'))
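If you want the price as a number (for example to store it in your DataFrame), you could strip the currency symbol and thousands separators from the text. This sketch assumes the text looks like '$1,234', i.e. a currency symbol followed by digits with comma thousands separators; parse_price is a hypothetical helper, and Airbnb's real formatting may differ by locale and currency. Selenium's price_element.text, which returns the element's visible text, may be easier to parse than innerHTML.

```python
import re


def parse_price(text):
    """Extract the first number from a price string such as '$1,234' (assumed format)."""
    # digits, optionally with comma thousands separators, then an optional decimal part
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


print(parse_price("$1,234"))             # 1234.0
print(parse_price("$89.50 / night"))     # 89.5
print(parse_price("Price unavailable"))  # None
```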

Here is my code merged into your example, so you can experiment with it:

import chromedriver_autoinstaller
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

import pandas as pd
from bs4 import BeautifulSoup
import requests
from selenium.webdriver.common.by import By

input_string = input("""Enter URLs for AirBnB sites that you want webscraped AND separate by a ',' : """)
# split on commas, strip stray spaces and drop empty entries (e.g. from a trailing comma)
airbnb_list = [u.strip() for u in input_string.split(",") if u.strip()]
for i, listing_url in enumerate(airbnb_list, start=1):
    print(i, '.) ', listing_url)

a = pd.DataFrame([{"Title":'', "Stars": '', "Size":'', "Check In":'', "Check Out":'', "Rules":'',
               "Location":'', "Home Type":'', "House Desc":''}])

# setup selenium (I am using chrome here, so chrome has to be installed on your system)
chromedriver_autoinstaller.install()
options = Options()
# run Chrome without a visible window; remove this line to watch Chrome load airbnb (useful for debugging)
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)

for url in airbnb_list:
        soup = BeautifulSoup(requests.get(url).content, 'html.parser')

        # navigate to airbnb
        driver.get(url)

        # wait until the price block loads
        timeout = 10
        expectation = EC.presence_of_element_located((By.CSS_SELECTOR, '._tyxjp1'))
        price_element = WebDriverWait(driver, timeout).until(expectation)

        # print the price
        print(price_element.get_attribute('innerHTML'))

# close the browser once every listing has been processed
driver.quit()

Keep in mind that your IP might eventually get banned for scraping Airbnb. To work around that, it is a good idea to route your requests through proxy IPs and rotate them. Follow this rotating proxies tutorial to avoid getting blocked.
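A minimal sketch of the rotation idea, assuming you already have a list of proxies you are permitted to use: itertools.cycle hands them out in round-robin order, and each one is wrapped in the {'http': ..., 'https': ...} dict that requests accepts through its proxies argument. The addresses are placeholders from the 203.0.113.0/24 documentation range and next_proxies is a hypothetical helper, so this shows only the rotation logic, not a tested anti-blocking setup.

```python
from itertools import cycle

# Hypothetical proxy addresses (203.0.113.0/24 is reserved for documentation);
# replace them with proxies you are actually allowed to use.
PROXIES = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
]

proxy_pool = cycle(PROXIES)


def next_proxies():
    """Return the next proxy in round-robin order, in the dict shape requests accepts."""
    proxy = next(proxy_pool)
    return {"http": proxy, "https": proxy}


urls = ["url-1", "url-2", "url-3", "url-4"]  # hypothetical listing URLs
for url in urls:
    proxies = next_proxies()
    print(url, "->", proxies["https"])
    # with the requests package you would then call:
    # requests.get(url, proxies=proxies, timeout=30)
```

For the Selenium driver, Chrome takes its proxy from the --proxy-server command-line switch when the browser starts, so rotating proxies there generally means starting a new driver for each proxy.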

Hope that helps!

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

[1] Solution 1: Zyy
