How to specify needed fields using Beautiful Soup and properly call upon website elements using HTML tags
I have been trying to create a web scraping program that returns the Title, Company, and Location values from the job cards on Indeed. I am no longer getting errors. However, I only get one value for each of the desired fields, even though there are multiple job cards I am trying to read. Also, the company field returns the value of the title field, because a `span` tag is used there as well in the HTML. I am unfamiliar with HTML and with how to specify what I need using Beautiful Soup. I have read the documentation and tried a few different methods to solve this problem, but have been unsuccessful.
import requests
from bs4 import BeautifulSoup
page = requests.get("https://au.indeed.com/jobs?q=web%20developer&l=perth&from=searchOnHP&vjk=6c6cd45320143cdf").text
soup = BeautifulSoup(page, "lxml")
results = soup.find(id="mosaic-zone-jobcards")
job_elements = results.find("div", class_="slider_container")
for job_element in job_elements:
    title = job_element.find("h2")
    company = job_element.find("span")
    location = job_element.find("div", class_="companyLocation")
print(title.text)
print(company.text)
print(location.text)
Here is what is printed to the console:
C:\Users\Admin\PycharmProjects\WebScraper1\venv\Scripts\python.exe C:/Users/Admin/PycharmProjects/WebScraper1/Indeed.py
Web App Developer
Web App Developer
Perth WA 6000
Process finished with exit code 0
Solution 1:[1]
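`job_elements` holds only the first matching element because you used `find` instead of `find_all`, so the loop iterates over the children of that one element rather than over every job card. For the same reason, `company` refers to the first `span` found in `div.slider_container`. The `span` you want is the one with `class="companyName"`. Also, the `print` calls should be inside the `for` loop; otherwise only the values left over from the last iteration are printed. Here's the improved code.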
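import requests
from bs4 import BeautifulSoup
page = requests.get("https://au.indeed.com/jobs?q=web%20developer&l=perth&from=searchOnHP&vjk=6c6cd45320143cdf").text
soup = BeautifulSoup(page, "lxml")
results = soup.find(id="mosaic-zone-jobcards")
# find_all returns every job card, not just the first one
job_elements = results.find_all("div", class_="slider_container")
for job_element in job_elements:
    title = job_element.find("h2")
    # Filter by class: an unfiltered find("span") matched the span holding the job title
    company = job_element.find("span", class_="companyName")
    location = job_element.find("div", class_="companyLocation")
    # Print inside the loop so every card is output
    print(title.text)
    print(company.text)
    print(location.text)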
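To see both fixes in isolation without requesting Indeed (whose live markup may no longer match what this answer describes), here is a minimal offline sketch. The HTML in `SAMPLE_HTML`, the company names in it, and the `parse_cards` helper are hypothetical; only the id and class names (`mosaic-zone-jobcards`, `slider_container`, `companyName`, `companyLocation`) come from the code above. It uses Python's built-in `"html.parser"` so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup modeled on the id/class names used above.
# Indeed's real HTML is larger and its class names may have changed.
SAMPLE_HTML = """
<div id="mosaic-zone-jobcards">
  <div class="slider_container">
    <h2 class="jobTitle"><span title="Web App Developer">Web App Developer</span></h2>
    <span class="companyName">Example Company A</span>
    <div class="companyLocation">Perth WA 6000</div>
  </div>
  <div class="slider_container">
    <h2 class="jobTitle"><span title="Front End Developer">Front End Developer</span></h2>
    <span class="companyName">Example Company B</span>
    <div class="companyLocation">Perth WA</div>
  </div>
</div>
"""


def parse_cards(html):
    """Hypothetical helper: return one (title, company, location) tuple per job card."""
    soup = BeautifulSoup(html, "html.parser")
    results = soup.find(id="mosaic-zone-jobcards")
    cards = []
    # find_all returns every matching card; find would return only the first one.
    for card in results.find_all("div", class_="slider_container"):
        title = card.find("h2")
        # Filtering by class skips the <span> nested inside the <h2> title.
        company = card.find("span", class_="companyName")
        location = card.find("div", class_="companyLocation")
        cards.append((
            title.get_text(strip=True),
            company.get_text(strip=True),
            location.get_text(strip=True),
        ))
    return cards


if __name__ == "__main__":
    first_card = BeautifulSoup(SAMPLE_HTML, "html.parser").find("div", class_="slider_container")
    # Reproduces the original symptom: an unfiltered find("span") hits the title's span.
    print(first_card.find("span").get_text())  # Web App Developer
    for title, company, location in parse_cards(SAMPLE_HTML):
        print(title, "|", company, "|", location)
```

The first `print` reproduces the original problem: in this sample markup, an unfiltered `find("span")` returns the span inside the `h2`, which is why `company` printed the job title. If a real card is missing one of these elements, `find` returns `None` and calling `.get_text()` or `.text` on it raises `AttributeError`, so it is worth checking for `None` before reading the text.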
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | wallfell00 |