Category "web-scraping"

Scraping <span> text</span> with BeautifulSoup and urllib

I want to scrape 2015 from the HTML below. I use the code below but am only able to scrape "Annee": soup.find('span', {'class':'optionLabel'}).get_text() Can someo

Need the number of total pages on a website to iterate but selenium keeps timing out

I'm trying to fix a data crawler that was working perfectly until a couple of weeks ago. The script consists of two parts, one that retrieves the links of the ar

What is causing Error TypeError: text is not iterable? - Web scraper Puppeteer NodeJs

I am learning nodejs/puppeteer and having issues getting Puppeteer to fill UPC numbers from a CSV file onto the search bar of a book website. I managed to get a

Get Puppeteer Page/Frame Handle for new page after `ElementHandle.click()`

Using puppeteer, I have a specific page that I am web-scraping for data and screenshot-ing for proof that the data is correct. The web page itself includes a bu

Tweepy for Twitter API v2 - Extracting Additional Fields for Tweet Search

I started playing around with Twitter API v2 in Tweepy. I've had some experience with v1 but it looks like it's changed a bit. I'm trying to search tweets based

Web Scrape pagination in a single URL (cheerio and axios)

Newbie here. I am working on a web scraping project and would like some guidance on pagination techniques. I'm scraping this site https://www.imoney.my/unit-trus

Selenium sync with google account

I've created a function using Selenium with undetected chromedriver to create a Google chat with a specified email. Every time I run my code I have to log i

Using R code to scrape data from a webpage into an Excel file

I have written code in R that is supposed to retrieve certain information from a website and import it into an Excel file. I have used it for one website and

Not getting all the HTML data in devtools on the Zillow website (and others)

I'm trying to scrape real estate data from Zillow. When I look at the HTML code in devtools, most of the links to the house details are not displayed in the htm

Cannot scrape the correct aspect ratio of the image - Python

I'm having a problem extracting an image from a "Manga" website using Python. Below is the element example on the website: img id="comic" class="loading" onerro

HTTP error 403 in Python 3 web scraping the publications

This is the traceback of the error that occurs when I try to put in the URL of the publication. It works for regular websites such as Stack Overflo

How to scrape wikipedia text from <p> without id or class?

I am scraping a Wikipedia text but the <p> does not have any class or id: import requests as r from bs4 import BeautifulSoup as bs url=r.get("https://en.

How to use scrapy to scrape google play reviews of applications?

I wrote this spider to scrape reviews of apps from google play. I am partially successful in this. I am able to extract the name, date, and review only. My ques

How to do Scrapy historical output comparison using Spidermon

So Scrapinghub is releasing a new feature for Scrapy quality assurance. It says it has historical comparison features where it can detect if the current scrape

removing `\n` using bs4 get_text()

from bs4 import BeautifulSoup # current output as below """ 'DOMINGUEZ, JONATHAN D. VS. RAMOS,\n SILVIA M' """ # d
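One possible approach, sketched under the assumption that the goal is a single-line string (the desired output is cut off in the excerpt above): collapse every run of whitespace, including the embedded `\n`, into one space. The helper name `collapse_whitespace` is hypothetical, and the sample string is the one quoted in the excerpt.

```python
# Sample string copied from the excerpt above.
raw = 'DOMINGUEZ, JONATHAN D. VS. RAMOS,\n SILVIA M'


def collapse_whitespace(text: str) -> str:
    """Replace every run of whitespace (spaces, tabs, newlines) with one space."""
    # str.split() with no arguments splits on any whitespace run and
    # drops leading/trailing whitespace, so joining with " " normalizes it.
    return " ".join(text.split())


cleaned = collapse_whitespace(raw)
print(cleaned)
```

The same helper could be applied to whatever string `get_text()` returns, since the newline sits inside the text itself rather than between tags.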

Trouble modifying the language option in selenium python bindings

I've created a script in python in combination with selenium to scrape different app names from google play store and they all are coming through when I execute

Can't grab coordinates from ArcGIS iframe in a webpage using requests

I've created a script to get coordinates (-119.412 49.023 in this case) from a map located in a webpage using requests module. When I try using my script below

how to use same cookies over multiple requests when using python requests

I am new to python requests and am using it to scrape a website and get to a certain webpage. First I log in and then I do a few requests for other webpages: im

OSError: [Errno 22] Invalid argument: 'downloaded/misc/jquery.js?v=1.4.4'

tfp = open(filename, 'wb') OSError: [Errno 22] Invalid argument: 'downloaded/misc/jquery.js?v=1.4.4' Can anyone help me with this error? I figure it has somet
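A minimal sketch of one likely fix, assuming the error comes from the `?v=1.4.4` query string ending up in the local filename (`?` is not a valid filename character on Windows). The excerpt is truncated, so this is a guess at the cause, not a confirmed diagnosis. The helper name `strip_query` is hypothetical; `urlsplit` is from the standard library.

```python
from urllib.parse import urlsplit


def strip_query(path: str) -> str:
    """Return the path without its query string or fragment."""
    # urlsplit separates "misc/jquery.js?v=1.4.4" into a path part
    # ("misc/jquery.js") and a query part ("v=1.4.4"); keep only the path.
    return urlsplit(path).path


# Filename taken from the error message above.
safe_name = strip_query("downloaded/misc/jquery.js?v=1.4.4")
print(safe_name)
```

The result can then be passed to `open(safe_name, 'wb')`. Note that two URLs differing only in their query string would map to the same local file with this approach.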

Scraping content from urls in dataframe using R

Sorry, I'm relatively new to R and don't know it very well yet. I have also seen that similar questions have been asked before. However, the corresponding s