Category "web-scraping"

How can I handle pagination with Scrapy and Splash if the href of the button is javascript:void(0)?

I am trying to scrape the names and links of universities from this website: https://www.topuniversities.com/university-rankings/world-university-rankings/2021,

Setting a default for NoSuchElementException for multiple variables in Python

I am scraping multiple rows of a table, and many of the fields are available on some pages but not on others. What I want to do is to detect which field is not a

Scrape and change data in date in BeautifulSoup

I am scraping data from different web pages and there are several dates in this data. The code allowing me to have the information that I want looks like this,

Can I change a drop down item from a list in Jsoup and submit it?

I have a site I'm trying to scrape with Jsoup that has monthly and yearly selection boxes where the data changes when a different month or year is selected. Edi

Can't bypass Cloudflare with Python cloudscraper

I ran into a Cloudflare issue when I tried to parse the website. I have this code: import cloudscraper url = "https://author.today" scraper = cloudscraper.create

Not able to replicate AJAX using Python Requests

I am trying to replicate an AJAX request from a web page (https://droughtmonitor.unl.edu/Data/DataTables.aspx). AJAX is initiated when we select values from dropdo

Unable to iterate through list using BeautifulSoup

I am doing some experiments with Python 3.6 and BeautifulSoup on a Mac. I am trying to build a simple program to scrape song lyrics from a URL and store them as pla

Pulling company name from webpage within <a> tag

I am trying to streamline my data collection by using Python 3.7 and BeautifulSoup to pull the company name, whether that company is approved or other, and whether they are m

I get InvalidURL: URL can't contain control characters when I try to send a request using urllib

I am trying to get a JSON response from the link used as a parameter to the urllib request, but it gives me an error saying the URL can't contain control characters. h

NFT List Prices

OpenSea allows users to buy and sell NFTs. From OpenSea, you can view the prices of listed NFTs within a project. When an NFT is listed, is the listed price sto

How to scrape an image src using puppeteer in NodeJS?

I'm trying to scrape the source of the first image with a specific class. On the page, there are multiple images with different additional classes but they shar

C# Discord Embed Emoji Scraping

I have set up a program whose goal is to run through every possible ID and test that ID to see if there is a Discord emoji URL associated with it

Extract everything inside tag, but not tag itself

I'm using BeautifulSoup to scrape text from a website, but I only want the <p> tags for organization. However, I can't use text.findAll('p'), because the

How to get access to data in a GitHub repo with Node.js Express

I'm currently trying to get COVID-19 data from the COVID-19 Data Repository by Johns Hopkins: https://github.com/CSSEGISandData/COVID-19 The repo gets updated with new d

Selenium-Wire Your Connection Is Not Secure

I'm using selenium-wire with undetectable chromedriver and it's giving me: "Your Connection To This Site Is Not Secure" when I go into a site, and the https in

Web scraping on sites that use infinite scroll

I am trying to scrape data from OpenSea. I can scroll the page, but I don't know how to collect data while scrolling. My code: from selenium import webdriver fro

How would I go about incorporating an if statement in item list?

I need to find the phone numbers on this website. I have come to the conclusion that I need to write an if statement, but I'm not really sure how to do that sinc

How to find an element with Selenium in Python?

import os import selenium from selenium import webdriver import time browser = webdriver.Chrome() browser.get('https://www.skysports.com/champions-league-fixtu

Puppeteer - How to use page.click() inside page.evaluate()

I am scraping a table and each row has a button to show a modal with information. I need to scrape the information from the modal for each row but I don't know