Category "beautifulsoup"

Can't get all table elements using selenium webdriver

I'm trying to get all information from this website using Python/Selenium: https://bitinfocharts.com/top-100-richest-bitcoin-addresses.html I have successfully

How can I speed up the aiohttp parser bs4?

The task is to get data from the site. I have 800 URLs to request, but it takes a long time. I use aiohttp. At this stage, I have received links, by clicking on

Get a specific tag - BeautifulSoup

Below is the xml that I'm trying to parse. <url> <loc>https://www.houseofindya.com/aqua-chanderi-pleated-sharara-pants-177/iprdt</loc> &

How to scrape Trusted Shops?

I would appreciate your help on this scraping problem. I would like to scrape this site: https://www.trustedshops.de/bewertung/info_XDAD39B874C275A0751198C2510C

having trouble in getting all the P tags of page

The article passages are divided into different div tags. As you can see in the image, it is written data-page-number="2"; just like that, the data is divided i

extract attributes from span

I have a lxml file and I need content from there. The file structure looks like this: <span class="ocr_line" id="line_1_1" title="bbox 394 185 1993 247">

Scraping .aspx page with Python yields 404

I'm a web-scraping beginner and am trying to scrape this webpage: https://profiles.doe.mass.edu/statereport/ap.aspx I'd like to be able to put in some settings

how can I pass the multiple .html file names in to a single txt output file that outputs all the href links in html along with their file names?

import pandas as pd import glob import csv import re from bs4 import BeautifulSoup links_with_text = [] textfile = open("a_file.txt", "w") for filename in glob

How to change URL for beautifulsoup scraper every time the program runs (without doing it manually)?

I have the following code to scrape Reddit usernames: from bs4 import BeautifulSoup from requests import get from fake_useragent import UserAgent

neither find_all nor find works

I am trying to scrape the name of every favorite on the page of a user of our choice, but with this code I get the error "ResultSet object has no attribute 'fi

Webscraping sale prices from a grocery store- Am I on the right track or is there a simpler way?

I am new to all of this, and this is my first real coding project so forgive me if the answer is obvious :) I am trying to extract sale items from [my grocery s

scraping NYT mini crossword stats gives 403 forbidden URL error

I'm trying to scrape my NYT mini crossword stats to then update a google sheet. But I'm having trouble with the login portion of the code. Here's my code so far

Beautifulsoup scraping "lazy faded" images

I am looking for a way to parse the images on a web page. Many posts already exist on the subject, and I was inspired by many of them, in particular : How Can I

if class not found return none in BeautifulSoup

I'm trying to get None if the class is not found in web scraping. For example, in some cases stage-codes.html#10_99 doesn't exist in HTML. for st in soup.find_a

Web scraping from html code of a database using python

I am new to Python and am learning things slowly. I have earlier performed API calls from databases to extract information. However, I was dealing with a partic

Extracting information from website with BeautifulSoup and Python

I'm attempting to extract information from this website. I can't get the text in the three fields marked in the image (in green, blue, and red rectangles) no ma

How to web scrape the text under <i class>?

I'm trying to get the text "PDF file" under <i class="fa fa-file-pdf-o">. I'm using BeautifulSoup and tried the following, but it didn't work: from bs4 im

getting NoSuchWindowException while scraping Twitter usernames using Selenium

I have been trying to scrape Twitter usernames by going to the followers page, but the issue is that if I leave my PC there, after some time I get this exception a