How to get the HTML of this website using Python requests?
I am trying to download an HTML file from the following website:
If I view the page source in Google Chrome, I can get the HTML without any problem. However, I want to download multiple pages with Python requests, and when I try to get the HTML that way, I encounter an error.
Using:
import requests

# url holds the address of the page to download
response = requests.get(url)
content = response.text
with open('filename', 'w') as dat:
    dat.write(content)
I get the following error:
requests.exceptions.TooManyRedirects: Exceeded 30 redirects.
I also tried passing allow_redirects=False. With that, the request succeeds, but the HTML I get back is only a stub containing the following text:
Object Moved
This document may be found here.
How can I download this HTML using requests in Python?
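One way to see where the redirects actually lead is to follow them by hand with allow_redirects=False and record each hop. The following is only a minimal sketch: trace_redirects is a hypothetical helper name, and the hop limit and timeout are arbitrary assumptions, not values taken from the question.

```python
from urllib.parse import urljoin

import requests

REDIRECT_CODES = (301, 302, 303, 307, 308)


def trace_redirects(url, session=None, max_hops=10):
    """Follow redirects one hop at a time and return a list of (status, url) pairs.

    Hypothetical helper for diagnosing a TooManyRedirects error.
    """
    if session is None:
        session = requests.Session()
    hops = []
    seen = {url}
    for _ in range(max_hops):
        # Ask for one response only; do not let requests follow the redirect.
        response = session.get(url, allow_redirects=False, timeout=10)
        hops.append((response.status_code, url))
        location = response.headers.get("Location")
        if response.status_code not in REDIRECT_CODES or location is None:
            break  # reached a non-redirect response
        next_url = urljoin(url, location)  # Location may be a relative URL
        if next_url in seen:
            # The server sent us back to a URL we already requested.
            hops.append(("loop", next_url))
            break
        seen.add(next_url)
        url = next_url
    return hops
```

Printing the returned list for the failing URL should show whether the server keeps bouncing between the same addresses, which is often a sign that it expects a cookie or header the client is not sending.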
If I add the header:
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36'
the code does run, but it still does not give me the HTML I'm looking for. The HTML it produces is just one line, something like this:
<html><head><title>avto.net</title><style>#cmsg{animation: A 1.5s;}@keyframes A{0%{opacity:0;}99%{opacity:0;}100%{opacity:1;}}</style></head><body style="margin:0"><p id="cmsg">Please enable JS and disable any ad blocker</p><script>var ...
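The one-line response above is a placeholder that asks the client to run JavaScript, and requests does not execute JavaScript. When downloading many pages, it can help to detect this placeholder instead of silently saving it. A minimal sketch, assuming the placeholder always contains wording like the message shown above; looks_like_js_challenge and its marker strings are hypothetical, not part of requests:

```python
def looks_like_js_challenge(html):
    """Heuristic check: True if the body looks like a 'please enable JS' placeholder.

    The marker strings are assumptions based on the response shown above.
    """
    markers = ("please enable js", "enable javascript")
    lowered = html.lower()
    return any(marker in lowered for marker in markers)
```

If this returns True for response.text, the real page was not served, and changing headers alone may not be enough; a client that executes JavaScript, such as a browser automation tool, may be required.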
Solution 1:[1]
Try defining headers for your requests.get() call, e.g.:
import requests
from bs4 import BeautifulSoup

# Browser-like headers so the request resembles one sent by Chrome
headers = {
    'Accept-Encoding': 'gzip, deflate, sdch',
    'Accept-Language': 'en-US,en;q=0.8',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
}
url = '<url-here>'  # placeholder: replace with the page URL
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')
This fixed it for me.
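Since the question is about downloading multiple pages, the same idea can be applied with a requests.Session, which sends the headers on every request and keeps cookies between requests. This is a minimal sketch under assumptions: make_session and save_page are hypothetical helper names, the header set is trimmed, and whether a given site accepts these headers is not guaranteed.

```python
from pathlib import Path

import requests

# Browser-like headers; the exact values are an assumption, not a requirement.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.8",
}


def make_session(headers=HEADERS):
    """Create a session that sends the given headers with every request."""
    session = requests.Session()
    session.headers.update(headers)
    return session


def save_page(session, url, path):
    """Download one page, save its HTML to path, and return the status code."""
    response = session.get(url, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
    Path(path).write_text(response.text, encoding="utf-8")
    return response.status_code


if __name__ == "__main__":
    urls = []  # fill in the page URLs to download
    session = make_session()
    for number, url in enumerate(urls, start=1):
        save_page(session, url, f"page_{number}.html")
```

Reusing one session also reuses the underlying connection, which tends to be faster than calling requests.get() once per page.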
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Elias |