How can I download a file in Python 3 with urlopen() or add custom headers to urlretrieve()?
tl;dr: I want to download a file from a server that only allows certain User-Agents. I managed to get a 200 OK from the site by using the following code:
import urllib.request

opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'Interwebs Exploiter 4')]
opener.open(url)
Since the file can be a .pdf, a .zip, or another format, I want to download it without parsing or reading it. urlretrieve() seems like a good idea, but it uses the default headers, which makes the server return a 403 Forbidden.
How can I either download the file by using that custom-built opener or simply add headers to urlretrieve()?
And this example in the Python Docs is complete gibberish to me.
Solution 1:[1]
I would use requests for that:
import requests

headers = {'User-Agent': 'Interwebs Exploiter 4'}
# stream=True defers the body download so it can be written chunk by chunk
r = requests.get(url, allow_redirects=True, headers=headers, stream=True)
with open(filename, 'wb') as f:
    for chunk in r.iter_content(1024):
        f.write(chunk)
That is, unless it's absolutely essential for some reason to use urllib.
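If urllib is a hard requirement, one possible route is to install a custom opener globally, since urlretrieve() goes through urlopen() and so picks up whatever opener was installed. The following is only a minimal sketch: the function name retrieve_with_user_agent and its parameters are made up for illustration, and note that install_opener() changes the default opener for the whole process, which may or may not be acceptable in your program.

```python
import urllib.request


def retrieve_with_user_agent(url, filename, user_agent):
    """Download url to filename via urlretrieve() with a custom User-Agent.

    Hypothetical helper, not part of urllib.
    """
    opener = urllib.request.build_opener()
    opener.addheaders = [("User-Agent", user_agent)]
    # install_opener() makes this opener the process-wide default, so every
    # later urlopen() call (and therefore urlretrieve()) sends this header.
    urllib.request.install_opener(opener)
    path, _headers = urllib.request.urlretrieve(url, filename)
    return path
```

It would be called as, for example, retrieve_with_user_agent(url, 'file.pdf', 'Interwebs Exploiter 4'), with url being the address from the question.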
Solution 2:[2]
Download a URL with urllib.request:
import urllib.request

opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'Interwebs Exploiter 4')]
with opener.open(url) as url_file:
    url_content = url_file.read()
Do note that url_file.read() will read the entire file into memory, which might not be what you want if it could be a very large file.
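To sidestep the memory concern, the response object can be copied to disk in chunks instead of being read in one go. Here is a rough sketch using shutil.copyfileobj from the standard library; the function name download_streaming, its parameters, and the 64 KiB chunk size are arbitrary choices for illustration, not anything prescribed by urllib.

```python
import shutil
import urllib.request


def download_streaming(url, filename, user_agent, chunk_size=64 * 1024):
    """Save url to filename without holding the whole body in memory.

    Hypothetical helper; chunk_size is an arbitrary illustrative default.
    """
    opener = urllib.request.build_opener()
    opener.addheaders = [("User-Agent", user_agent)]
    # The response is a file-like object, so copyfileobj() can pull it
    # across chunk_size bytes at a time and write each chunk straight out.
    with opener.open(url) as response, open(filename, "wb") as out_file:
        shutil.copyfileobj(response, out_file, chunk_size)
    return filename
```

Unlike install_opener(), this keeps the custom header local to the one opener, so other urllib calls in the same program are unaffected.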
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | telex-wap |
Solution 2 | David Foster |