How can I download a file in Python3 with urlopen() or add custom headers to urlretrieve()?

tl;dr I want to download a file from a server that only allows certain User-Agents. I managed to get a 200 OK from the site by using the following code:

import urllib.request

opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'Interwebs Exploiter 4')]
opener.open(url)

Since the file can be a .pdf, a .zip, or another format, I want to download it without parsing or reading it. urlretrieve() seems like a good idea, but it sends the default headers, which makes the server return a 403 Forbidden.

How can I either download the file with that custom-built opener or simply add headers to urlretrieve()?

And this example in the Python Docs is complete gibberish to me.



Solution 1:[1]

I would use requests for that:

import requests

headers = {'User-Agent': 'Interwebs Exploiter 4'}

# stream=True defers the body download so iter_content() can fetch it in chunks
r = requests.get(url, allow_redirects=True, headers=headers, stream=True)
with open(filename, 'wb') as f:
    for chunk in r.iter_content(1024):
        f.write(chunk)

This assumes it isn't absolutely essential for some reason to use urllib.
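
If urllib is required and urlretrieve() is preferred, one possible approach is to install the custom opener as the process-wide default, since urlretrieve() goes through urlopen() internally. The following is a minimal sketch rather than a definitive implementation. The function name retrieve_with_user_agent and its parameters are hypothetical, chosen for illustration.

```python
import urllib.request


def retrieve_with_user_agent(url, filename, user_agent):
    """Download url to filename via urlretrieve(), sending a custom User-Agent.

    Hypothetical helper for illustration; not part of urllib.
    """
    opener = urllib.request.build_opener()
    opener.addheaders = [("User-Agent", user_agent)]
    # install_opener() replaces the global default opener, so every later
    # urlopen() / urlretrieve() call in this process sends this header too.
    urllib.request.install_opener(opener)
    # urlretrieve() returns a (local_filename, headers) tuple.
    local_path, _headers = urllib.request.urlretrieve(url, filename)
    return local_path
```

A call might look like retrieve_with_user_agent(url, 'file.pdf', 'Interwebs Exploiter 4'). Because the opener is installed globally, this affects all other urllib.request calls in the same process, which may or may not be desirable.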

Solution 2:[2]

Download a URL with urllib.request:

import urllib.request

opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'Interwebs Exploiter 4')]
with opener.open(url) as url_file:
    url_content = url_file.read()

Do note that url_file.read() will read the entire file into memory, which might not be what you want if it could be a very large file.
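
To avoid holding the whole file in memory, the response can be copied to disk in chunks instead. The following is a minimal sketch using shutil.copyfileobj from the standard library. The function name download_to_file, its parameters, and the 64 KiB chunk size are assumptions made for illustration.

```python
import shutil
import urllib.request


def download_to_file(url, filename, user_agent, chunk_size=64 * 1024):
    """Stream url to filename in chunks, sending a custom User-Agent.

    Hypothetical helper for illustration; the chunk size is an arbitrary choice.
    """
    opener = urllib.request.build_opener()
    opener.addheaders = [("User-Agent", user_agent)]
    # The response is a file-like object, so copyfileobj() can read it
    # chunk_size bytes at a time and write each chunk straight to disk.
    with opener.open(url) as response, open(filename, "wb") as out_file:
        shutil.copyfileobj(response, out_file, chunk_size)
    return filename
```

A call might look like download_to_file(url, 'file.zip', 'Interwebs Exploiter 4'). Unlike the urlretrieve() route, this keeps the custom opener local and leaves the global default untouched.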

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 telex-wap
Solution 2 David Foster