How to get the actual download link embedded in any "Download" button

I have this code to download a big file in chunks:

import requests
from tqdm import tqdm

def get_size(url):
    # Request only the headers; raises KeyError if Content-Length is missing
    response = requests.head(url)
    size = int(response.headers['Content-Length'])
    return size

def download_file(url):
    local_filename = "./archive.zip"
    total_size = get_size(url)  # total size in bytes
    chunk_size = 100_000_000  # bytes per chunk (100 MB)

    with requests.get(url, stream=True, allow_redirects=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            # Stream the body to disk one 100 MB chunk at a time
            for chunk in tqdm(r.iter_content(chunk_size=chunk_size), total=total_size // chunk_size):
                f.write(chunk)
    return local_filename
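
As a side note on the size lookup above: a minimal sketch of a more defensive version, assuming the server may omit the Content-Length header. The helper names `parse_content_length` and `expected_chunks` are hypothetical and not part of the original code. A result of `None` can be passed to `tqdm` as `total=None`, in which case the bar shows a running count with no percentage.

```python
from typing import Mapping, Optional


def parse_content_length(headers: Mapping[str, str]) -> Optional[int]:
    """Return Content-Length as an int, or None when it is absent or not a number.

    With requests, response.headers is case-insensitive; a plain dict is not.
    """
    value = headers.get("Content-Length")
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        return None


def expected_chunks(total_size: Optional[int], chunk_size: int) -> Optional[int]:
    """Number of chunks to expect, rounded up; None when the size is unknown."""
    if total_size is None:
        return None
    # Integer ceiling division so a final partial chunk is counted too
    return (total_size + chunk_size - 1) // chunk_size
```

In `download_file`, this would look like `tqdm(r.iter_content(chunk_size=chunk_size), total=expected_chunks(parse_content_length(r.headers), chunk_size))`, which also avoids the separate HEAD request.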

I am trying to download this dataset, which can be downloaded by clicking a button. So I copied the link associated with the Download button using "Copy link address": https://www.kaggle.com/datasets/hsankesara/flickr-image-dataset/download

When I passed the above URL to my code, it failed because the response had no Content-Length header (the lookup returned nothing). So I had to:

Download manually in Chrome -> Cancel the download -> Go to Downloads -> Copy the link

That link, when used with wget or my code, downloaded the file without trouble:

!wget -O images.zip "https://storage.googleapis.com/kaggle-data-sets/31296/39911/bundle/archive.zip?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=gcp-kaggle-com%40kaggle-161607.iam.gserviceaccount.com%2F20220511%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20220511T155300Z&X-Goog-Expires=259199&X-Goog-SignedHeaders=host&X-Goog-Signature=aa91d1b289e42dda73637e517f85367096f2ef8b76aed372510307c3975a0b038ffe47743cf420745ff190a4a2002ff06515405571350f885d56757187d1a1be91aabe1d76a98e826a3a1396d0381ed427aa4d78c78a5131ccd9651470f5d73a9e4e915c500f4d999450b886e66a18acf650741abfb23e94d0458b628fd18f869393892004a2af9f1ddb612352c68b0287d6286acc0b89fd0e884a7adb09ab5203fed46f43e4fc9fcf865baeb8a84ca90c2d8e96bc17cd1c05a87202ba9f2e2c6127466207a334cce0a0fdff8c386b459e82dcca3b3f9452ef910a1ce42a3e0899c88a532eecf84614059808a2988ace27002f16f9134c7e69bce4848f9dc004"
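
The manual steps above amount to discovering where the button URL finally leads. A minimal sketch of doing that in requests, under two assumptions that are not confirmed by anything above: that the button URL answers with an HTTP redirect to the signed storage URL, and that the session already carries whatever login cookies or credentials the site requires (not shown here). The function name `resolve_final_url` is hypothetical.

```python
import requests


def resolve_final_url(url, session=None):
    """Follow redirects from `url` and return (final_url, intermediate_urls).

    stream=True means only the headers are fetched; the body is not
    downloaded before the connection is closed.
    """
    session = session or requests.Session()
    with session.get(url, stream=True, allow_redirects=True) as r:
        r.raise_for_status()
        # r.history holds the redirect responses, oldest first
        hops = [resp.url for resp in r.history]
        return r.url, hops
```

If the returned URL points at a sign-in page instead of a file, the request was not authenticated and the signed link cannot be obtained this way without credentials.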

My question: is there a way to find this link and download the file automatically, using requests in Python or from Linux?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
