How to get the actual download link embedded with any "Download" button
I have this code to download a big file in chunks:
import requests
from tqdm import tqdm

def get_size(url):
    response = requests.head(url)
    size = int(response.headers['Content-Length'])
    return size

def download_file(url):
    local_filename = "./archive.zip"
    total_size = get_size(url)  # total size in bytes
    chunk_size = 100000000  # size of one chunk: 100000000 bytes = 100 MB
    with requests.get(url, stream=True, allow_redirects=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            for chunk in tqdm(r.iter_content(chunk_size=chunk_size), total=total_size // chunk_size):  # download one 100 MB chunk at a time
                # if chunk:
                f.write(chunk)
    return local_filename
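As a side note on the code above: `get_size` indexes `response.headers['Content-Length']` directly, so it fails whenever the server omits that header. A minimal sketch of a more tolerant calculation follows. The helper name `expected_chunks` is hypothetical, and it only needs a mapping with a `.get` method, which `response.headers` provides.

```python
import math


def expected_chunks(headers, chunk_size):
    """Return how many chunks to expect, or None when the size is unknown.

    `headers` is any mapping with .get(), for example response.headers
    from requests (hypothetical helper, not part of the original code).
    """
    raw = headers.get("Content-Length")
    if raw is None:
        # tqdm accepts total=None and then shows a running count without a bar.
        return None
    # Round up so a final partial chunk is counted too.
    return math.ceil(int(raw) / chunk_size)
```

The result can be passed as `total=` to `tqdm`. Rounding up differs from the `total_size // chunk_size` used above, which undercounts by one when the file size is not an exact multiple of the chunk size.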
I am trying to download this dataset, which can be downloaded at the click of a button. Using "Copy link address", I copied the link associated with the Download button:
https://www.kaggle.com/datasets/hsankesara/flickr-image-dataset/download
When I passed the above URL to my code, it reported that no Content-Length header was present, meaning the value came back as NULL. So I had to:
Download manually in Chrome -> Cancel download -> Go to Downloads -> Copy the link
That link, when used with wget or my code, downloaded the file easily.
!wget -O images.zip "https://storage.googleapis.com/kaggle-data-sets/31296/39911/bundle/archive.zip?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=gcp-kaggle-com%40kaggle-161607.iam.gserviceaccount.com%2F20220511%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20220511T155300Z&X-Goog-Expires=259199&X-Goog-SignedHeaders=host&X-Goog-Signature=aa91d1b289e42dda73637e517f85367096f2ef8b76aed372510307c3975a0b038ffe47743cf420745ff190a4a2002ff06515405571350f885d56757187d1a1be91aabe1d76a98e826a3a1396d0381ed427aa4d78c78a5131ccd9651470f5d73a9e4e915c500f4d999450b886e66a18acf650741abfb23e94d0458b628fd18f869393892004a2af9f1ddb612352c68b0287d6286acc0b89fd0e884a7adb09ab5203fed46f43e4fc9fcf865baeb8a84ca90c2d8e96bc17cd1c05a87202ba9f2e2c6127466207a334cce0a0fdff8c386b459e82dcca3b3f9452ef910a1ce42a3e0899c88a532eecf84614059808a2988ace27002f16f9134c7e69bce4848f9dc004"
My question: is there a way to find this link and download the file automatically using requests, Python, or Linux?
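One possible approach, sketched under assumptions: if the Download button answers with an HTTP redirect to the signed storage URL, requests records each hop in `Response.history` and the final address in `Response.url`. The helper name `resolve_download_url` is hypothetical. Whether Kaggle redirects this way for a given session is an assumption that the post does not confirm, and the session very likely needs login cookies or credentials.

```python
def resolve_download_url(session, button_url, timeout=30):
    """Follow redirects from a button link; return (final_url, hops).

    `session` is expected to be a requests.Session that already carries
    whatever cookies or credentials the site requires (assumption).
    """
    # stream=True defers reading the body, so only the headers are fetched here.
    with session.get(button_url, stream=True,
                     allow_redirects=True, timeout=timeout) as r:
        r.raise_for_status()
        # Each entry in r.history is an intermediate redirect response.
        hops = [hop.headers.get("Location") for hop in r.history]
        return r.url, hops


# Example use (needs the third-party `requests` package and network access):
# import requests
# final_url, hops = resolve_download_url(
#     requests.Session(),
#     "https://www.kaggle.com/datasets/hsankesara/flickr-image-dataset/download",
# )
```

If the site does redirect, `final_url` can be handed to `download_file` or to wget. The `X-Goog-Expires` parameter in the link above suggests that such signed URLs stop working after a limited time, so resolving the link shortly before downloading is probably safer than storing it.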
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow