Azure blob Python library - hanging on readinto() function
I have been struggling with an application that hangs at random. I have a very large number of files to download (~100k); after some random number of downloads the script hangs and does not exit with any useful error.
This is my code to download a blob:
with open(file_name, "wb") as file:
    logger.info("Downloading to " + file_name + " ...")
    blob_client.download_blob().readinto(file)
I tried limiting the script to 1000 files at a time, but after a few runs it eventually locks up again. Because the application hangs without producing any error information, all our attempts to diagnose the issue have been unsuccessful.
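For context, the batching approach described above can be sketched as plain Python. This is a minimal illustration, not code from the question; the `handler` callback stands in for whatever per-blob download routine is used, and the batch size is an assumption:

```python
from typing import Callable, Iterable, List


def process_in_batches(items: Iterable[str],
                       handler: Callable[[str], None],
                       batch_size: int = 1000) -> List[str]:
    """Process items in fixed-size batches, collecting failures so
    one bad item cannot silently halt the whole run."""
    failed: List[str] = []
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            _run_batch(batch, handler, failed)
            batch = []
    if batch:  # flush the final, possibly short, batch
        _run_batch(batch, handler, failed)
    return failed


def _run_batch(batch: List[str],
               handler: Callable[[str], None],
               failed: List[str]) -> None:
    for item in batch:
        try:
            handler(item)
        except Exception:
            # Record the failure and keep going; the caller can
            # retry the returned list later.
            failed.append(item)
```

Collecting failures this way at least surfaces which files went wrong, even though it cannot by itself prevent a hang inside a single download call.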
Solution 1
To avoid the application hanging in readinto(), please try the below:
- Try setting max_concurrency, which increases the speed of the blob download, like below:
blob_client.download_blob(max_concurrency=10).readinto(download_file)
You can adjust the max_concurrency value to get better results.
- If the above doesn't work, try the below script:
blob = BlobClient.from_connection_string(conn_str=conn_str, container_name=container, blob_name=blob_name)
download_stream = blob.download_blob(max_concurrency=10)
with open(target_path, 'wb') as target_file:
    download_stream.readinto(target_file)
- Otherwise, try setting read_timeout (a large number, in seconds) when initializing the BlobClient, like below:
BlobClient(account_url, container_name, blob_name, credential, read_timeout=8000)
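Putting the suggestions above together, here is a hedged sketch of a download helper with a simple retry loop. The client is any object exposing `download_blob(...).readinto(...)`, as `azure.storage.blob.BlobClient` does; the retry count and backoff values are illustrative assumptions, not Azure SDK defaults:

```python
import logging
import time

logger = logging.getLogger(__name__)


def download_blob_to_file(blob_client, file_name,
                          max_concurrency=10, attempts=3, backoff=5):
    """Download a blob to a local file, retrying on failure.

    blob_client is expected to behave like azure.storage.blob.BlobClient:
    download_blob() returns a stream with a readinto(file_obj) method.
    attempts/backoff are illustrative defaults, not SDK values.
    """
    for attempt in range(1, attempts + 1):
        try:
            with open(file_name, "wb") as f:
                logger.info("Downloading to %s ...", file_name)
                blob_client.download_blob(
                    max_concurrency=max_concurrency).readinto(f)
            return True
        except Exception:
            logger.warning("Attempt %d/%d failed for %s",
                           attempt, attempts, file_name)
            if attempt < attempts:
                time.sleep(backoff)  # wait before retrying
    return False
```

A retry wrapper like this only helps if the underlying call eventually raises; that is why combining it with read_timeout on the client matters, so a stalled connection surfaces as an exception instead of hanging forever.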
You can refer to the below link for more information:
VERY slow large blob downloads · Issue #10572 · Azure/azure-sdk-for-python · GitHub
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | RukminiMr-MT |