Azure Blob Python library - hanging on readinto() function

I have been struggling with an application that hangs randomly. I have a very large number of files to download (~100k); after a random number of them, the script hangs and does not exit with any useful error.

This is my code to download a blob:

        with open(file_name, "wb") as file:
            logger.info("Downloading to " + file_name+" ...")
            blob_client.download_blob().readinto(file)

I tried limiting the script to 1000 files at a time, but after a few runs it eventually locks up again. Because the application hangs without producing any error information, all our attempts to resolve the issue have been unsuccessful.



Solution 1:[1]

To avoid the application hanging while using the readinto() function, please try the following:

  • Try setting max_concurrency, which parallelizes and speeds up the blob download:

        blob_client.download_blob(max_concurrency=10).readinto(download_file)

You can experiment with different max_concurrency values to get better results.

  • If the above doesn't work, try a script along these lines:

        blob = BlobClient.from_connection_string(conn_str=conn_str, container_name=container, blob_name=blob_name)
        download_stream = blob.download_blob(max_concurrency=10)
        with open(target_path, "wb") as target_file:
            download_stream.readinto(target_file)
  • Otherwise, try setting read_timeout to a large number of seconds when creating the BlobClient, like below:

        BlobClient(account_url, container_name, blob_name, credential, read_timeout=8000)
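The suggestions above can be combined into a single defensive download routine: an explicit read_timeout on the client so a stalled socket read raises instead of blocking forever, max_concurrency on the download, and a bounded retry with exponential backoff so one stuck blob cannot hang the whole run. This is only a sketch; the connection string, container, blob name, timeout, and concurrency values are placeholders you would tune for your workload.

    import time

    def backoff_delay(attempt, base=2.0, cap=60.0):
        """Exponential backoff delay in seconds: 2, 4, 8, ... capped at `cap`."""
        return min(base ** attempt, cap)

    def download_blob_with_retry(conn_str, container, blob_name, target_path,
                                 max_attempts=3):
        # Requires the azure-storage-blob package.
        from azure.storage.blob import BlobClient

        for attempt in range(1, max_attempts + 1):
            try:
                client = BlobClient.from_connection_string(
                    conn_str=conn_str,
                    container_name=container,
                    blob_name=blob_name,
                    read_timeout=300,  # raise on a stalled read instead of hanging
                )
                with open(target_path, "wb") as target_file:
                    client.download_blob(max_concurrency=4).readinto(target_file)
                return
            except Exception:
                if attempt == max_attempts:
                    raise
                time.sleep(backoff_delay(attempt))

With a loop over your ~100k blobs calling download_blob_with_retry, a single hung transfer fails after the timeout and is retried, rather than freezing the whole script.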

You can refer to the link below for more information:

VERY slow large blob downloads · Issue #10572 · Azure/azure-sdk-for-python · GitHub

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: RukminiMr-MT