How to loop through folders in Azure Blob Containers

I have the following code, which I wrote in Visual Studio Code, and I now want to run it in Azure Databricks. I have uploaded the entire folder to my Azure Blob Storage, in a container named Invoices. I want to loop through all folders in my blob storage and then carry on with my PDF processing steps.

import glob
import os

root_path = 'B:\\Invoices\\'
folders = ['202101','202102','202103','202104','202105','202106','202107','202108','202109','202110','202111','202112']
files = []
for folder in folders:
    dir_path = os.path.join(root_path, folder)
    ext = "*.pdf"
    for dirpath, dirnames, filenames in os.walk(dir_path):
        files += glob.glob(os.path.join(dirpath, ext))

I am not sure how to do this from Azure's side. What is the equivalent code I can develop and run in Azure Databricks to achieve the same result? I want to concatenate each folder name with the root path and collect all PDF files.



Solution 1:[1]

To loop through the folders in an Azure Blob container, first mount the container to Databricks with the following command:

dbutils.fs.mount(
    source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
    mount_point = "/mnt/<Mount name>",
    extra_configs = {"fs.azure.account.key.<storage-account-name>.blob.core.windows.net": dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>")})

Now you can list the folders at the root of the mount. Note that dbutils.fs.ls lists only one level at a time, so to descend into each month folder you loop over its results:

dbutils.fs.ls("/mnt/<Mount name>/")
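Once the container is mounted, it is also visible to plain Python on the driver under the /dbfs prefix (e.g. /dbfs/mnt/invoices, assuming a mount name of "invoices"), so the original os/glob approach carries over almost unchanged. A minimal sketch, where the mount name and folder list are assumptions to adapt:

```python
import glob
import os

def collect_pdfs(root_path, folders):
    """Collect every *.pdf under each given folder, searching subfolders too."""
    files = []
    for folder in folders:
        # "**" with recursive=True walks all nested subdirectories.
        pattern = os.path.join(root_path, folder, "**", "*.pdf")
        files += glob.glob(pattern, recursive=True)
    return files

# Hypothetical mount name "invoices"; on Databricks the mounted container
# is reachable from local Python code via the /dbfs prefix.
folders = [f"2021{m:02d}" for m in range(1, 13)]
pdf_files = collect_pdfs("/dbfs/mnt/invoices", folders)
```

For large containers, driving the listing through dbutils.fs.ls (or Spark's binaryFile reader) scales better than glob, but this keeps the structure of the original local script intact.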

SOURCE:
https://docs.databricks.com/_static/notebooks/data-sources/mount-azure-blob-storage.html

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: RakeshGovindula-MT