How to loop through folders in Azure Blob Containers
I have the following code, which I wrote in Visual Studio Code. Now I want to run it in Azure Databricks. I have uploaded the entire folder to my Azure Blob Storage, in a container named Invoices. I want to loop through all the folders in my blob storage and carry on with my PDF processing steps.
import os
import glob

root_path = 'B:\\Invoces\\'
folders = ['202101','202102','202103','202104','202105','202106','202107','202108','202109','202110','202111','202112']
ext = "*.pdf"
files = []
for folder in folders:
    dir_path = os.path.join(root_path, folder)
    for dirpath, dirnames, filenames in os.walk(dir_path):
        files += glob.glob(os.path.join(dirpath, ext))
I am not sure how to do this from Azure's point of view. What equivalent code can I develop and run in Azure Databricks to achieve the same result? I want to concatenate the folders with the root_path and get all the PDF files.
Solution 1:[1]
To loop through the folders in an Azure Blob container, we first have to mount the Blob Storage container in Databricks. We can use this command to mount it:
dbutils.fs.mount(
    source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
    mount_point = "/mnt/<Mount name>",
    extra_configs = {"fs.azure.account.key.<storage-account-name>.blob.core.windows.net": dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>")})
Once the container is mounted, you can list the folders at the root of the mount:
dbutils.fs.ls("/mnt/<Mount name>/")
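Note that `dbutils.fs.ls` only lists one level, so to reproduce the original `os.walk` behaviour you still need to recurse into each folder and keep the `.pdf` paths. A minimal sketch of that recursion is below; the `list_dir` parameter stands in for `dbutils.fs.ls`, which returns `FileInfo` objects exposing `path` and `isDir()` (in a notebook you would pass `dbutils.fs.ls` itself, with your own mount name).

```python
def collect_pdfs(root, list_dir):
    """Recursively collect paths of all .pdf files under `root`.

    `list_dir` should behave like dbutils.fs.ls: given a directory path,
    return entries with a `path` attribute and an `isDir()` method.
    """
    files = []
    for info in list_dir(root):
        if info.isDir():
            # Descend into subfolders (e.g. 202101, 202102, ...)
            files += collect_pdfs(info.path, list_dir)
        elif info.path.lower().endswith(".pdf"):
            files.append(info.path)
    return files

# In a Databricks notebook (assuming the mount name used above):
# pdf_files = collect_pdfs("/mnt/<Mount name>/", dbutils.fs.ls)
```

If your month folders always sit directly under the container root, you could also skip the recursion and loop over the same hard-coded folder list as in the local script, calling `dbutils.fs.ls` once per folder.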
SOURCE:
https://docs.databricks.com/_static/notebooks/data-sources/mount-azure-blob-storage.html
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | RakeshGovindula-MT |