'How to use Wildcard Filenames in Azure Data Factory SFTP?

I am using Data Factory V2 and have a dataset created that is located in a third-party SFTP. The SFTP uses a SSH key and password. I was successful with creating the connection to the SFTP with the key and password. I can now browse the SFTP within Data Factory, see the only folder on the service and see all the TSV files in that folder.

Naturally, Azure Data Factory asked for the location of the file(s) to import. I use the "Browse" option to select the folder I need, but not the files. I want to use a wildcard for the files.

When I opt to do a *.tsv option after the folder, I get errors on previewing the data. When I go back and specify the file name, I can preview the data. So, I know Azure can connect, read, and preview the data if I don't use a wildcard.

Looking over the documentation from Azure, I see they recommend not specifying the folder or the wildcard in the dataset properties. I skip over that and move right to a new pipeline. Using Copy, I set the copy activity to use the SFTP dataset, specify the wildcard folder name "MyFolder*" and wildcard file name like in the documentation as "*.tsv".

I get errors saying I need to specify the folder and wild card in the dataset when I publish. Thus, I go back to the dataset, specify the folder and *.tsv as the wildcard.

In all cases: this is the error I receive when previewing the data in the pipeline or in the dataset.

Can't find SFTP path '/MyFolder/*.tsv'. Please check if the path exists. If the path you configured does not start with '/', note it is a relative path under the given user's default folder ''. No such file .

Why is this that complicated? What am I missing here? The dataset can connect and see individual files as:

/MyFolder/MyFile_20200104.tsv

But fails when you set it up as

/MyFolder/*.tsv

enter image description here



Solution 1:[1]

I use Copy frequently to pull data from SFTP sources. You mentioned in your question that the documentation says to NOT specify the wildcards in the DataSet, but your example does just that. Instead, you should specify them in the Copy Activity Source settings.

In my implementations, the DataSet has no parameters and no values specified in the Directory and File boxes: enter image description here

In the Copy activity's Source tab, I specify the wildcard values. Those can be text, parameters, variables, or expressions. I've highlighted the options I use most frequently below. enter image description here

Solution 2:[2]

You can specify till the base folder here and then on the Source Tab select Wildcard Path specify the subfolder in first block (if there as in some activity like delete its not present) and *.tsv in the second block.

enter image description here

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Joel Cochran
Solution 2 Nilanshu Twinkle