'Removing specific rows in an Excel file using Azure Data Factory

I have a set of excel files inside ADLS. The format looks similar to the one below:

The first 4 rows would always be the document header information and the last 3 will be 2 empty rows and the end of the document indicator. The number of rows for the employee information is indefinite. I would like to delete the first 4 rows and the last 3 rows using ADF.

Can any help me with what should be expressions in the Derived column / Select?

azure azure-data-factory

Solution 1:^[1]

How about this? Under the 'Source' tab, choose the number of lines you want to skip.

Solution 2:^[2]

My Excel file:

Source Data set settings (give A5 in range and select first row as header): SourceDataSetProperties

Make sure to refresh schema in the source data set. Schema

After schema refresh, if you preview the source data, you will be seeing all rows from row number 5. This will include footer too which we can filter in data flow.

Next, add a filter transformation with below expression

  !startsWith(sno,'dummy') && sno!=''

this will filter out the rows starting with dummy, in your case, end of document. Also we are ignoring the empty rows by checking sno!=''

Final Preview after filter:

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	ASH
Solution 2

'Removing specific rows in an Excel file using Azure Data Factory

Solution 1:[1]

Solution 2:[2]

Sources

Related Questions

Solution 1:^[1]

Solution 2:^[2]