'Removing specific rows in an Excel file using Azure Data Factory

I have a set of excel files inside ADLS. The format looks similar to the one below:

enter image description here

The first 4 rows would always be the document header information and the last 3 will be 2 empty rows and the end of the document indicator. The number of rows for the employee information is indefinite. I would like to delete the first 4 rows and the last 3 rows using ADF.

Can any help me with what should be expressions in the Derived column / Select?



Solution 1:[1]

How about this? Under the 'Source' tab, choose the number of lines you want to skip.

enter image description here

Solution 2:[2]

My Excel file:

ExcelFile

Source Data set settings (give A5 in range and select first row as header): SourceDataSetProperties

Make sure to refresh schema in the source data set. Schema

After schema refresh, if you preview the source data, you will be seeing all rows from row number 5. This will include footer too which we can filter in data flow.

Preview Next, add a filter transformation with below expression

  !startsWith(sno,'dummy') && sno!=''

this will filter out the rows starting with dummy, in your case, end of document. Also we are ignoring the empty rows by checking sno!=''

Final Preview after filter:

Preview After Filtering footer

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 ASH
Solution 2