'Removing specific rows in an Excel file using Azure Data Factory
I have a set of excel files inside ADLS. The format looks similar to the one below:
The first 4 rows would always be the document header information and the last 3 will be 2 empty rows and the end of the document indicator. The number of rows for the employee information is indefinite. I would like to delete the first 4 rows and the last 3 rows using ADF.
Can any help me with what should be expressions in the Derived column / Select?
Solution 1:[1]
Solution 2:[2]
My Excel file:
Source Data set settings (give A5 in range and select first row as header): SourceDataSetProperties
Make sure to refresh schema in the source data set. Schema
After schema refresh, if you preview the source data, you will be seeing all rows from row number 5. This will include footer too which we can filter in data flow.
Next, add a filter transformation with below expression
!startsWith(sno,'dummy') && sno!=''
this will filter out the rows starting with dummy, in your case, end of document. Also we are ignoring the empty rows by checking sno!=''
Final Preview after filter:
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | ASH |
Solution 2 |