'Azure Data Factory (ADF) - Parse/Flatten XML file - Get content of all elements match wildcard criteria and sitting in different segments in Hierarchy
I have XML file something like below and need to get all complex elements having different names but all ends with "_KEYS" and they are part of different segments in XML. The sample below has only 3 such elements.. but actual file has in hundreds. How to get it done in ADF
<XMLInput>
<SegmentX>
.
.
<Category_KEYS ID= 'AAAAAAA'> AAAA DESCRIPTION </Category_KEYS>
.
.
</SegmentX>
<SegmentY>
.
.
.
<Staus_KEYS ID= 'BBBBBBB'> BBB DESCRIPTION> </Status_KEYS>
.
.
.
</SegmentY>
<SegmentZ>
.
<Department_KEYS ID= 'CCCCCC'> CCCC DESCRIPTION </Department_KEYS>
.
</SegmentZ>
</XMLInput>
In fact looking for all ID's and corresponding descriptions.
**ID | VALUE**
AAAAAAA | AAAA DESCRIPTION
BBBBBBB | BBB DESCRIPTION
CCCCCC | CCCC DESCRIPTION
Solution 1:[1]
If you have a SQL Server or Azure SQL DB then they are both quite capable with XML, using the .nodes
and .value
methods of the xml datatype eg
DECLARE @xml XML = '<XMLInput>
<SegmentX>
<Category_KEYS ID= "AAAAAAA"> AAAA DESCRIPTION </Category_KEYS>
</SegmentX>
<SegmentY>
<Status_KEYS ID= "BBBBBBB"> BBB DESCRIPTION </Status_KEYS>
</SegmentY>
<SegmentZ>
<Department_KEYS ID= "CCCCCC"> CCCC DESCRIPTION </Department_KEYS>
</SegmentZ>
</XMLInput>';
SELECT
x.c.value('(*/@ID)[1]', 'VARCHAR(20)') id,
x.c.value('(*)[1]', 'VARCHAR(20)') [description]
FROM @xml.nodes('XMLInput/*') x(c);
My results:
It is possible to read XML files with Lookup activities but might be awkward to get the table-type result you are after, unless you use a For Each activity which has a limit of 5,000 loops and would be a bad idea where you have many elements to loop through. I would pass this off to some compute to handle, whether it be Azure SQL DB like in my example, a Databricks or Synapse notebook or Mapping Data Flows if you need a low-code experience.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |