'XML data at level 3/4 getting NULLs

I got my source data XML into snowflake stage tables. It's quite a complex XML document data for each record in the table. I got some of the elements using flatten but innermost level 3-4 nested XML returned as null(with no data). Can someone throw help?

Here is a sample of my XML data structure at each level

<root>
    <level1>
        <level2>value2x</level2>
        <level2>value2y</level2>
        <level2>value2z</level2>
        <level2>
            <level3>
                <level4>value4X</level4>
                <level4>value4Y</level4>
            </level3>
        </level2>
    </level1>
</root>

I used to flatten to query level1, level2, and level 3 as separate transformed 3 new columns.

However, when I try to flatten the new column which has levels 3 and 4, I get Nulls.

What am I missing here? Because of this one issue, I have to explore other options like creating an azure function to call as snowflake external functions to convert them, etc, which is making this transformation process much more complicated.



Solution 1:[1]

You are getting burned by XML parsers errors, that happen on all platforms. Because you Level3 is a single object in you Level2 object, the parse treats Level3 as an single object, so if you flatten on that (assuming it's an array) then because it treats it like an object, not an array, the thing you get with be the members of Level3 not a row per level3.

The work around is to cast the Level2 to an array, always, so the level3 values are looped over as a whole.

So with this CTE for data:

SELECT parse_xml('<root>
    <level1>
        <level2>value2x</level2>
        <level2>value2y</level2>
        <level2>value2z</level2>
        <level2>
            <level3>
                <level4>value4X</level4>
                <level4>value4Y</level4>
            </level3>
        </level2>
    </level1>
</root>') as xml
)

this code:

SELECT 
    iff(not is_object(l1.value:"$"), l1.value:"$"::text, null) as level2_val
    ,iff(is_object(l1.value:"$"), l3.value:"$"::text, null) as level4_val
FROM data as d
    ,LATERAL FLATTEN(to_array(XMLGET(d.xml, 'level1'):"$")) l1
    ,LATERAL FLATTEN(to_array(l1.value:"$")) l2
    ,LATERAL FLATTEN(to_array(l2.value:"$"), OUTER => true) l3
;

gives:

LEVEL2_VAL LEVEL4_VAL
value2x null
value2y null
value2z null
null value4X
null value4Y

Solution 2:[2]

I have tried the same thing using Snowpark Python and written a function that takes an XML document and returns a JSON doc, which worked in this situation, thereby solving the problem. Thanks all.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 KSKumar