Staging XML in Snowflake

I am trying to stage XML data from S3 into Snowflake. I have successfully created the stage, but when I query the data I get the error below. On checking, I found that some characters in the data are not valid UTF-8.

Error parsing XML: missing first byte in UTF-8 sequence File 'data.xml', line 13583, character 29 Row 300 starts at line 13574, column $1
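To confirm which bytes are the problem before changing any settings, a small script can report where the first invalid UTF-8 byte sits in the file. The following is a minimal Python sketch; `find_invalid_utf8` is a hypothetical helper, not a Snowflake tool, and it counts positions in bytes, so its column may not match the character offset in Snowflake's message exactly. In UTF-8, bytes 0x80 to 0xBF may only appear as continuation bytes, so a stray one such as the ISO-8859-1 copyright sign (0xA9) is a likely cause of a "missing first byte" error.

```python
from typing import Optional, Tuple


def find_invalid_utf8(data: bytes) -> Optional[Tuple[int, int, int]]:
    """Return (line, column, byte value) of the first invalid UTF-8 byte, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        pos = err.start  # byte offset where decoding failed
        line = data.count(b"\n", 0, pos) + 1  # 1-based line number
        line_start = data.rfind(b"\n", 0, pos) + 1  # 0 when on the first line
        return line, pos - line_start + 1, data[pos]  # 1-based column, in bytes
    return None


sample = b"<r>\n  <c>\xa9 2020</c>\n</r>"  # 0xA9 is the ISO-8859-1 copyright sign
print(find_invalid_utf8(sample))  # (2, 6, 169)
```

On the real file, `find_invalid_utf8(Path("data.xml").read_bytes())` (with `from pathlib import Path`) should report a line close to the one in the error if ISO-8859-1 bytes are the cause.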

I tried the IGNORE_UTF8_ERRORS = TRUE option in the stage, but the data does not come through correctly. We don't want to lose any data, and the encoding appears to be ISO-8859-1.
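Snowflake documents IGNORE_UTF8_ERRORS = TRUE as replacing invalid UTF-8 sequences with the replacement character U+FFFD, which would explain why the data looks wrong: the original characters are discarded. The sketch below uses Python's `errors="replace"` only as an analogy for that behaviour (it is not Snowflake's implementation) and contrasts it with decoding the same bytes as ISO-8859-1, which keeps every character. `compare_decodings` is a hypothetical name.

```python
from typing import Tuple


def compare_decodings(raw: bytes) -> Tuple[str, str]:
    """Decode the same bytes two ways to show what each approach keeps."""
    # Analogy for IGNORE_UTF8_ERRORS = TRUE: invalid sequences become U+FFFD.
    lossy = raw.decode("utf-8", errors="replace")
    # Reading the bytes as ISO-8859-1 maps every byte to a character.
    latin1 = raw.decode("iso-8859-1")
    return lossy, latin1


lossy, latin1 = compare_decodings(b"caf\xe9 \xa9 2020")
print(lossy.count("\ufffd"))  # 2
print(latin1)  # café © 2020
```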

Does anyone have suggestions on how to fix this?



Solution 1:[1]

If you know that your source data uses an encoding other than UTF-8, you can specify it in your COPY INTO statement, as explained in the Snowflake COPY INTO documentation.


The file format also has an ENCODING parameter for this; see the file format documentation.
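If the encoding option is not accepted for the XML file format type in your setup (Snowflake documents ENCODING for CSV files, for example), one alternative that is not part of the answer above is to transcode the file to UTF-8 before uploading it to S3. A minimal Python sketch, assuming the whole file is ISO-8859-1 and fits in memory; `latin1_to_utf8` is a hypothetical helper, and it also rewrites an `encoding="ISO-8859-1"` XML declaration so the declaration no longer contradicts the new bytes.

```python
import re

# Matches the encoding pseudo-attribute of a leading XML declaration,
# e.g. <?xml version="1.0" encoding="ISO-8859-1"?>
_DECL_ENCODING = re.compile(rb"^(<\?xml[^>]*?\bencoding\s*=\s*)([\"'])[^\"']*\2")


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 without dropping any characters."""
    # Every byte 0x00-0xFF maps to a code point in ISO-8859-1, so this cannot fail.
    text = data.decode("iso-8859-1")
    utf8 = text.encode("utf-8")
    # Update the declaration (if any) to say UTF-8, keeping its quote style.
    return _DECL_ENCODING.sub(rb"\1\2UTF-8\2", utf8, count=1)
```

Usage, with a hypothetical output name: `Path("data_utf8.xml").write_bytes(latin1_to_utf8(Path("data.xml").read_bytes()))`. Decoding as ISO-8859-1 never raises, so if the file is actually in another single-byte encoding such as Windows-1252, the conversion will succeed silently but some characters may come out wrong; checking a few known values afterwards is worthwhile.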

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Sergiu