Staging XML in Snowflake
I am trying to stage XML data from S3 into Snowflake. I created the stage successfully, but querying the data returns the error below. On checking, I found that some characters in the data are not valid UTF-8.
Error parsing XML: missing first byte in UTF-8 sequence File 'data.xml', line 13583, character 29 Row 300 starts at line 13574, column $1
I tried the IGNORE_UTF8_ERRORS = TRUE option on the stage, but the data does not come through correctly. We don't want to lose any data, and the encoding here appears to be ISO-8859-1.
Does anyone have a suggestion on how to fix this?
Solution 1:[1]
If you know that your source data uses an encoding other than UTF-8, you can specify it in your COPY INTO statement, as explained in our docs.
The file format also has a parameter for the encoding; see the CREATE FILE FORMAT documentation.
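If the encoding parameter turns out not to apply to the XML file type (worth checking in the docs for your version), a different approach is to transcode the file to UTF-8 before uploading it to S3. This is not the method described above, only a minimal sketch that assumes the file really is ISO-8859-1. The function name `transcode_to_utf8` and any paths passed to it are hypothetical.

```python
def transcode_to_utf8(src_path, dst_path, src_encoding="iso-8859-1",
                      chunk_size=1 << 20):
    """Rewrite the file at src_path as UTF-8 at dst_path.

    Assumption: the source file is encoded as src_encoding. ISO-8859-1
    maps every byte to a character, so decoding never fails, but the
    result is only correct if that really is the source encoding.
    """
    # newline="" disables newline translation so line endings are preserved.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        # Copy in chunks so large XML files need not fit in memory.
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
```

A call such as `transcode_to_utf8("data.xml", "data_utf8.xml")` would produce a UTF-8 copy to upload in place of the original. The sketch does not touch an XML declaration such as `encoding="ISO-8859-1"` inside the file, so that may need updating separately if the parser honors it.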
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Sergiu |