Escape control characters in XML 1.0

I understand why control characters are illegal in XML 1.0, but I still need to store them somehow in an XML payload, and I cannot find any recommendations for escaping them. I cannot upgrade to XML 1.1.

How should I escape, for example, the SOH character (\u0001, the standard separator for FIX messages)?

The following doesn't work:

<data>&#x01;</data>
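For anyone who wants to reproduce the failure, here is a minimal sketch using Python's standard-library parser (the helper name parses_as_xml is just for illustration). It checks that a character reference to U+0001 is rejected, while a reference to tab, one of the few control characters XML 1.0 allows, is accepted.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(document: str) -> bool:
    """Return True if the standard-library parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# A character reference to U+0001 is still rejected in XML 1.0 ...
soh_reference_ok = parses_as_xml("<data>&#x01;</data>")
# ... while tab (U+0009), which XML 1.0 permits, is accepted.
tab_reference_ok = parses_as_xml("<data>&#x09;</data>")
```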


Solution 1:[1]

One way is to use processing instructions: <?hex 01?>. But that only works in element content, not in attributes. And of course the processing instruction needs to be understood by the receiving application.
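A rough sketch of the processing-instruction idea in Python, assuming both sides agree on a `hex` target carrying a two-digit hex code. The function names encode_with_pi and decode_with_pi are made up for this example; xml.dom.minidom is used for reading because it keeps processing instructions as child nodes.

```python
import re
from xml.dom import minidom
from xml.sax.saxutils import escape

# C0 control characters that XML 1.0 forbids (tab, LF and CR are allowed).
_FORBIDDEN = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def encode_with_pi(text: str) -> str:
    """Escape &, < and > first, then replace forbidden characters with <?hex NN?>."""
    escaped = escape(text)
    return _FORBIDDEN.sub(lambda m: "<?hex %02X?>" % ord(m.group()), escaped)


def decode_with_pi(element) -> str:
    """Rebuild the original text from the text and <?hex NN?> children of a DOM element."""
    parts = []
    for node in element.childNodes:
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE and node.target == "hex":
            parts.append(chr(int(node.data, 16)))
        elif node.nodeType == node.TEXT_NODE:
            parts.append(node.data)
    return "".join(parts)


fix_message = "8=FIX.4.2\x019=12\x0135=0\x01"
xml_text = "<data>" + encode_with_pi(fix_message) + "</data>"
document = minidom.parseString(xml_text)
round_tripped = decode_with_pi(document.documentElement)
```

The receiving application has to run the decoding step itself; a generic XML consumer will simply ignore or drop the processing instructions.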

You could also use elements, such as <hex value="01"/>, but elements are visible in an XSD schema or DTD, while processing instructions are hidden.
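The element variant could look roughly like this with xml.etree.ElementTree. This is only a sketch: the function names are invented here, and the `hex` element with its `value` attribute has to be agreed on with the receiver (and declared in the schema, if there is one).

```python
import xml.etree.ElementTree as ET


def is_forbidden(ch: str) -> bool:
    """True for C0 control characters that XML 1.0 does not allow (all except tab, LF, CR)."""
    return ord(ch) < 0x20 and ch not in "\t\n\r"


def encode_with_elements(tag: str, text: str) -> ET.Element:
    """Build an element whose forbidden characters are replaced by <hex value="NN"/> children."""
    root = ET.Element(tag)
    root.text = ""
    last = None  # most recent <hex/> child; text that follows it goes into its tail
    for ch in text:
        if is_forbidden(ch):
            last = ET.SubElement(root, "hex", {"value": "%02X" % ord(ch)})
            last.tail = ""
        elif last is None:
            root.text += ch
        else:
            last.tail += ch
    return root


def decode_with_elements(root: ET.Element) -> str:
    """Reassemble the original text from the element's text, <hex/> children and tails."""
    parts = [root.text or ""]
    for child in root:
        if child.tag == "hex":
            parts.append(chr(int(child.get("value"), 16)))
        parts.append(child.tail or "")
    return "".join(parts)


fix_message = "8=FIX.4.2\x019=12\x0135=0\x01"
xml_text = ET.tostring(encode_with_elements("data", fix_message), encoding="unicode")
round_tripped = decode_with_elements(ET.fromstring(xml_text))
```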

Another approach: if a piece of payload can contain such characters, encode the whole payload in Base64.
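A minimal sketch of the Base64 route in Python. The `encoding="base64"` attribute is an assumption added here so the reader knows how to interpret the content; it is not part of any standard, and the function names are illustrative.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_payload(payload: bytes) -> str:
    """Serialize the payload as Base64 text inside a <data> element."""
    element = ET.Element("data", {"encoding": "base64"})  # attribute name is hypothetical
    element.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(element, encoding="unicode")


def unwrap_payload(xml_text: str) -> bytes:
    """Parse the element and decode its Base64 content back to the raw bytes."""
    element = ET.fromstring(xml_text)
    return base64.b64decode(element.text or "")


fix_message = b"8=FIX.4.2\x019=12\x0135=0\x01"
xml_text = wrap_payload(fix_message)
round_tripped = unwrap_payload(xml_text)
```

The trade-off is that the payload is no longer human-readable in the XML and grows by roughly a third.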

Solution 2:[2]

It's quite common in logging/printing of FIX messages to substitute SOH with another character like '|'. Could you do the same here?
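As a sketch of that substitution in Python (function names invented for the example): note that it is only reversible if the substitute character never occurs in the field values, so the encoder below refuses messages that already contain it.

```python
SOH = "\x01"


def soh_to_pipe(message: str, substitute: str = "|") -> str:
    """Replace SOH with a printable substitute; refuse input that already contains it."""
    if substitute in message:
        raise ValueError("message already contains the substitute character")
    return message.replace(SOH, substitute)


def pipe_to_soh(text: str, substitute: str = "|") -> str:
    """Restore SOH separators from the substitute character."""
    return text.replace(substitute, SOH)


fix_message = "8=FIX.4.2\x019=12\x0135=0\x01"
printable = soh_to_pipe(fix_message)
restored = pipe_to_soh(printable)
```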

Solution 3:[3]

My company ended up adding our own markup before the XML step: {1}. You also have to escape the { and } braces themselves as {123} and {125}. Then, when reading the XML, you have to parse the embedded codes yourself.
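One way such a brace scheme could be implemented, sketched in Python. This is a guess at the details, not the actual code described above: it assumes the number inside the braces is the decimal character code, which matches {1}, {123} and {125}. Encode before building the XML and decode after parsing it.

```python
import re

_CODE = re.compile(r"\{(\d+)\}")


def needs_escape(ch: str) -> bool:
    """Forbidden XML 1.0 control characters, plus the braces used by the markup itself."""
    return ch in "{}" or (ord(ch) < 0x20 and ch not in "\t\n\r")


def brace_encode(text: str) -> str:
    """Replace each character that needs escaping with {decimal code}."""
    return "".join("{%d}" % ord(ch) if needs_escape(ch) else ch for ch in text)


def brace_decode(text: str) -> str:
    """Turn every {decimal code} back into the character it stands for."""
    return _CODE.sub(lambda m: chr(int(m.group(1))), text)


original = "58=a{b}\x0110=042\x01"
encoded = brace_encode(original)
decoded = brace_decode(encoded)
```

Because every literal brace is itself encoded, any {n} sequence left in the encoded text is unambiguously an escape code.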

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Michael Kay
Solution 2 Andy Lynch
Solution 3 Jeff Brophy