Can EDI files have ~ in data?

I'm parsing an EDI file and splitting it on ~. I am wondering if it's possible for EDI to have ~ in the data itself. Is there a rule that says ~ can't appear in the data? This is for 810, 850, etc.



Solution 1:[1]

The 106th character of the ISA segment (or, to be a bit less brittle to whitespace issues, the first character after the ISA16 element) is the segment delimiter (in official terms, the segment terminator). Most of the time people specify the ~ character, but other choices are certainly valid.

In this example, the 106th character is ~:

ISA*00*          *00*          *ZZ*AMAZONDS       *01*TESTID         *070808*1310*U*00401*000000043*1*T*+~

Instead of counting 106 characters (which, again, can be brittle to whitespace issues), you can count 16 elements – that is, 16 asterisks – to find the value for ISA16 (which is +), and then pick the next character (which is ~).
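The element-counting approach can be sketched in a few lines of Python. This is a minimal illustration, not the parser discussed in this answer: `detect_delimiters` and `Delimiters` are hypothetical names, and the sketch assumes the interchange is already decoded into a `str` with the ISA segment first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delimiters:
    element: str    # data element separator: the character right after "ISA"
    component: str  # component element separator: the value of ISA16
    segment: str    # segment terminator: the first character after ISA16


def detect_delimiters(edi: str) -> Delimiters:
    """Find the delimiters by counting 16 element separators in the ISA
    header instead of relying on a fixed 106-character offset."""
    start = edi.find("ISA")
    if start == -1:
        raise ValueError("no ISA segment found")
    element = edi[start + 3]
    pos = start + 3
    # After skipping 16 separators, pos points at the single-character ISA16.
    for _ in range(16):
        pos = edi.index(element, pos) + 1
    component = edi[pos]
    segment = edi[pos + 1]
    return Delimiters(element, component, segment)
```

Because it counts separators rather than characters, the same call returns `*`, `+` and `~` for the example header whether or not its fixed-width padding survived copy and paste.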

There are two relevant sections in the official X12 specification:

12.5.4.3 Delimiter Specifications

The delimiters consist of three separators and a terminator. The delimiters are devised for inclusion within the data stream of the transfer. The delimiters are:

  • segment terminator [note: this is the one we're discussing]

  • data element separator

  • component element separator

  • repetition separator

The delimiters are assigned by the interchange sender. These characters are disjoint from those of the data elements; if a character is selected for the data element separator, the component element separator, the repetition separator or the segment terminator from those available for the data elements, that character is no longer available during this interchange for use in a data element. The instance of the terminator (<tr>) must be different from the instance of the data element separator (<gs>), the component element separator (<us>) and the repetition separator (<rs>). The data element separator, component element separator and repetition separator must not have the same character assignment.

So, according to this part of the spec, if the ~ is used as the segment terminator, then the use of the ~ is disallowed in a data element (that is, the textual body).
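On the sending side, this rule amounts to checking each data element against the interchange's delimiters before writing it out. A minimal sketch; `find_delimiter_conflicts` is a hypothetical helper, and the delimiter string you pass in is assumed to come from your own ISA header.

```python
def find_delimiter_conflicts(elements: list[str], delimiters: str) -> list[tuple[int, str]]:
    """Return (element index, offending character) pairs for every data
    element that contains one of the interchange's delimiter characters."""
    conflicts = []
    for index, value in enumerate(elements):
        for ch in delimiters:
            if ch in value:
                conflicts.append((index, ch))
    return conflicts


# Delimiters assumed here: element "*", component ">", repetition "^", terminator "~".
print(find_delimiter_conflicts(["N1", "ST", "Smith ~ Sons"], "*>^~"))
```

A sender that gets a non-empty result has to either clean the value or pick different delimiters for the interchange.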

Now, let's look at section 12.5.A.5 – Recommendations for the Delimiters:

Delimiter characters must be chosen with care, after consideration of data content, limitations of the transmission protocol(s) used, and applicable industry conventions. In the absence of other guidelines, the following recommendations are offered:

<tr> terminator: ~ | Note: the "~" was chosen for its infrequency of use in textual data.

This section is saying that ~ was chosen as the recommended default because it is seldom found in textual data (it would have been a bad idea, for example, to use . as the default, since periods are so common in text).

That said, even though the segment terminator is not allowed inside a data element, an EDI transmission can still inadvertently include ~ in the textual data; in other words, your trading partner may include it by accident. Further, the BIN and BSD (binary data) segments can legitimately include ~ (though these may not apply to the transaction sets you're working with).
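One way to catch a partner's accidental ~ is to split naively and flag any fragment whose first token doesn't look like a segment ID. This is a heuristic sketch, not something the spec prescribes: `suspicious_segments` is a hypothetical helper, and the "2 to 3 uppercase letters or digits, starting with a letter" pattern for segment IDs is an assumption. A stray terminator followed by something like `CO*` would slip through.

```python
import re

# Assumed shape of an X12 segment ID (e.g. N1, BEG, IT1); a heuristic only.
SEGMENT_ID = re.compile(r"[A-Z][A-Z0-9]{1,2}")


def suspicious_segments(edi: str, element_sep: str = "*", terminator: str = "~") -> list[str]:
    """Split naively on the terminator and return fragments whose leading
    token does not look like a segment ID. Such fragments usually mean a
    terminator character appeared inside a data element."""
    suspects = []
    for raw in edi.split(terminator):
        fragment = raw.strip()  # tolerate line breaks between segments
        if not fragment:
            continue
        seg_id = fragment.split(element_sep, 1)[0]
        if not SEGMENT_ID.fullmatch(seg_id):
            suspects.append(fragment)
    return suspects


print(suspicious_segments("N1*ST*Smith ~ Sons*92*123~N3*1 Main St~"))
```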

In our parsing API, we apply a specific set of patterns based on the type of segment we encounter. For us, it's not sufficient to split naively on the segment delimiter alone, because we may encounter binary segments (BIN, BSD), where the segment delimiter character can legitimately appear in the binary payload.
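For a BIN segment, a length-prefixed read avoids the problem entirely. The sketch below assumes the usual BIN layout, where BIN01 is the length of the binary data carried in BIN02. It is not the pattern this answer's parser uses (that isn't shown), and `read_bin_segment` is a hypothetical name. It works on `str` for readability; a real implementation should operate on bytes, since BIN02 is raw binary.

```python
def read_bin_segment(edi: str, pos: int, element_sep: str, terminator: str) -> tuple[list[str], int]:
    """Parse a BIN segment starting at `pos`. BIN02 is taken verbatim using
    the length in BIN01, so a terminator character inside it is harmless.
    Returns the segment's parts and the position just past its terminator."""
    prefix = "BIN" + element_sep
    if not edi.startswith(prefix, pos):
        raise ValueError("not a BIN segment")
    length_start = pos + len(prefix)
    length_end = edi.index(element_sep, length_start)
    length = int(edi[length_start:length_end])
    data_start = length_end + 1
    end = data_start + length
    data = edi[data_start:end]
    # The declared length must land exactly on a segment terminator.
    if edi[end:end + 1] != terminator:
        raise ValueError("BIN01 length does not end at a segment terminator")
    return ["BIN", str(length), data], end + 1


print(read_bin_segment("BIN*5*ab~cd~N1*ST~", 0, "*", "~"))
```

Here the `~` inside `ab~cd` is consumed as data, and parsing resumes at `N1*ST~`.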

For a regular segment (i.e. not BIN or BSD), the logic is something like this:

  • Consume the segment code (i.e. the characters before the first element delimiter).
  • Consume each element of the segment based on the element delimiter.
  • Stop if the next character is a segment delimiter or a new line.

As an example, for the segment BEG*PO-00001**20210901~, the process looks like this:

  • Consume BEG. Since this is not a special segment (BIN or BSD), consume elements by splitting on *.
  • Consume PO-00001.
  • Consume '' (an empty element, produced by the consecutive **).
  • Consume 20210901.
  • Stop since next char is ~.
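The steps for a regular segment can be sketched as a character-by-character reader. This is an illustration of the listed logic, not the actual parser: `read_segment` is a hypothetical name, and the segment ID is simply returned as the first item of the list.

```python
def read_segment(edi: str, pos: int, element_sep: str, terminator: str) -> tuple[list[str], int]:
    """Consume one regular (non-binary) segment starting at `pos`.
    Returns [segment ID, element 1, element 2, ...] and the position of
    the next segment."""
    elements: list[str] = []
    current: list[str] = []
    i = pos
    while i < len(edi):
        ch = edi[i]
        # Stop at the segment terminator or at a line break.
        if ch == terminator or ch in "\r\n":
            break
        if ch == element_sep:
            elements.append("".join(current))  # an empty string for "**"
            current = []
        else:
            current.append(ch)
        i += 1
    elements.append("".join(current))
    # Skip the terminator and any line breaks that follow it.
    while i < len(edi) and (edi[i] == terminator or edi[i] in "\r\n"):
        i += 1
    return elements, i


print(read_segment("BEG*PO-00001**20210901~", 0, "*", "~"))
```

For the BEG segment this yields `["BEG", "PO-00001", "", "20210901"]` and stops just past the `~`.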

(The pattern for binary segments is different from the pattern we use for regular segments.)

  • Here's an example of how our parser "fails" on a ~ in the textual data when the segment terminator (the character after ISA16) is also ~; the JSON representation is particularly helpful for seeing the issue.
  • Here's an example of our parser succeeding on a ~ in the textual data when the segment terminator is ^.
  • Lastly, here's an example of our parser succeeding where ~ is declared as the segment terminator in the ISA header but is omitted everywhere else in favor of newlines, which we see occasionally.
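The same three situations can be reproduced with a small splitter that treats both the declared terminator and line breaks as segment boundaries. This is a hedged sketch, not the parser linked above: `split_segments` is a hypothetical helper, and treating every line break as a boundary is an assumption that suits files where newlines never appear inside data.

```python
import re


def split_segments(body: str, terminator: str) -> list[str]:
    """Split an interchange body into segments, treating the declared
    terminator and CR/LF as boundaries. Consecutive boundaries such as
    "~\r\n" collapse into one."""
    pattern = "[" + re.escape(terminator) + r"\r\n]+"
    return [seg for seg in re.split(pattern, body) if seg]


# 1. Terminator is ~ and ~ appears in data: the N1 segment is wrongly cut in two.
print(split_segments("N1*ST*Smith ~ Sons~N3*1 Main St~", "~"))
# 2. Terminator is ^: the ~ in the data is harmless.
print(split_segments("N1*ST*Smith ~ Sons^N3*1 Main St^", "^"))
# 3. ~ declared but segments separated by newlines only.
print(split_segments("N1*ST*Smith\nN3*1 Main St\n", "~"))
```

The first call produces three fragments instead of two, which is exactly the failure that no splitter can undo once the sender has put the terminator inside a data element.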

Hope this helps.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1