Apache Camel Split by start and end characters SOH and ETX

I have a Spring Boot application that loads routes.xml on startup.

In routes.xml, I have an MQ queue source that contains a sample message like this:

SOH{123}{345}{4
5
6
}ETXSOH{111}{222}{3
3
3
}ETX

where SOH = \u0001 and ETX = \u0003

When I receive this message, I want to split it into two messages:

{123}{345}{4
5
6
}

and

{111}{222}{3
3
3
}

Currently I am trying to split using

<split>
  <tokenize token="(?s)(?&lt;=\u0001)(.*?)(?=\u0003)" regex="true"/>
  <to uri="jms:queue:TEST.OUT.Q" />
</split>

I have tested this regex with an online regex tester and it matched: https://regex101.com/r/fU5VVj/1

But when running the code, what I get is

#1

SOH

#2

ETXSOH

#3

ETX

I also tried token and endToken, but that does not work for my case:

<tokenize token="\u0001" endToken="\u0003" />

Is my case possible using a Camel route XML? If yes, can you point me to the correct regex or start and end tokens?

Thanks



Solution 1:[1]

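It seems Camel's regex tokenizer does not behave like a plain Java regex match: the output in the question suggests the pattern is used as a delimiter, so only the text between the matches (SOH, ETXSOH, ETX) comes back. I worked around it by creating a new processor that does the split itself, using the sample code below.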

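    import java.util.LinkedList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // "items" is the raw message body containing the SOH/ETX-framed records.
    Pattern p = Pattern.compile("(?s)(?<=\\u0001).*?(?=\\u0003)");
    Matcher m = p.matcher(items);
    List<String> tokens = new LinkedList<>();

    // Each match is the text between one SOH and the next ETX.
    while (m.find()) {
        String token = m.group();
        System.out.println("item = " + token);
        tokens.add(token);
    }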
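
To make the difference concrete, here is a minimal, Camel-free sketch that runs the same regex two ways on the sample message from the question. The class name `SohEtxSplitter` and both method names are made up for illustration. The `splitAsDelimiter` method rests on an assumption: that Camel's tokenizer uses the regex as a separator. The observed output suggests this, but nothing in this post confirms it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Camel.
public class SohEtxSplitter {

    // Same expression as in the question: (?s) lets '.' match newlines,
    // the lookbehind requires SOH (\u0001), the lookahead requires ETX (\u0003).
    private static final Pattern FRAME =
            Pattern.compile("(?s)(?<=\\u0001).*?(?=\\u0003)");

    /** Returns the text found between each SOH/ETX pair (match semantics). */
    public static List<String> extractFrames(String message) {
        List<String> frames = new ArrayList<>();
        Matcher m = FRAME.matcher(message);
        while (m.find()) {
            frames.add(m.group());
        }
        return frames;
    }

    /**
     * Uses the same regex as a separator (delimiter semantics).
     * Assumption: this mimics what the Camel tokenizer does with regex="true".
     */
    public static List<String> splitAsDelimiter(String message) {
        return new ArrayList<>(Arrays.asList(FRAME.split(message)));
    }

    public static void main(String[] args) {
        String message = "\u0001{123}{345}{4\n5\n6\n}\u0003"
                       + "\u0001{111}{222}{3\n3\n3\n}\u0003";

        // Match semantics: the two payloads the question asks for.
        for (String frame : extractFrames(message)) {
            System.out.println("frame = " + frame);
        }

        // Delimiter semantics: only the control characters around the payloads remain.
        System.out.println("fragments = " + splitAsDelimiter(message).size());
    }
}
```

On the sample message, `extractFrames` yields the two payloads, while `splitAsDelimiter` yields three fragments made up only of the control characters (SOH, ETXSOH, ETX), the same shape as the output reported in the question. A method like `extractFrames` could be called from a custom processor or bean, with its returned list handed to the splitter.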

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 dranoeL