Using ANTLR to parse a message: how to resolve an ambiguous literal
I am new to ANTLR and am trying to parse World Meteorological Organization (WMO) messages with it. A message looks like this: "AVB 12 CVD A12". This is my grammar:
grammar a;
rule : aaa bbb? ccc ddd;
aaa: char char char ;
bbb: Digit Digit ;
ccc: ('+'|'-')? char 'V' char;
ddd: 'A' Digit Digit;
char : 'A'|'V'| Char;
Char: [A-Z];
Digit: [0-9];
WS: [ \t\n\r=] ->skip;
It works, but the lexer emits a separate token for every single character of the input, and I don't know another way to do it. Can anyone suggest a better approach?
Solution 1:[1]
It would clean things up a bit to recognize most of these as tokens.
I don't know the semantics of what you're trying to parse, so I don't know what names would be appropriate. (I'll use L# and p# for lexer and parser rules, respectively.)
grammar a;
rule : L1 L2? L3 L4;
fragment CHAR: [A-Z];
fragment DIGIT: [0-9];
L3: ('+'|'-')? CHAR 'V' CHAR; // placed before L1 so that it takes precedence for three-letter tokens with 'V' in the middle
L1: CHAR CHAR CHAR ;
L2: DIGIT DIGIT ;
L4: 'A' DIGIT DIGIT;
WS: [ \t\n\r=] ->skip;
Once you give the tokens meaningful names, the grammar (and the generated *Context classes) will be easier to deal with than if you treat each character as a token.
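To check how the lexer rules above would split the sample message, here is a rough Python emulation of ANTLR's lexer tie-breaking (the longest match wins, and on a tie the rule defined first wins). This is only a sketch and not ANTLR itself: the `tokenize` function and the `RULES` table are hypothetical names, and the regular expressions are hand translations of the L1 to L4 and WS rules.

```python
import re

# Hand-translated lexer rules, in the same order as in the grammar above.
# Order matters: on equal-length matches the earlier rule wins.
RULES = [
    ("L3", re.compile(r"[+-]?[A-Z]V[A-Z]")),
    ("L1", re.compile(r"[A-Z]{3}")),
    ("L2", re.compile(r"[0-9]{2}")),
    ("L4", re.compile(r"A[0-9]{2}")),
    ("WS", re.compile(r"[ \t\n\r=]")),
]


def tokenize(text):
    """Return (rule_name, lexeme) pairs for text, skipping WS tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so an equal-length match keeps the earlier rule.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}: {text[pos]!r}")
        if best_name != "WS":
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("AVB 12 CVD A12"))
```

Under these assumptions the first group, AVB, also fits the L3 pattern (letter, 'V', letter), so it is reported as L3 rather than L1. If the real lexer behaves the same way, the parser rule would have to allow for that, for example with (L1 | L3) L2? L3 L4. That adjustment is a suggestion only and is not part of the original answer.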
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Mike Cargal |