Using ANTLR to parse a message: how to resolve an ambiguous literal

I am new to ANTLR and am trying to parse World Meteorological Organization (WMO) messages with it. A message looks like this: "AVB 12 CVD A12". This is my grammar:

grammar a;

rule : aaa bbb? ccc ddd;
aaa: char char char ;
bbb: Digit Digit ;
ccc: ('+'|'-')? char 'V' char;
ddd: 'A' Digit Digit;
char : 'A'|'V'| Char;

Char: [A-Z];
Digit: [0-9];
WS: [ \t\n\r=] ->skip;

It works, but the lexer produces one token per input character, and I don't know another way to do it. Can anyone suggest a better approach?



Solution 1:[1]

It would clean things up a bit to recognize most of these as tokens.

I don't know the semantics of what you're trying to parse, so I don't know what names would be appropriate. (I'll use L# and p# for lexer and parser rules, respectively.)

grammar a;

rule : L1 L2? L3 L4;

fragment CHAR: [A-Z];
fragment DIGIT: [0-9];

L3: ('+'|'-')? CHAR 'V' CHAR; // placed before L1 so that it takes precedence for "*V*" strings
L1: CHAR CHAR CHAR ;
L2: DIGIT DIGIT ;
L4: 'A' DIGIT DIGIT;

WS: [ \t\n\r=] ->skip;

Once you give the tokens meaningful names, the grammar (and the generated *Context classes) will be easier to deal with than if you treat each character as a token.
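
One thing worth checking with the grammar above: ANTLR's lexer takes the longest match and, on a tie, the rule listed first. The Python sketch below is not ANTLR. It only imitates those two tie-breaking rules with regular expressions to show how the sample message would be split. The `TOKEN_RULES` table and the `tokenize` function are names made up for this illustration.

```python
import re

# Lexer rules in the same order as in the grammar above. Order matters:
# when two rules match the same number of characters, the earlier one wins.
# (Hypothetical regex translations of the ANTLR rules, for illustration only.)
TOKEN_RULES = [
    ("L3", re.compile(r"[+-]?[A-Z]V[A-Z]")),
    ("L1", re.compile(r"[A-Z]{3}")),
    ("L2", re.compile(r"[0-9]{2}")),
    ("L4", re.compile(r"A[0-9]{2}")),
    ("WS", re.compile(r"[ \t\n\r=]")),
]


def tokenize(text):
    """Split text into (rule, lexeme) pairs: longest match, then first rule."""
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in TOKEN_RULES:
            m = pattern.match(text, pos)
            # Strict '>' keeps the earlier rule when lengths are equal.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}: {text[pos]!r}")
        if best_name != "WS":  # WS is skipped, as in the grammar
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("AVB 12 CVD A12"))
```

Under these assumptions the sample prints `[('L3', 'AVB'), ('L2', '12'), ('L3', 'CVD'), ('L4', 'A12')]`. Because "AVB" happens to have a V in the middle, it comes out as L3 rather than L1. If the first group can legitimately look like that, the parser rule would probably need to accept either token there, for example `(L1 | L3)` in the first position, but whether that is right depends on the message format.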

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Mike Cargal