Understanding why ANTLR4 failed to predict the tokens I think it should
I have started learning ANTLR4 and language grammars in general, and I chose to play with a T-SQL grammar. I had no problems until I tried to get the expected tokens for incomplete input.
I would like to understand why ANTLR4 does not report the expected tokens I think it should. I realize I have made a mistake somewhere, but I can't work out where.
I have a lexer, FLexer.g4:
lexer grammar FLexer;
UPDATE: 'UPDATE';
STATISTICS: 'STATISTICS';
SET: 'SET';
ID: [A-Za-z]+;
EQUAL: '=';
COMMA: ',';
DOT: '.';
and a parser grammar, FParser.g4:
parser grammar FParser;
options { tokenVocab=FLexer; }
sql_clauses
    : update_statement
    | update_statistics
    ;
update_statement
    : UPDATE full_table_name SET ID EQUAL ID
    ;
update_statistics
    : UPDATE STATISTICS full_table_name
    ;
full_table_name
    : ID
    ;
My csproj:
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net6.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="ANTLR4.Runtime.Standard" Version="4.9.3" />
  </ItemGroup>
</Project>
My C# code:
using System.Text;
using Antlr4.Runtime;
using Antlr4.Runtime.Atn;
using Antlr4.Runtime.Misc;
using Antlr4.Runtime.Tree;
using IErrorNode = Antlr4.Runtime.Tree.IErrorNode;
using ITerminalNode = Antlr4.Runtime.Tree.ITerminalNode;
using IToken = Antlr4.Runtime.IToken;
using ParserRuleContext = Antlr4.Runtime.ParserRuleContext;
try
{
    var text = "update";
    var inputStream = CharStreams.fromString(text);
    var upperStream = new CaseChangingCharStream(inputStream, true);
    var lexer = new FLexer(upperStream);
    var commonTokenStream = new CommonTokenStream(lexer);
    var parser = new FParser(commonTokenStream);
    var errorListener = new TSqlErrorListener();
    parser.AddErrorListener(errorListener);
    //parser.BuildParseTree = true;
    var tree = parser.sql_clauses();
}
catch (Exception ex)
{
    Console.WriteLine("Error: " + ex);
}
public class TSqlErrorListener : BaseErrorListener
{
    public TSqlErrorListener()
    {
    }
    public override void SyntaxError(TextWriter output, IRecognizer recognizer, IToken offendingSymbol, int line, int charPositionInLine, string msg, RecognitionException e)
    {
        var parser = (Parser)recognizer;
        var expecting = parser.GetExpectedTokens();
        var qq = expecting.ToString(recognizer.Vocabulary);
        Console.WriteLine("expect --> " + qq);
    }
}
public class CaseChangingCharStream : ICharStream
{
    private ICharStream stream;
    private bool upper;
    public CaseChangingCharStream(ICharStream stream, bool upper)
    {
        this.stream = stream;
        this.upper = upper;
    }
    public int Index => stream.Index;
    public int Size => stream.Size;
    public string SourceName => stream.SourceName;
    public void Consume()
    {
        stream.Consume();
    }
    [return: NotNull]
    public string GetText(Interval interval)
    {
        return stream.GetText(interval);
    }
    // Returns the lookahead character with its case changed; the underlying text is untouched.
    public int LA(int i)
    {
        int c = stream.LA(i);
        if (c <= 0)
        {
            return c;
        }
        char o = (char)c;
        if (upper)
        {
            return (int)char.ToUpperInvariant(o);
        }
        return (int)char.ToLowerInvariant(o);
    }
    public int Mark()
    {
        return stream.Mark();
    }
    public void Release(int marker)
    {
        stream.Release(marker);
    }
    public void Seek(int index)
    {
        stream.Seek(index);
    }
}
I get output that confuses me:
line 1:6 no viable alternative at input 'update'
expect --> 'UPDATE'
The first line is ANTLR4's standard output; the second is the set of expected tokens that ANTLR4 returns.
The first line also confuses me: what does it mean?
But my main question is: why does ANTLR4 suggest the UPDATE token when I have already supplied it? I expected to get 'STATISTICS' and ID.
Also, how can I fix this behaviour?
Thank you.
Solution 1:[1]
Although I noted in the above comments that one can use Code Completion Core to get a list of tokens that the parser is looking for, I have found a better way.
It turns out that Antlr4 provides, in a NoViableAltException object, everything you need to determine what the parser is expecting at the point of failure. The exception is passed to SyntaxError() as the RecognitionException argument, so you only have to test its type and extract the data. The object keeps a list of "dead-end states", which encode the state in each "ATN" at which the parser failed.
In this example, after UPDATE the parser was expecting either an ID (the full_table_name in the update_statement rule) or STATISTICS (in the update_statistics rule).
The following code should work (although I have not tested it thoroughly).
public override void SyntaxError(TextWriter output, IRecognizer recognizer, IToken offendingSymbol, int line, int col, string msg, RecognitionException e)
{
    base.SyntaxError(output, recognizer, offendingSymbol, line, col, msg, e);
    var parser = recognizer as Parser;
    if (e is NoViableAltException noviable)
    {
        System.Console.Write("Expecting: ");
        // Each dead-end config points at an ATN state where prediction could not continue.
        foreach (var dead_end in noviable.DeadEndConfigs.GetStates())
        {
            var state_number = dead_end.stateNumber;
            var state = parser.Atn.states[state_number];
            foreach (Transition t in state.TransitionsArray)
            {
                switch (t.TransitionType)
                {
                    // These transitions carry no concrete token label, so skip them.
                    case TransitionType.RULE: break;
                    case TransitionType.PREDICATE: break;
                    case TransitionType.WILDCARD: break;
                    default:
                        if (!t.IsEpsilon)
                        {
                            // The label is the set of token types that would have been accepted here.
                            IntervalSet x = t.Label;
                            System.Console.Write(" " + x.ToString(recognizer.Vocabulary));
                        }
                        break;
                }
            }
        }
        System.Console.WriteLine();
    }
}
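To try this override against the question's incomplete input, it can be registered on the parser like any other error listener. The following is only a minimal sketch and has not been run here. It assumes the FLexer and FParser classes generated from the question's grammars, the question's CaseChangingCharStream class, and a TSqlErrorListener class whose SyntaxError method is the override shown above.

```csharp
using Antlr4.Runtime;

// Assumption: FLexer/FParser are generated from the grammars in the question,
// and TSqlErrorListener contains the SyntaxError override shown above.
var input = new CaseChangingCharStream(CharStreams.fromString("update"), true);
var lexer = new FLexer(input);
var parser = new FParser(new CommonTokenStream(lexer));

// Optional: remove the default console listener so that only the
// custom listener reports the error.
parser.RemoveErrorListeners();
parser.AddErrorListener(new TSqlErrorListener());

// Parsing the incomplete input raises the "no viable alternative" error,
// which reaches SyntaxError as a NoViableAltException.
parser.sql_clauses();
```

Removing the default listener is a design choice rather than a requirement: leaving it in place keeps the standard "line 1:6 no viable alternative" message alongside the "Expecting:" line.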
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | kaby76 |