Friendly name for special $end token in Bison

With older versions of Bison, verbose error messages sometimes look like this:

syntax error, unexpected [, expecting $end

Is there a way to give $end a more user-friendly name? In newer versions this is "end of file". Can I get this in older versions as well?

For user-defined tokens I can specify a name with %token NUM "number", but how can I do this for $end?



Solution 1:[1]

If you want to specify an alias for the end-of-input token, you can use:

%token END 0 "friendly name"

That declares a token named END with code 0 and gives that code a friendly name, which the generated parser then uses in verbose error messages instead of $end.
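A minimal sketch of where the declaration sits in a grammar file, assuming an older Bison (2.x) in which verbose messages are enabled with %error-verbose. The token NUM, the one-rule grammar and the yyerror body are illustrative assumptions, not part of the original answer; yylex is assumed to be defined elsewhere. With the alias in place, verbose messages should say "end of file" wherever they previously said $end.

```yacc
%{
#include <stdio.h>
int yylex(void);                 /* assumed to be defined elsewhere */
void yyerror(const char *msg);
%}

%error-verbose                   /* older Bison; 3.0+ prefers %define parse.error verbose */

%token NUM "number"              /* ordinary token with an alias */
%token END 0 "end of file"       /* alias for token code 0, the end-of-input token */

%%

input:
    /* empty */
  | input NUM
  ;

%%

/* Print the message Bison builds; with the alias it mentions "end of file" instead of $end. */
void yyerror(const char *msg)
{
  fprintf(stderr, "%s\n", msg);
}
```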

You could use any otherwise unused symbol instead of END, such as ZERO or FRED. Whatever symbol you use will be #defined to 0, or declared as an enum label with value 0 (depending on the Bison version), but as long as you don't use the symbol, that fact is not very interesting.

You cannot use YYEOF or EOF, because both of those names are reserved (and already have other definitions). YYEOF is reserved by Bison itself, as are all symbols starting with yy or YY; EOF is reserved by C, whose standard library defines it as a macro in <stdio.h>. If you attempt to redefine a reserved name (whether manually or with a code generator), the result is Undefined Behaviour, whether or not you are presented with a diagnostic and whether or not it appears to work in some context.

The syntax of the %token declaration is a series of triples, each consisting of a symbol, an optional literal integer, and an optional quoted string. What you want to do is associate an alias (the quoted string) with a token code (the integer), but the syntax doesn't let you do that without also naming a symbol. Explicit token codes are very rarely useful, so that limitation is not particularly onerous; the end-of-input code 0 is probably the only useful explicit code.
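To make the triple structure concrete, here is a hypothetical declarations section (all token names are made up for illustration) showing each form. Only the symbol is mandatory; the code and the alias, when present, attach to the symbol immediately before them.

```yacc
%token ID                        /* symbol only: Bison picks the code, no alias */
%token NUM "number"              /* symbol + alias: Bison picks the code */
%token END 0 "end of file"       /* symbol + explicit code + alias */
%token PLUS "+" MINUS "-"        /* several triples in one declaration */
```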

Note that %token END YYEOF "foo" (taken from one of the comments) declares two tokens, both with codes selected by Bison. (And one of the symbol names is reserved, which leads to Undefined Behaviour, as noted above.) So that's certainly not what you want. If you're going to supply an alias for code 0, you must have a token declaration with the literal integer 0.

The Bison manual documents, in the section describing the interface to the lexer (yylex), that 0 is the token code for end of input: returning 0 is how the lexer signals end of input. That is a fixed aspect of the C interface, and it's safe to assume it will keep working for the foreseeable future. yylex can also return a negative number with the same effect, but the parser translates that into token code 0. The use of %token END 0 "..." to declare an alias for end of input appears in the example code in the manual, and in any number of real-life parsers (and SO answers).
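As a sketch of the lexer side, assume a hand-written yylex placed in the epilogue of a grammar file that declares %token NUM "number" and %token END 0 "end of file" (NUM and the scanning logic are hypothetical). End of input is reported simply by returning 0; no special symbol is needed.

```yacc
%%
/* Epilogue: C code copied verbatim into the generated parser. */
#include <ctype.h>
#include <stdio.h>

int yylex(void)
{
  int c;

  /* Skip blanks and newlines. */
  do
    c = getchar();
  while (c == ' ' || c == '\t' || c == '\n');

  if (c == EOF)
    return 0;        /* token code 0: end of input (same value as END) */

  if (isdigit(c)) {
    /* Consume the rest of the number; a real lexer would also set yylval. */
    while (isdigit(c = getchar()))
      continue;
    ungetc(c, stdin);
    return NUM;
  }

  return c;          /* any other character is passed through as a character token */
}
```

Returning END instead of 0 would be equivalent, since END was declared with code 0.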

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 rici