Lark: Parsing: Implementing import file?
How would you implement a grammar that can import a file and still parse it using Lark?
For example:
@import file.txt
.....
Solution 1:[1]
I found a GitHub repository that seems relevant. Is this what you are looking for? https://github.com/lark-parser/lark
from lark import Lark

with open('file_to_read.txt', 'r') as file:
    data = file.read().replace('\n', '')  # assumes you want to remove \n

l = Lark('''start: WORD "," WORD "!"
%import common.WORD // imports from terminal library
%ignore " " // Disregard spaces in text
''')
print( l.parse("Hello, World!") )
print( l.parse(data) )
If you want to open the file and use its contents as the Lark grammar:
from lark import Lark

with open('file_to_read.txt', 'r') as file:
    # Keep the newlines: Lark grammar rules are separated by line breaks
    grammar = file.read()

l = Lark(grammar)
print( l.parse("Hello, World!") )
print( l.parse("your string to parse") )
Solution 2:[2]
The [code at this link][1] handles includes/imports in Lark. I didn't write it; I'm just passing it on.
It still needs some tweaking for error handling, but it's a good place to start.
Below is my slightly modified version, which actually reads from the included files.
import sys

from lark import Lark
from lark.lexer import Lexer, LexerState, LexerThread


class RecursiveLexerThread(LexerThread):
    def __init__(self, lexer: Lexer, lexer_state):
        self.lexer = lexer
        self.state_stack = [lexer_state]

    def lex(self, parser_state):
        while self.state_stack:
            lexer_state = self.state_stack[-1]
            lex = self.lexer.lex(lexer_state, parser_state)
            try:
                token = next(lex)
            except StopIteration:
                self.state_stack.pop()  # We are done with this file
            else:
                if token.type == "_INCLUDE":
                    name = token.value[8:].strip()  # get just the filename
                    self.state_stack.append(LexerState(open(name).read()))
                yield token  # The parser still expects this token either way


grammar = r"""
start: ((_INCLUDE|line)* _EOL)*
line: STRING+
STRING : /\S+/
_INCLUDE.1 : /include\s+\S+/i
_EOL : /(\n+)/
%ignore /[ \t]+/
"""

parser = Lark(grammar, _plugins={
    "LexerThread": RecursiveLexerThread
}, parser="lalr")

tree = parser.parse(open(sys.argv[1]).read())
print(tree.pretty())
[1]: https://gist.github.com/MegaIng/c6abba4d9be87473d8d586734f2b39c9
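To make the state-stack idea easier to see on its own, here is a minimal sketch that leaves Lark out entirely. It is not the gist's code: it works on whole lines instead of lexer states, it drops the include directive rather than yielding it as a token, and `lex_with_includes` and the `load` callback are hypothetical names used only for illustration.

```python
from typing import Callable, Iterator, List


def lex_with_includes(text: str, load: Callable[[str], str]) -> Iterator[str]:
    """Yield whitespace-separated words, descending into `include NAME` lines.

    `load` is a hypothetical callback that maps an include name to its text.
    """
    # Each stack entry is an iterator over the remaining lines of one source.
    stack: List[Iterator[str]] = [iter(text.splitlines())]
    while stack:
        try:
            line = next(stack[-1])
        except StopIteration:
            stack.pop()  # this source is exhausted; resume the one that included it
            continue
        words = line.split()
        if len(words) == 2 and words[0].lower() == "include":
            # Push the included source; it is consumed before the rest of the parent.
            stack.append(iter(load(words[1]).splitlines()))
        else:
            yield from words
```

Calling it with a dictionary lookup as `load`, for example `list(lex_with_includes(main_text, files.__getitem__))`, returns the words of the main text with each included file's words spliced in at the point of the include.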
Solution 3:[3]
I just figured out that I can use the C/C++ preprocessor to generate a file, which I can then parse :)
It is not integrated with Lark, but it works:
cpp -P included.inc > output.file
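The same preprocessing step could also be done in Python before the text reaches Lark. This is a minimal sketch, assuming the `@import file.txt` directive from the question sits on a line of its own and that imported paths are relative to the importing file. `expand_imports` and `IMPORT_RE` are hypothetical names, not part of Lark.

```python
import re
from pathlib import Path

# Assumed directive syntax, taken from the question's "@import file.txt" example.
IMPORT_RE = re.compile(r"^\s*@import\s+(\S+)\s*$")


def expand_imports(path, _active=()):
    """Return the text of `path` with each @import line replaced by the
    (recursively expanded) text of the imported file."""
    path = Path(path).resolve()
    if path in _active:
        # The file is already being expanded further up the import chain.
        raise ValueError(f"circular @import of {path}")
    out = []
    for line in path.read_text().splitlines():
        match = IMPORT_RE.match(line)
        if match:
            # Resolve the imported name relative to the importing file.
            out.append(expand_imports(path.parent / match.group(1), _active + (path,)))
        else:
            out.append(line)
    return "\n".join(out)
```

The expanded string can then be handed to `parser.parse(...)` as usual. One trade-off of this approach is that line numbers in parse errors refer to the expanded text, not to the original files.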
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | |
Solution 3 | sten |