Lark: Parsing: Implementing file import?

How would you implement a grammar that can import a file and still parse it using Lark?

For example:

@import file.txt
.....


Solution 1:[1]

I found a GitHub repository that seems relevant. Is this what you are looking for? https://github.com/lark-parser/lark

from lark import Lark
with open('file_to_read.txt', 'r') as file:
    data = file.read().replace('\n', '') #assumes you want to remove \n
l = Lark('''start: WORD "," WORD "!"
            %import common.WORD   // imports from terminal library
            %ignore " "           // Disregard spaces in text
         ''')

print( l.parse("Hello, World!") )
print( l.parse(data) )

If you want to open the file and use its contents as the Lark grammar:

from lark import Lark
with open('file_to_read.txt', 'r') as file:
    data = file.read()  # keep the newlines: Lark grammar rules are separated by line breaks
l = Lark(data)

print( l.parse("Hello, World!") )
print( l.parse("your string to parse") )
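Neither snippet handles the @import directive from the question itself. One option is to expand the imports in plain Python before the text reaches Lark. The following is only a minimal sketch under assumptions: the directive is a whole line of the form "@import <path>", paths are resolved relative to the importing file, and the names expand_imports and IMPORT_RE are hypothetical helpers, not part of Lark.

```python
import re
from pathlib import Path

# Assumed directive syntax (from the question): a full line "@import <path>".
IMPORT_RE = re.compile(r"^@import\s+(\S+)\s*$")


def expand_imports(path, _active=None):
    """Return the text of `path` with each @import line replaced by the
    recursively expanded contents of the file it names."""
    path = Path(path).resolve()
    active = _active if _active is not None else set()
    if path in active:
        # The file is already being expanded further up the chain.
        raise ValueError(f"circular import: {path}")
    active.add(path)
    out = []
    for line in path.read_text().splitlines():
        match = IMPORT_RE.match(line)
        if match:
            # Resolve the imported path relative to the importing file.
            out.append(expand_imports(path.parent / match.group(1), active))
        else:
            out.append(line)
    # Done with this file; it may legitimately be imported again elsewhere.
    active.discard(path)
    return "\n".join(out)
```

The expanded string can then be passed to l.parse(...) as usual. The trade-off is that line numbers in parse errors refer to the expanded text, not to the original files.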

Solution 2:[2]

The code at the link below handles includes/imports in Lark. I didn't write it; I'm just passing it on.

It still needs some tweaking for error handling, but it's a good place to start.

Below is my slightly modified version, which actually reads from the included files.

import sys

from lark import Lark

from lark.lexer import Lexer, LexerState, LexerThread

class RecursiveLexerThread(LexerThread):

    def __init__(self, lexer: Lexer, lexer_state):
        self.lexer = lexer
        self.state_stack = [lexer_state]

    def lex(self, parser_state):
        while self.state_stack:
            lexer_state = self.state_stack[-1]
            lex = self.lexer.lex(lexer_state, parser_state)
            try:
                token = next(lex)
            except StopIteration:
                self.state_stack.pop()  # We are done with this file
            else:
                if token.type == "_INCLUDE":
                    name = token.value[8:].strip()  # get just the filename
                    with open(name) as included:
                        self.state_stack.append(LexerState(included.read()))
                # Yield only when a token was actually read; otherwise the
                # previous token would be emitted again after a file ends.
                yield token  # The parser still expects this token either way

grammar = r"""
start: ((_INCLUDE|line)* _EOL)*

line: STRING+
STRING : /\S+/

_INCLUDE.1 : /include\s+\S+/i

_EOL : /(\n+)/

%ignore /[ \t]+/
"""

parser = Lark(grammar, _plugins={
    "LexerThread": RecursiveLexerThread
}, parser="lalr")

tree = parser.parse(open(sys.argv[1]).read())

print(tree.pretty())

https://gist.github.com/MegaIng/c6abba4d9be87473d8d586734f2b39c9

Solution 3:[3]

I just figured out that I can use the C/C++ preprocessor to generate a file, which I can then parse :)

It is not integrated with Lark, but it works:

cpp -P included.inc > output.file
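To run this step from the same Python script that does the parsing, the command can be invoked with subprocess. This is a rough sketch that assumes cpp is installed and on PATH; cpp_command and preprocess are hypothetical helper names. Note that cpp expands #include "file" directives, so the input would have to use that syntax rather than @import, and cpp may also interpret other lines starting with # and strip C-style comments.

```python
import subprocess


def cpp_command(path):
    """Build the command line from the answer: cpp -P <path>."""
    # -P stops cpp from emitting line markers into the output.
    return ["cpp", "-P", str(path)]


def preprocess(path):
    """Run the C preprocessor on `path` and return the expanded text.

    Assumes a `cpp` executable is available; raises CalledProcessError
    if cpp exits with an error.
    """
    result = subprocess.run(
        cpp_command(path), capture_output=True, text=True, check=True
    )
    return result.stdout
```

The returned string can be handed straight to the Lark parser, which avoids writing the intermediate output.file.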

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1
Solution 2
Solution 3    sten