Python regex to find multiline C comment spanning multiple lines
I'm trying to write a regex that works on multi-line C comments. I managed to make it work for /* comments here */, but it does not work if the comment continues onto the next line. How do I write a regex that spans multiple lines?
Using this as my input:
/* this comment
must be recognized */
The problem is that "must", "be" and "recognized" are matched as IDs, and */ is reported as illegal characters.
#!/usr/bin/python
import ply.lex as lex

tokens = ['ID', 'COMMENT']

t_ID = r'[a-zA-Z_][a-zA-Z0-9_]*'

def t_COMMENT(t):
    r'(?s)/\*(.*?).?(\*/)'
    #r'(?s)/\*(.*?).?(\*/)' does not work either.
    return t

# Error handling rule
def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)

lex.lex()  # Build the lexer
lex.input('/* this comment\r\n must be recognised */\r\n')
while True:
    tok = lex.token()
    if not tok:
        break
    if tok.type == 'COMMENT':
        print(tok.type)
I tried quite a few things: "Create array of regex match (multiline)", "How to handle multiple rules for one token with PLY", and a few other things available at http://www.dabeaz.com/ply/ply.html
Solution 1:[1]
I use this regex when I want to find multi-line comments in C.
If I want to include the '/* */' delimiters:
\/\*(\*(?!\/)|[^*])*\*\/
If I don't want to include them:
(?<=\*)[\n]*.*[\n]*.*[\n]*[\n]*?[\n]*(?=\*)
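A quick way to check the first pattern outside PLY is with plain `re`. This is only a sketch: the sample `source` string and the names `WITH_DELIMS` and `BODY_ONLY` are made up for illustration, and the capture-group variant is an assumed alternative for getting the comment body, not the lookaround regex from this answer.

```python
import re

# Pattern from the answer: matches a whole C comment, delimiters included.
# No DOTALL flag is needed because [^*] already matches newlines.
WITH_DELIMS = re.compile(r'/\*(\*(?!/)|[^*])*\*/')

# Assumed alternative for "without the delimiters": same idea, but the body
# is captured in group 1 instead of relying on lookbehind/lookahead.
BODY_ONLY = re.compile(r'/\*((?:\*(?!/)|[^*])*)\*/')

# Hypothetical sample input with one multi-line and one single-line comment.
source = 'int x; /* this comment\r\n must be recognized */ int y; /* second */'

# finditer + group(0) gives the full match; findall would return the last
# repetition of the capturing group instead.
full = [m.group(0) for m in WITH_DELIMS.finditer(source)]
bodies = [m.group(1) for m in BODY_ONLY.finditer(source)]

print(full)
print(bodies)
```

Capturing the body in a group tends to be easier to reason about than a lookaround chain, since the number of lines in the comment does not have to be spelled out in the pattern.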
Solution 2:[2]
By default, in the regexes used by the PLY lexer, the dot . does not match a newline \n.
So if you really want to match any character, use (.|\n) instead of .
(I had the same problem, and your comment on your own question helped me, so I just created an answer for newcomers.)
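PLY compiles its token rules with Python's `re` module, so the difference can be seen with `re` alone. A minimal sketch, where the `text` sample and variable names are made up for illustration:

```python
import re

# Hypothetical sample: a comment that continues onto a second line.
text = '/* this comment\n must be recognized */'

# By default "." stops at a newline, so the closing "*/" is never reached.
dot_default = re.search(r'/\*.*?\*/', text)

# "(.|\n)" matches any character, newline included, with no flags needed.
dot_or_newline = re.search(r'/\*(.|\n)*?\*/', text)

print(dot_default)              # no match object
print(dot_or_newline.group(0))  # the whole comment
```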
Solution 3:[3]
def t_COMMENT(t):
    r'(?s)/\*.*?\*/'
    return t
As described here:
(?s) is a modifier that makes . also match newlines.
.*? is the non-greedy version of .*. It matches the shortest possible sequence of characters (before the \*/ that comes next).
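To see why the non-greedy form matters, here is a small sketch using plain `re` rather than PLY. The `source` string and variable names are made up for illustration:

```python
import re

# Hypothetical input with two comments separated by code.
source = '/* first\n comment */ int x; /* second */'

# Non-greedy: each comment ends at the nearest "*/".
lazy = re.findall(r'(?s)/\*.*?\*/', source)

# Greedy: a single match running from the first "/*" to the last "*/",
# swallowing the code in between.
greedy = re.findall(r'(?s)/\*.*\*/', source)

print(lazy)
print(greedy)
```

If a particular Python/PLY combination rejects the inline (?s) inside a token rule, the (.|\n) form from Solution 2 avoids flags altogether.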
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Shoosha |
Solution 2 | Q-B |
Solution 3 |