Regex matched text between tags is too greedy
I am trying to extract text from a string and am having trouble with laziness/greediness.
In the example below, I want to match only <b>I only want this piece</b>, so my regex is a non-greedy "anything" between <b> and </b>, as long as it contains 'piece'.
The problem with my regex is that the matched text also includes <b>first</b>.
var text = "<b>first</b> <b>I only want this piece</b>";
var regX = /<b>.*?piece.*?<\/b>/;
var matches = text.match(regX);
Matched text
"<b>first</b> <b>I only want this piece</b>"
Desired match
"<b>I only want this piece</b>"
Solution 1:[1]
Use a negated character class instead of the first .*?:
var regX = /<b>[^<>]*?piece.*?<\/b>/;
Why?
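Because the first <b>.*?piece matches the first <b> and then keeps expanding until it finds the text piece, regardless of what lies in between, including other tags. Laziness only keeps the match as short as possible from a given starting position; it does not move the starting position forward. [^<>]*?, by contrast, lazily matches zero or more characters other than < or >, so it cannot cross a tag boundary. The attempt that starts at the first <b> therefore fails, and the engine moves on to the second <b>.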
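To make the difference concrete, here is a minimal sketch that runs both patterns against the string from the question. The variable names (lazyDot, negatedClass, and so on) are chosen for illustration and are not part of the original answer.

```javascript
var text = "<b>first</b> <b>I only want this piece</b>";

// Pattern from the question: .*? is lazy, but the match still starts at the
// first <b> and expands across "</b> <b>" until it reaches "piece".
var lazyDot = /<b>.*?piece.*?<\/b>/;

// Pattern from Solution 1: [^<>] cannot consume "<" or ">", so the attempt
// starting at the first <b> fails and the engine retries at the second <b>.
var negatedClass = /<b>[^<>]*?piece.*?<\/b>/;

var originalMatch = text.match(lazyDot)[0];
var fixedMatch = text.match(negatedClass)[0];

console.log(originalMatch); // <b>first</b> <b>I only want this piece</b>
console.log(fixedMatch);    // <b>I only want this piece</b>
```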
Solution 2:[2]
This refuses to cross any complete HTML tag before piece, rather than excluding every individual < or > character, and might be a little more robust, depending on how predictable your string is:
var regX = /<b>(?:(?!<[^>]*>).)*piece.*?<\/b>/
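One case where the two solutions may differ is text that contains a stray > that is not part of a tag. The sketch below uses a made-up input string (an assumption for illustration, not from the question) to show that the negated class rejects it, while the lookahead-based pattern only stops at something that looks like a full tag.

```javascript
// Hypothetical input: the wanted <b> element contains a bare ">" character.
var strayText = "<b>first</b> <b>2 > 1 in this piece</b>";

var negatedPattern = /<b>[^<>]*?piece.*?<\/b>/;             // Solution 1
var temperedPattern = /<b>(?:(?!<[^>]*>).)*piece.*?<\/b>/;   // Solution 2

// [^<>] cannot consume the bare ">", so no <b>...piece run is found.
var negatedResult = strayText.match(negatedPattern);

// The lookahead blocks only where "<...>" begins, so ">" alone is allowed.
var temperedResult = strayText.match(temperedPattern);

console.log(negatedResult);      // null
console.log(temperedResult[0]);  // <b>2 > 1 in this piece</b>
```

Which behavior is preferable depends on the input; if the text never contains bare angle brackets, the simpler negated class is probably enough.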
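If you also want to match newline characters, replace the dot (.) with [\s\S], which matches any character including line breaks. (Inside a character class the dot is a literal period, so [.\s\S] is equivalent to [\s\S].)
var regX = /<b>(?:(?!<[^>]*>)[\s\S])*piece[\s\S]*?<\/b>/;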
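As a rough check of the newline case, the sketch below uses a made-up multi-line input (an assumption, not from the question) and writes the any-character class as [\s\S], which behaves the same as [.\s\S] because a dot inside a character class is a literal period.

```javascript
// Hypothetical input: the wanted text is split across two lines.
var multiline = "<b>first</b>\n<b>I only want\nthis piece</b>";

// The dot does not match "\n", so this version cannot reach "piece".
var dotOnly = /<b>(?:(?!<[^>]*>).)*piece.*?<\/b>/;

// [\s\S] matches any character, including line breaks.
var anyChar = /<b>(?:(?!<[^>]*>)[\s\S])*piece[\s\S]*?<\/b>/;

var dotResult = multiline.match(dotOnly);
var anyResult = multiline.match(anyChar);

console.log(dotResult);     // null
console.log(anyResult[0]);  // the second <b>...</b> element, newline included
```

In engines that support the ES2018 s (dotAll) flag, keeping the dot and adding that flag should be another option.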
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Avinash Raj |
Solution 2 |