Regex matched text between tags is too greedy

I am trying to extract text from a string, and have trouble with laziness/greediness.

In the example below, the text I want to match is the <b> element that contains "I only want this piece", so my regex does a non-greedy match of anything between <b> and </b>, as long as it contains 'piece'.

The problem with my regex is that the matched text also includes <b>first</b>.

var text = "<b>first</b> <b>I only want this piece</b>";
var regX = /<b>.*?piece.*?<\/b>/;
var matches = text.match(regX);

Matched text

"<b>first</b> <b>I only want this piece</b>"

Desired match

"<b>I only want this piece</b>"

javascript regex

Solution 1:^[1]

Use a negated char class instead of the first .*?.

var regX = /<b>[^<>]*?piece.*?<\/b>/;

Why?

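Because the first .*? starts matching at the first <b> and keeps going until it finds the text piece; it does not care what lies in between, including the </b> and the second <b>. Lazy matching only makes the match end as early as possible, it does not move the start. If you use [^<>]*? instead, it lazily matches zero or more characters that are neither < nor >, so it cannot cross a tag boundary.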
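
As a quick check of the explanation above, here is a minimal sketch that runs both patterns against the string from the question. The variable names are only illustrative.

```javascript
// Sample input from the question.
const text = "<b>first</b> <b>I only want this piece</b>";

// Original pattern: the lazy .*? can still cross </b> and <b> to reach "piece".
const lazyDot = /<b>.*?piece.*?<\/b>/;

// Pattern from this answer: [^<>]*? cannot step over a tag boundary.
const negatedClass = /<b>[^<>]*?piece.*?<\/b>/;

const lazyMatch = text.match(lazyDot)[0];
const negatedMatch = text.match(negatedClass)[0];

console.log(lazyMatch);    // the whole input string
console.log(negatedMatch); // only the second <b> element
```

The first pattern still matches from the very first <b>, while the second one has to give up there and retry at the second <b>.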

Solution 2:^[2]

This would work for excluding any html tags, and might be a little more robust, depending on how predictable your string is:

var regX = /<b>(?:(?!<[^>]*>).)*piece.*?<\/b>/
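
To illustrate how this pattern behaves, here is a small sketch on a made-up input (not from the question) where the wanted element contains another tag after "piece". The part before "piece" is not allowed to contain any tag, but the trailing .*? still is.

```javascript
// Hypothetical input: the wanted <b> element contains an <i> tag after "piece".
const nestedText = "<b>first</b> <b>this piece <i>here</i></b>";

// (?:(?!<[^>]*>).)* consumes one character at a time, but only while
// no complete tag starts at the current position.
const temperedDot = /<b>(?:(?!<[^>]*>).)*piece.*?<\/b>/;

const nestedMatch = nestedText.match(temperedDot)[0];
console.log(nestedMatch);
```

On this input the match should start at the second <b> and run to its closing </b>, including the inner <i> element.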

If you want to match newline characters as well, replace the dot (.) with [\s\S]. Inside a character class the dot is just a literal period, so [.\s\S] also works, but the dot there is redundant:

var regX = /<b>(?:(?!<[^>]*>)[\s\S])*piece[\s\S]*?<\/b>/;
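
The difference only shows up when the element spans several lines. The sketch below uses a made-up multi-line input to compare the dot version with a [\s\S] version. Inside a character class the dot is a literal period, so [\s\S] on its own is enough here.

```javascript
// Hypothetical input: the wanted <b> element is spread over three lines.
const multiLine = "<b>first</b>\n<b>I only want\nthis piece\nhere</b>";

// Dot version: "." does not match "\n", so it cannot reach "piece".
const dotVersion = /<b>(?:(?!<[^>]*>).)*piece.*?<\/b>/;

// [\s\S] version: matches any character, including newlines.
const anyCharVersion = /<b>(?:(?!<[^>]*>)[\s\S])*piece[\s\S]*?<\/b>/;

const dotResult = multiLine.match(dotVersion);         // null, no match
const anyCharResult = multiLine.match(anyCharVersion); // match array

console.log(dotResult);
console.log(anyCharResult[0]);
```

In engines that support the s (dotAll) flag, adding it to the dot version would be another way to let the dot match newlines.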

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Avinash Raj
Solution 2
