Extract URL until whitespace or <br> tag using Regex with JavaScript

I have a string like:

Webcam recording https://www.example.com/?id=456&code=123

or like:

Webcam recording https://www.example.com/?id=456&code=123<br><b>test<b>

To extract the URL from the first example I used: var reg_exUrl = /\bhttps?:\/\/[^ ]+/g;

I then tried to extend the regex so that it matches up to the first whitespace (or end of line) or <br> tag.

This was my attempt:

var reg_exUrl = /\b(https?:\/\/[^ ]+)(\<br\>)/g;

This looks good on https://regex101.com/r/gudNab/1, where the URL and the <br> tag show up as two separate groups.

But when I use the regex in JavaScript, the <br> tag is always included in the link.

Using var matches = line.match(reg_exUrl); gives me the following in matches[0]:

https://www.example.com/?id=456&code=123<br>

instead of the desired https://www.example.com/?id=456&code=123



Solution 1:[1]

If you want to select the text before the <br>, you can use a positive lookahead: https?:\/\/.*?(?=<br>)

Adding $ and \n as alternatives also handles input that ends, or reaches a line break, before any <br>: https?:\/\/.*?(?=<br>|$|\n)

const regexp = /https?:\/\/.*?(?=<br>|$|\n)/;
const testString = "Webcam-Aufnahme https://www.example.com/file?id=959559110184937375.mp4&code=4yrn1ev<br>**test**";

console.log(testString.match(regexp)[0])
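As a rough check, the same lookahead pattern can be run against both strings from the question. This is only a sketch: the helper name extractUrl is hypothetical and not part of the original answer.

```javascript
// Lookahead pattern from the answer above.
const urlBeforeBreak = /https?:\/\/.*?(?=<br>|$|\n)/;

// Hypothetical helper: returns the matched URL, or null if there is none.
function extractUrl(line) {
  const m = line.match(urlBeforeBreak);
  return m ? m[0] : null;
}

const plain = extractUrl("Webcam recording https://www.example.com/?id=456&code=123");
const withBr = extractUrl("Webcam recording https://www.example.com/?id=456&code=123<br><b>test<b>");

console.log(plain);  // https://www.example.com/?id=456&code=123
console.log(withBr); // https://www.example.com/?id=456&code=123
```

Note that this pattern stops only at <br>, a newline, or the end of input. If the URL can be followed by a space and more text on the same line, adding \s to the lookahead alternatives, as in (?=<br>|\s|$), is one way to stop there as well.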


Solution 2:[2]

matches[0] holds the full match, which includes the <br>. Your pattern has 2 capture groups, and the part without the <br> is in capture group 1.

You can get that group value using match if you remove the global /g flag. With /g, match returns only the full matches and drops the capture groups.

var line = "Webcam recording https://www.example.com/?id=456&code=123<br><b>test<b>\n";
var reg_exUrl = /\b(https?:\/\/[^ ]+)(\<br\>)/;
var matches = line.match(reg_exUrl);
console.log(matches[1]);
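If the global flag is needed, for example because a line may contain several links, String.prototype.matchAll keeps the capture groups for every match. The following is a minimal sketch under that assumption; the sample line and the variable names are made up for illustration.

```javascript
// Hypothetical input with two URLs, each directly followed by <br>.
const line = "A https://www.example.com/?id=1<br>B https://www.example.com/?id=2<br>";

// Same two-group pattern as above, with the global flag kept.
const regExUrl = /\b(https?:\/\/[^ ]+)(<br>)/g;

// match() with /g returns only the full matches, <br> included.
const fullMatches = line.match(regExUrl);

// matchAll() yields one match object per hit, so group 1 stays available.
const urls = Array.from(line.matchAll(regExUrl), m => m[1]);

console.log(fullMatches); // both URLs, each with a trailing <br>
console.log(urls);        // both URLs without the <br>
```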

If you want both examples to match, you can avoid a non-greedy quantifier by using a negated character class that matches any character except whitespace or <, and then allowing a < only if it is not directly followed by br>.

The pattern matches:

  • \bhttps?:\/\/ A word boundary, then http or https followed by ://
  • [^\s<]* Optionally match any char except a whitespace char or <
  • (?: Non capture group
    • <(?!br>) Match < if not directly followed by br>
    • [^\s<]* Optionally match any char except a whitespace char or <
  • )* Close non capture group and optionally repeat

const regex = /\bhttps?:\/\/[^\s<]*(?:<(?!br>)[^\s<]*)*/;
[
  "Webcam-Aufnahme https://www.example.com/file?id=959559110184937375.mp4&code=4yrn1ev<br><b>test<b><br>",
  "Webcam-Aufnahme https://www.example.com/file?id=959559110184937375.mp4&code=4yrn1ev"
].forEach(s => {
  const m = s.match(regex);
  if (m) {
    console.log(m[0]);
  }
});
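To see what the <(?!br>) branch does, the sketch below runs the same pattern on a few extra inputs. These sample strings are assumptions made up for illustration and do not come from the question.

```javascript
const urlPattern = /\bhttps?:\/\/[^\s<]*(?:<(?!br>)[^\s<]*)*/;

// Hypothetical inputs covering the three ways the match can end.
const samples = [
  "https://www.example.com/a<b>c<br>rest",      // a < that does not start <br> is consumed
  "https://www.example.com/?q=1 trailing text", // whitespace ends the match
  "https://www.example.com/?q=1<br>"            // <br> ends the match
];

const results = samples.map(s => {
  const m = s.match(urlPattern);
  return m ? m[0] : null;
});

console.log(results);
// [ 'https://www.example.com/a<b>c',
//   'https://www.example.com/?q=1',
//   'https://www.example.com/?q=1' ]
```

The first result shows the trade-off: any other tag glued to the URL, such as <b>, is treated as part of the link, because only <br> is excluded by the lookahead.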

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Avatar
Solution 2 Avatar