Extract URL until whitespace or <br> tag using Regex with JavaScript
I have a string like:
Webcam recording https://www.example.com/?id=456&code=123
or like:
Webcam recording https://www.example.com/?id=456&code=123<br><b>test<b>
To extract the URL from the first example I used: var reg_exUrl = /\bhttps?:\/\/[^ ]+/g;
Now I tried to extend the regex so that it matches from the start of the URL up to the first whitespace (or end of line) or <br> tag.
This was my attempt:
var reg_exUrl = /\b(https?:\/\/[^ ]+)(\<br\>)/g;
Which looks good on https://regex101.com/r/gudNab/1 and shows up as two different matches.
But when I use the regex in JavaScript, the <br> tag is always included in the link.
Using var matches = line.match(reg_exUrl); gives me, in matches[0]:
https://www.example.com/?id=456&code=123<br>
instead of the desired https://www.example.com/?id=456&code=123
Solution 1:[1]
If you want to select the text before the <br>, you can use a positive lookahead:
https?:\/\/.*?(?=<br>)
Adding $ and \n as alternatives also ends the match at the end of input or at a line break:
https?:\/\/.*?(?=<br>|$|\n)
const regexp = /https?:\/\/.*?(?=<br>|$|\n)/;
const testString = "Webcam-Aufnahme https://www.example.com/file?id=959559110184937375.mp4&code=4yrn1ev<br>**test**";
// Without the g flag, match() returns the full match at index 0.
console.log(testString.match(regexp)[0]);
See on regex101
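Note that .*? in the lookahead pattern does not stop at whitespace, so any text following the URL on the same line would be included in the match. A minimal sketch of a variant that also ends at whitespace, as the question asks; the \S*? quantifier, the \s alternative and the helper name firstUrl are assumptions for illustration, not part of the answer above:

```javascript
// Hypothetical variant of the lookahead pattern: \S*? instead of .*?,
// so the match ends at <br>, at whitespace, or at the end of input.
const urlUntilBrOrSpace = /https?:\/\/\S*?(?=<br>|\s|$)/;

// Hypothetical helper: returns the first URL in the line, or null if none.
function firstUrl(line) {
  const m = line.match(urlUntilBrOrSpace);
  return m ? m[0] : null;
}

console.log(firstUrl("Webcam recording https://www.example.com/?id=456&code=123<br><b>test<b>"));
console.log(firstUrl("Webcam recording https://www.example.com/?id=456&code=123 more text"));
console.log(firstUrl("no link here"));
```

The first two calls print https://www.example.com/?id=456&code=123 and the third prints null.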
Solution 2:[2]
You get the full match because you are using matches[0], but your pattern has 2 capture groups, and the part without the <br> is in capture group 1.
You can get that group's value using match if you remove the global /g flag.
var line = "Webcam recording https://www.example.com/?id=456&code=123<br><b>test<b>\n";
var reg_exUrl = /\b(https?:\/\/[^ ]+)(\<br\>)/;
var matches = line.match(reg_exUrl);
console.log(matches[1]);
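If the /g flag is needed because a line can contain several URLs, String.prototype.matchAll returns the capture groups for every match. A minimal sketch using the same two-group pattern; the helper name urlsBeforeBr and the sample input are made up for illustration:

```javascript
// Same pattern as above, with the g flag (matchAll requires a global regex).
const reg_exUrlGlobal = /\b(https?:\/\/[^ ]+)(<br>)/g;

// Hypothetical helper: collects capture group 1 (the URL without <br>)
// from every match in the line.
function urlsBeforeBr(line) {
  return Array.from(line.matchAll(reg_exUrlGlobal), m => m[1]);
}

const sample = "A https://a.example/?x=1<br> and B https://b.example/?y=2<br>done";
console.log(urlsBeforeBr(sample));
// → [ 'https://a.example/?x=1', 'https://b.example/?y=2' ]
```

Like the original pattern, this only matches URLs that are followed by <br>, and [^ ]+ is greedy, so a URL followed by several <br> tags without a space in between is captured up to the last one.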
If you want both examples to match, you can use a pattern without a non-greedy quantifier. It uses a negated character class that matches any char except a whitespace char or <, and it only matches a < if that is not directly followed by br>
The pattern matches:
\bhttps?:\/\/ - A word boundary, then http:// or https://
[^\s<]* - Optionally match any char except a whitespace char or <
(?: - Non capture group
<(?!br>) - Match < if not directly followed by br>
[^\s<]* - Optionally match any char except a whitespace char or <
)* - Close the non capture group and optionally repeat it
const regex = /\bhttps?:\/\/[^\s<]*(?:<(?!br>)[^\s<]*)*/;
[
"Webcam-Aufnahme https://www.example.com/file?id=959559110184937375.mp4&code=4yrn1ev<br><b>test<b><br>",
"Webcam-Aufnahme https://www.example.com/file?id=959559110184937375.mp4&code=4yrn1ev"
].forEach(s => {
const m = s.match(regex);
if (m) {
console.log(m[0]);
}
});
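The same pattern can be used with the /g flag to collect every URL in a longer text. A minimal sketch; the helper name extractUrls and the sample inputs are assumptions for illustration:

```javascript
// The pattern from above with the g flag, so match() returns all full matches.
const urlPattern = /\bhttps?:\/\/[^\s<]*(?:<(?!br>)[^\s<]*)*/g;

// Hypothetical helper: match() returns null when nothing matches,
// so fall back to an empty array.
function extractUrls(text) {
  return text.match(urlPattern) || [];
}

console.log(extractUrls("Webcam recording https://www.example.com/?id=456&code=123<br><b>test<b>"));
// → [ 'https://www.example.com/?id=456&code=123' ]
console.log(extractUrls("a https://x.example/p?q=1 b https://y.example/<tag>z<br>"));
// → [ 'https://x.example/p?q=1', 'https://y.example/<tag>z' ]
console.log(extractUrls("no links"));
// → []
```

The second call shows that a < which does not start br> stays part of the match, and that only whitespace or a literal <br> ends it.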
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Avatar |
Solution 2 | Avatar |