Golang - extract links using regex
I need to extract all links that belong to a specific domain, example.de, from a text using regex in Go.
Below are all possible links that should be extracted:
https://example.de
https://example.de/
https://example.de/home
https://example.de/home/
https://example.de/home some text that should not be extracted
https://abc.example.de
https://abc.example.de/
https://abc.example.de/home
https://abc.example.de/home
https://abc.example.de/home some text that should not be extracted
What I already tried
I used this website to check whether my regexes are correct: https://regex101.com/r/ohxUcG/2. Here are the combinations that failed:
https?://*.+example.de*.+
failed on the expression https://abc.example.de/a1b2c3 dsadsa: it matches the whole text up to the \n instead of only https://abc.example.de/a1b2c3, without dsadsa.
https?://*.+example.de*.+\s(\w+)$
this only gets links that are terminated with a space, but links can also be terminated with \n, \t, etc.
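The over-matching described for the first attempt can be reproduced in Go with a minimal sketch like the one below. The names firstAttempt and matchFirstAttempt are made up for illustration; only the pattern and the sample input come from the question. In Go's regexp package, `.` matches any character except a newline by default, and `.+` is greedy, so the trailing `.+` runs to the end of the line.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstAttempt is the first pattern from the question, unchanged.
// (Hypothetical variable name, used only for this illustration.)
var firstAttempt = regexp.MustCompile(`https?://*.+example.de*.+`)

// matchFirstAttempt returns the leftmost match of firstAttempt in text,
// or an empty string when nothing matches.
func matchFirstAttempt(text string) string {
	return firstAttempt.FindString(text)
}

func main() {
	// Sample input from the question, followed by a second line.
	text := "https://abc.example.de/a1b2c3 dsadsa\nnext line"

	// The trailing ".+" consumes everything up to the newline,
	// so the text after the link is included in the match.
	fmt.Printf("%q\n", matchFirstAttempt(text))
}
```

Because the greedy `.+` only stops at the line break, anchoring on one specific terminator such as a space does not help; excluding all whitespace from the path part is the more direct fix.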
Solution 1:[1]
You can use
(?:https?://)?(?:[^/.]+\.)*\bexample\.de\b(?:/[^/\s]+)*/?
See the regex demo. Details:
(?:https?://)? - an optional http:// or https:// string
(?:[^/.]+\.)* - zero or more sequences of one or more chars other than / and . chars, and then a . char
\bexample\.de\b - a whole word example.de
(?:/[^/\s]+)* - zero or more repetitions of / and then one or more chars other than whitespace and /
/? - an optional / char.
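As a rough sketch of how this pattern might be used from Go, the snippet below compiles it once and collects every match with FindAllString from the standard regexp package. The names linkRe and extractLinks and the sample text are assumptions made for illustration; they are not part of the original answer.

```go
package main

import (
	"fmt"
	"regexp"
)

// linkRe is the pattern from the answer, compiled once at package level.
// (Hypothetical variable name, used only for this illustration.)
var linkRe = regexp.MustCompile(`(?:https?://)?(?:[^/.]+\.)*\bexample\.de\b(?:/[^/\s]+)*/?`)

// extractLinks returns every match of linkRe in text, in order of appearance.
func extractLinks(text string) []string {
	return linkRe.FindAllString(text, -1)
}

func main() {
	// Made-up sample text: links terminated by a space, a tab, and a newline,
	// plus one link from an unrelated domain that should be skipped.
	text := "https://example.de/home some text that should not be extracted\n" +
		"https://abc.example.de/home/\tmore text\n" +
		"https://other.org/home"

	for _, link := range extractLinks(text) {
		fmt.Println(link)
	}
}
```

Since the path part excludes all whitespace via \s, matches stop at a space, \t, or \n alike. One thing to keep in mind is that [^/.] itself also matches whitespace, so a host written without a scheme could absorb preceding text; adding \s to that character class is one possible tightening, though it goes beyond the original answer.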
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Wiktor Stribiżew |