Golang - extract links using regex
I need to extract all links to a specific domain, example.de, from a text using a regex in Go.
Below are all the link forms that should be extracted:
https://example.de
https://example.de/
https://example.de/home
https://example.de/home/
https://example.de/home some text that should not be extracted
https://abc.example.de
https://abc.example.de/
https://abc.example.de/home
https://abc.example.de/home/
https://abc.example.de/home some text that should not be extracted
What I already tried
I used this website to check whether my regexes are correct: https://regex101.com/r/ohxUcG/2. Here are the patterns that failed:
`https?://*.+example.de*.+` failed on the input `https://abc.example.de/a1b2c3 dsadsa`: it matched the whole text up to the `\n` instead of only `https://abc.example.de/a1b2c3` without `dsadsa`.
`https?://*.+example.de*.+\s(\w+)$` only gets links that are terminated by a space, but links can also be terminated by `\n`, `\t`, etc.
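To see the first failure concretely, here is a minimal Go sketch that runs that pattern with the standard `regexp` package; the sample text and the variable name `failedRe` are illustrative, not from the question. Because `.+` is greedy and `.` also matches spaces, the match runs past the end of the link and only stops at the newline, which `.` does not match by default.

```go
package main

import (
	"fmt"
	"regexp"
)

// failedRe is the first pattern from the question. The "." in "example.de"
// is unescaped, so it matches any character, and the trailing ".+" is greedy:
// it consumes everything up to the end of the line.
var failedRe = regexp.MustCompile(`https?://*.+example.de*.+`)

func main() {
	text := "https://abc.example.de/a1b2c3 dsadsa\nnext line"
	// By default "." does not match "\n", so the match stops at the newline,
	// but it still includes the trailing " dsadsa".
	fmt.Printf("%q\n", failedRe.FindString(text))
}
```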
Solution 1:[1]
You can use the following regex:
(?:https?://)?(?:[^/.]+\.)*\bexample\.de\b(?:/[^/\s]+)*/?
See the regex demo. Details:
`(?:https?://)?` - an optional `http://` or `https://` string
`(?:[^/.]+\.)*` - zero or more sequences of one or more chars other than `/` and `.`, followed by a `.` char
`\bexample\.de\b` - the whole word `example.de`
`(?:/[^/\s]+)*` - zero or more repetitions of a `/` followed by one or more chars other than whitespace and `/`
`/?` - an optional `/` char.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Wiktor Stribiżew |
