Golang - extract links using regex

I need to extract all links that belong to a specific domain, example.de, from a text using a regex in Go.

Below are all possible links that should be extracted:

https://example.de 
https://example.de/
https://example.de/home
https://example.de/home/
https://example.de/home some text that should not be extracted
https://abc.example.de
https://abc.example.de/
https://abc.example.de/home
https://abc.example.de/home/
https://abc.example.de/home some text that should not be extracted

What I already tried

I used this website to check whether my regexes are correct: https://regex101.com/r/ohxUcG/2. Here are the combinations that failed:

  • https?://*.+example.de*.+ fails on https://abc.example.de/a1b2c3 dsadsa: it matches the whole text up to the \n instead of just https://abc.example.de/a1b2c3 without the trailing dsadsa.
  • https?://*.+example.de*.+\s(\w+)$ only gets links that are terminated by a space, but links can also be terminated by \n, \t, etc.
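
The first failure can be reproduced in Go with a minimal sketch like the one below. The function names firstGreedyMatch and firstBoundedMatch, and the \S-based variant, are illustrative assumptions and not part of the question. The sketch only shows why a greedy .+ runs to the end of the line, and how a class that excludes whitespace stops at a space, \t, or \n alike.

```go
package main

import (
	"fmt"
	"regexp"
)

// greedyRe is the first attempted pattern from the question, unchanged.
// The trailing .+ is greedy and, in Go, matches everything except \n.
var greedyRe = regexp.MustCompile(`https?://*.+example.de*.+`)

// boundedRe is a hypothetical variant that swaps .+ for \S*, so the
// match cannot cross any whitespace character.
var boundedRe = regexp.MustCompile(`https?://\S*example\.de\S*`)

// firstGreedyMatch returns the first match of the attempted pattern.
func firstGreedyMatch(text string) string {
	return greedyRe.FindString(text)
}

// firstBoundedMatch returns the first match of the \S-based variant.
func firstBoundedMatch(text string) string {
	return boundedRe.FindString(text)
}

func main() {
	text := "https://abc.example.de/a1b2c3 dsadsa\nnext line"
	fmt.Printf("%q\n", firstGreedyMatch(text))  // "https://abc.example.de/a1b2c3 dsadsa"
	fmt.Printf("%q\n", firstBoundedMatch(text)) // "https://abc.example.de/a1b2c3"
}
```

The \S-based variant is still looser than the requirement, since it does not tie example.de to the host part of the URL; it is shown only to isolate the whitespace problem.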

Solution 1:[1]

You can use

(?:https?://)?(?:[^/.]+\.)*\bexample\.de\b(?:/[^/\s]+)*/?

Details:

  • (?:https?://)? - an optional http:// or https:// string
  • (?:[^/.]+\.)* - zero or more repetitions of one or more chars other than / and ., followed by a . char
  • \bexample\.de\b - example.de as a whole word
  • (?:/[^/\s]+)* - zero or more repetitions of a / followed by one or more chars other than whitespace and /
  • /? - an optional / char.
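
As a minimal sketch of how this pattern might be applied in Go, the example below compiles it with regexp.MustCompile and collects every match with FindAllString. The helper name extractLinks and the sample text are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// linkRe is the pattern from the answer, written as a raw string so the
// backslashes need no extra escaping.
var linkRe = regexp.MustCompile(`(?:https?://)?(?:[^/.]+\.)*\bexample\.de\b(?:/[^/\s]+)*/?`)

// extractLinks returns all non-overlapping matches of linkRe in text.
// (Hypothetical helper name, not part of the original answer.)
func extractLinks(text string) []string {
	return linkRe.FindAllString(text, -1)
}

func main() {
	text := "Visit https://example.de/home some text\nor https://abc.example.de/home/\tmore text"
	for _, link := range extractLinks(text) {
		fmt.Println(link)
	}
	// Prints:
	// https://example.de/home
	// https://abc.example.de/home/
}
```

One caveat worth checking against real input: [^/.] also matches whitespace, so when a link has no http:// or https:// prefix, words preceding a subdomain can be pulled into the match. For example, "see abc.example.de" matches in full. Using [^/.\s] in that group is one possible way to tighten it.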

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Wiktor Stribiżew