RegEx to return 'href' attribute of 'link' tags only?

I'm trying to craft a regex that returns only the href values of <link> tags.

Why does this regex return all hrefs, including the ones in <a> tags?

(?<=<link\s+.*?)href\s*=\s*[\'\"][^\'\"]+
<link rel="stylesheet" rev="stylesheet" href="idlecore-tidied.css?T_2_5_0_228" media="screen">
<a href="anotherurl">Slash Boxes</a>


Solution 1:[1]

Avoid lookbehind for such a simple case: just match what you need, and capture what you want to get.

I got good results with <link\s+[^>]*(href\s*=\s*(['"]).*?\2) in The Regex Coach, using the s and g options.
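A minimal Python sketch of the same idea, assuming Python's `re` module as a stand-in for The Regex Coach: `re.DOTALL` plays the role of the s option, and iterating with `re.finditer` plays the role of g. The helper name `link_hrefs` and the `SAMPLE` string are illustrative only.

```python
import re

# PhiLho's pattern: match the "<link ..." prefix, capture only the href attribute.
# Group 2 is the opening quote; \2 requires the same character to close the value.
LINK_HREF = re.compile(r"""<link\s+[^>]*(href\s*=\s*(['"]).*?\2)""", re.DOTALL)

SAMPLE = (
    '<link rel="stylesheet" rev="stylesheet" '
    'href="idlecore-tidied.css?T_2_5_0_228" media="screen">\n'
    '<a href="anotherurl">Slash Boxes</a>'
)


def link_hrefs(html: str) -> list:
    """Return the captured href="..." text of every <link> tag (hypothetical helper)."""
    # finditer walks every non-overlapping match, like the g option.
    return [m.group(1) for m in LINK_HREF.finditer(html)]


print(link_hrefs(SAMPLE))  # ['href="idlecore-tidied.css?T_2_5_0_228"']
```

Because `[^>]*` cannot step past the `>` that closes the tag, the `<a href=...>` line never qualifies.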

Solution 2:[2]

/(?<=<link\s+.*?)href\s*=\s*[\'\"][^\'\"]+[^>]*>/

I'm a little shaky on the back-references myself, so I left that in there. This regex, though:

/(<link\s+.*?)href\s*=\s*[\'\"][^\'\"]+[^>]*>/

...works in my JavaScript test.
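Ported to Python's `re` (an assumption; the answer was tested in JavaScript), the second pattern can be checked the same way. Its only capture group holds the text *before* `href`, so the match identifies the whole <link> tag rather than isolating the href value. `SAMPLE` and `matches` are illustrative names.

```python
import re

# nickf's lookbehind-free pattern. It consumes the whole <link ...> tag;
# group 1 is the text between "<link" and "href", not the href itself.
LINK_TAG = re.compile(r"""(<link\s+.*?)href\s*=\s*['"][^'"]+[^>]*>""")

SAMPLE = (
    '<link rel="stylesheet" rev="stylesheet" '
    'href="idlecore-tidied.css?T_2_5_0_228" media="screen">\n'
    '<a href="anotherurl">Slash Boxes</a>'
)

# Only the <link> line matches; the <a href=...> line has no "<link" before it.
matches = [m.group(0) for m in LINK_TAG.finditer(SAMPLE)]
for tag in matches:
    print(tag)
```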

Solution 3:[3]

(?<=<link\s+.*?)href\s*=\s*[\'\"][^\'\"]+

works in Expresso (I think Expresso uses the .NET regex engine). You could refine this a bit more to match the closing ' or ":

(?<=<link\s+.*?)href\s*=\s*([\'\"])[^\'\"]+(\1)

Perhaps your regex engine doesn't support lookbehind assertions. A workaround would be:

(?:<link\s+.*?)(href\s*=\s*([\'\"])[^\'\"]+(\2))

Your match will then be in capture group 1.
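Python's built-in `re` module is one engine that rejects this kind of lookbehind: it accepts only fixed-width lookbehind, so `(?<=<link\s+.*?)` fails to compile. The sketch below, assuming Python 3 and the sample markup from the question, shows that failure and then the capture-group workaround; the variable names are illustrative.

```python
import re

SAMPLE = (
    '<link rel="stylesheet" rev="stylesheet" '
    'href="idlecore-tidied.css?T_2_5_0_228" media="screen">\n'
    '<a href="anotherurl">Slash Boxes</a>'
)

# Variable-length lookbehind: accepted by .NET, rejected by Python's re.
LOOKBEHIND = r"""(?<=<link\s+.*?)href\s*=\s*(['"])[^'"]+(\1)"""
try:
    re.compile(LOOKBEHIND)
    lookbehind_supported = True
except re.error:
    lookbehind_supported = False

# Workaround: consume "<link ..." in a non-capturing group and capture the
# href in group 1; group 2 is the opening quote, reused as \2 to close it.
WORKAROUND = re.compile(r"""(?:<link\s+.*?)(href\s*=\s*(['"])[^'"]+(\2))""")
hrefs = [m.group(1) for m in WORKAROUND.finditer(SAMPLE)]

print(lookbehind_supported)  # False
print(hrefs)  # ['href="idlecore-tidied.css?T_2_5_0_228"']
```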

Solution 4:[4]

What regex flavor are you using? Perl, for one, doesn't support variable-length lookbehind. Where that's an option, I'd choose (edited to implement the very good idea from MizardX):

(?<=<link\b[^<>]*?)href\s*=\s*(['"])(?:(?!\1).)+\1

as a first approximation. That way the closing quote must be the same character as the opening one (' or "). The same for a flavor without support for (variable-length) lookbehind:

(?:<link\b[^<>]*?)(href\s*=\s*(['"])(?:(?!\2).)+\2)

Capture group 1 (\1) will contain your match.
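This also explains the behavior in the question. The lookbehind is tested separately at every `href`, and `.*?` is free to run past the `>` that closes the <link> tag, so an `<a href=...>` that comes after a `<link` on the same line (or anywhere later, if `.` may match newlines) passes the check. The consuming, lookbehind-free patterns usually hide this because each match swallows the link's own href, but the same leak appears when a <link> has no href. A Python sketch, assuming the hypothetical test string `TRICKY`, compares the unbounded `.*?` with this answer's `[^<>]*?`:

```python
import re

# A <link> tag with no href, followed by an <a> tag on the same line.
TRICKY = '<link rel="icon"><a href="anotherurl">Slash Boxes</a>'
# The question's two tags joined onto one line.
SAMPLE = ('<link rel="stylesheet" rev="stylesheet" '
          'href="idlecore-tidied.css?T_2_5_0_228" media="screen">'
          '<a href="anotherurl">Slash Boxes</a>')

# Unbounded version: .*? can run past the ">" that closes <link>.
LOOSE = re.compile(r"""<link\s+.*?(href\s*=\s*(['"])[^'"]+\2)""")

# Bounded version: [^<>]*? cannot leave the <link ...> tag, and
# (?:(?!\2).)+ accepts any character except the opening quote.
STRICT = re.compile(r"""(?:<link\b[^<>]*?)(href\s*=\s*(['"])(?:(?!\2).)+\2)""")

loose_hits = [m.group(1) for m in LOOSE.finditer(TRICKY)]
strict_hits = [m.group(1) for m in STRICT.finditer(TRICKY)]
strict_sample = [m.group(1) for m in STRICT.finditer(SAMPLE)]

print(loose_hits)     # ['href="anotherurl"']  (taken from the <a> tag)
print(strict_hits)    # []
print(strict_sample)  # ['href="idlecore-tidied.css?T_2_5_0_228"']
```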

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 PhiLho
Solution 2 nickf
Solution 3 Stefan Gehrig
Solution 4