RegEx to return 'href' attribute of 'link' tags only?
I'm trying to craft a regex that returns only the hrefs of <link> tags.
Why does this regex return all hrefs, including those of <a> tags?
(?<=<link\s+.*?)href\s*=\s*[\'\"][^\'\"]+
<link rel="stylesheet" rev="stylesheet" href="idlecore-tidied.css?T_2_5_0_228" media="screen">
<a href="anotherurl">Slash Boxes</a>
Solution 1:[1]
Avoid lookbehind for such a simple case; just match what you need and capture what you want to get.
I got good results with
<link\s+[^>]*(href\s*=\s*(['"]).*?\2)
in The Regex Coach with the s and g options.
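As a rough illustration of this capture-group approach, here is a minimal Python sketch run against the sample markup from the question. It assumes Python's re module behaves like The Regex Coach for this pattern: re.S stands in for the s option and finditer for the g option. The names LINK_HREF and link_hrefs are made up for the example.

```python
import re

# Pattern from the answer. re.S lets "." match newlines (the "s" option);
# finditer walks every match, which is what the "g" option does.
LINK_HREF = re.compile(r"""<link\s+[^>]*(href\s*=\s*(['"]).*?\2)""", re.S)

# Sample markup taken from the question.
html = '''<link rel="stylesheet" rev="stylesheet" href="idlecore-tidied.css?T_2_5_0_228" media="screen">
<a href="anotherurl">Slash Boxes</a>'''

# Group 1 holds the whole href attribute; group 2 is the quote character.
link_hrefs = [m.group(1) for m in LINK_HREF.finditer(html)]
print(link_hrefs)  # ['href="idlecore-tidied.css?T_2_5_0_228"']
```

The href of the <a> tag is skipped because every match has to start at a literal <link, and [^>]* cannot run past the end of that tag.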
Solution 2:[2]
/(?<=<link\s+.*?)href\s*=\s*[\'\"][^\'\"]+[^>]*>/
I'm a little shaky on the back-references myself, so I left that in there. This regex, though:
/(<link\s+.*?)href\s*=\s*[\'\"][^\'\"]+[^>]*>/
...works in my JavaScript test.
Solution 3:[3]
(?<=<link\s+.*?)href\s*=\s*[\'\"][^\'\"]+
works with Expresso (I think Expresso runs on the .NET regex engine). You could even refine this a bit more to match the closing ' or ":
(?<=<link\s+.*?)href\s*=\s*([\'\"])[^\'\"]+(\1)
Perhaps your regex engine doesn't support lookbehind assertions. A workaround would be
(?:<link\s+.*?)(href\s*=\s*([\'\"])[^\'\"]+(\2))
Your match will then be in capture group 1.
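To make the engine difference concrete, here is a small Python sketch. Python's re module is one engine that rejects variable-length lookbehind at compile time, so it is used here only as an example of such an engine; the question does not say which flavor was in use. The variable names are made up for the illustration.

```python
import re

# Sample <link> tag from the question.
html = '<link rel="stylesheet" rev="stylesheet" href="idlecore-tidied.css?T_2_5_0_228" media="screen">'

# The lookbehind version: Python's re refuses to compile it because the
# lookbehind contains \s+ and .*?, which have no fixed width.
try:
    re.compile(r"""(?<=<link\s+.*?)href\s*=\s*['"][^'"]+""")
    lookbehind_supported = True
except re.error:
    lookbehind_supported = False

# The workaround: match the "<link ..." prefix normally and capture the
# attribute in group 1 (group 2 is the opening quote, group 3 the closing one).
WORKAROUND = re.compile(r"""(?:<link\s+.*?)(href\s*=\s*(['"])[^'"]+(\2))""")
match = WORKAROUND.search(html)
attribute = match.group(1) if match else None

print(lookbehind_supported)
print(attribute)
```

In engines that do accept the lookbehind (the answer mentions Expresso), the first pattern compiles and the try block is unnecessary.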
Solution 4:[4]
What regex flavor are you using? Perl, for one, doesn't support variable-length lookbehind. Where that's an option, I'd choose (edited to implement the very good idea from MizardX):
(?<=<link\b[^<>]*?)href\s*=\s*(['"])(?:(?!\1).)+\1
as a first approximation. That way the choice of quote character (' or ") will be matched. The same for a language without support for (variable-length) lookbehind:
(?:<link\b[^<>]*?)(href\s*=\s*(['"])(?:(?!\2).)+\2)
\1 will contain your match.
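The following Python sketch compares the lookbehind-free pattern above with a looser variant that uses .*? after <link. It suggests one plausible reason the original regex also picks up <a> hrefs: .*? is free to run past the end of the <link> tag, while [^<>]*? is not. The second input string is a made-up test case, and LOOSE, STRICT and the other names are assumptions for the example.

```python
import re

# Looser variant: ".*?" may cross the ">" that closes the <link> tag.
LOOSE = re.compile(r"""(?:<link\s+.*?)(href\s*=\s*(['"])[^'"]+(\2))""")
# Pattern from this answer: "[^<>]*?" must stay inside the <link> tag.
STRICT = re.compile(r"""(?:<link\b[^<>]*?)(href\s*=\s*(['"])(?:(?!\2).)+\2)""")

# Sample <link> tag from the question.
sample = '<link rel="stylesheet" rev="stylesheet" href="idlecore-tidied.css?T_2_5_0_228" media="screen">'
# Hypothetical input: a <link> without href, directly followed by an <a>.
no_href_link = '<link rel="stylesheet"><a href="anotherurl">Slash Boxes</a>'

loose_hit = LOOSE.search(no_href_link)    # leaks into the <a> tag
strict_hit = STRICT.search(no_href_link)  # finds nothing
strict_sample = STRICT.search(sample).group(1)

print(loose_hit.group(1))
print(strict_hit)
print(strict_sample)
```

The same reasoning applies to the lookbehind form: with [^<>]*? inside the lookbehind, the href must sit in the same tag as the <link that precedes it.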
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | PhiLho |
Solution 2 | nickf |
Solution 3 | Stefan Gehrig |
Solution 4 |