Regex to match words with hyphens and/or apostrophes
I was looking for a regex to match words with hyphens and/or apostrophes. So far, I have:
(\w+([-'])(\w+)?[']?(\w+))
and that works most of the time, though if there's an apostrophe and then a hyphen, like "qu'est-ce", it doesn't match. I could append more optional groups, but perhaps there's a more efficient way?
Some examples of what I'm trying to match: Mary's, High-school, 'tis, Chambers', Qu'est-ce.
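To make the failure concrete, here is a minimal sketch that checks the pattern above against the listed examples. Python and the use of re.fullmatch (treating "match" as "the whole word matches") are assumptions; the question does not name a language.

```python
import re

# Pattern from the question, copied unchanged.
ASKER_PATTERN = re.compile(r"(\w+([-'])(\w+)?[']?(\w+))")

EXAMPLES = ["Mary's", "High-school", "'tis", "Chambers'", "Qu'est-ce"]

def fully_matches(word):
    """Return True when the whole word is matched by the question's pattern."""
    return ASKER_PATTERN.fullmatch(word) is not None

# Map each example word to whether it is matched in full.
results = {word: fully_matches(word) for word in EXAMPLES}
for word, ok in results.items():
    print(f"{word!r}: {ok}")
```

Under these assumptions only Mary's and High-school match in full; the other three need a leading or trailing apostrophe, or more than one separator.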
Solution 1:[1]
Use this pattern:
(?=\S*['-])([a-zA-Z'-]+)
(?= # Look-Ahead
\S # <not a whitespace character>
* # (zero or more)(greedy)
['-] # Character in ['-] Character Class
) # End of Look-Ahead
( # Capturing Group (1)
[a-zA-Z'-] # Character in [a-zA-Z'-] Character Class
+ # (one or more)(greedy)
) # End of Capturing Group (1)
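A quick way to try this pattern is a small Python sketch like the one below. The sample sentence and the choice of Python are assumptions for illustration; the word "plain" is included only to show that words without an apostrophe or hyphen are skipped.

```python
import re

# Solution 1's pattern: the lookahead requires an apostrophe or hyphen
# somewhere in the upcoming run of non-whitespace characters.
PATTERN = re.compile(r"(?=\S*['-])([a-zA-Z'-]+)")

# Sample text built from the question's examples plus one ordinary word.
text = "Mary's High-school 'tis Chambers' Qu'est-ce plain"

# Collect capturing group 1 for every match.
matches = [m.group(1) for m in PATTERN.finditer(text)]
print(matches)
```

Note that the lookahead scans any non-whitespace characters, so a token such as foo,bar's would let "foo" through on its own; splitting on punctuation first may be worth considering.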
Solution 2:[2]
[\w'-]+
would match pretty much any word with (or without) hyphens and apostrophes, including cases where those characters are adjacent.
(?:\w|['-]\w)+
should match cases where the characters can't be adjacent.
If you need to be sure that the word contains hyphens and/or apostrophes and that those characters aren't adjacent, maybe try
\w*(?:['-](?!['-])\w*)+
But that would also match ' and - alone.
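The difference between the three patterns is easiest to see side by side. This is a rough Python sketch; the constant names, the helper find_words, and the sample text are all made up for illustration.

```python
import re

# The three patterns from this answer, copied unchanged.
PERMISSIVE = r"[\w'-]+"
NO_ADJACENT = r"(?:\w|['-]\w)+"
REQUIRES_MARK = r"\w*(?:['-](?!['-])\w*)+"

# Hypothetical sample: one normal word, one with adjacent hyphens, one plain.
text = "Mary's some--thing plain"

def find_words(pattern, source):
    """Return every non-overlapping match of pattern in source."""
    return [m.group(0) for m in re.finditer(pattern, source)]

permissive = find_words(PERMISSIVE, text)
no_adjacent = find_words(NO_ADJACENT, text)
requires_mark = find_words(REQUIRES_MARK, text)

print(permissive)
print(no_adjacent)
print(requires_mark)
```

With this sample, the first pattern keeps some--thing whole, the second splits it into some and -thing, and the third drops plain and some and keeps only Mary's and -thing.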
Solution 3:[3]
debuggex.com is a great resource for visualizing these sorts of things
\b\w*[-']\w*\b
should do the trick for words with a single interior hyphen or apostrophe, such as Mary's or High-school.
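For a quick check without a visualizer, the pattern can be run over the question's examples in Python. This is only a sketch; Python and the sample string are assumptions.

```python
import re

# Solution 3's pattern, copied unchanged.
PATTERN = re.compile(r"\b\w*[-']\w*\b")

# The five example words from the question, separated by spaces.
text = "Mary's High-school 'tis Chambers' Qu'est-ce"

matches = PATTERN.findall(text)
print(matches)
```

Because the pattern allows exactly one hyphen or apostrophe and needs a word boundary on both sides, Qu'est-ce comes back as two pieces (Qu'est and -ce), and 'tis and Chambers' are not matched.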
Solution 4:[4]
The problem you're running into is that you actually have three possible sub-patterns: one or more chars, an apostrophe followed by one or more chars, and a hyphen followed by one or more chars.
This presumes you don't wish to accept words that begin or end with apostrophes or hyphens or have hyphens next to apostrophes (or vice versa).
I believe the best way to represent this in a RegExp would be:
/\b[a-z]+(?:['-]?[a-z]+)*\b/
which is described as:
\b # word boundary
[a-z]+ # one or more letters
(?: # start of non-capturing group
['-]? # zero or one apostrophe or hyphen
[a-z]+ # one or more letters
)* # end of non-capturing group, repeated zero or more times
\b # word boundary
which will match any word that begins and ends with a letter and can contain zero or more groups of an apostrophe or a hyphen followed by one or more letters.
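Here is a small Python sketch of that pattern in use. The re.IGNORECASE flag is an assumption added here (the pattern as written only covers lowercase a-z, and the question's examples are capitalized); in a JavaScript RegExp literal the rough equivalent would be the i flag.

```python
import re

# Solution 4's pattern. re.IGNORECASE is an addition for this sketch so that
# capitalized words such as "Mary's" are matched as a whole.
PATTERN = re.compile(r"\b[a-z]+(?:['-]?[a-z]+)*\b", re.IGNORECASE)

# The five example words from the question, separated by spaces.
text = "Mary's High-school 'tis Chambers' Qu'est-ce"

# The only group is non-capturing, so findall returns whole matches.
matches = PATTERN.findall(text)
print(matches)
```

As intended by the answer, leading and trailing apostrophes are left out, so 'tis comes back as tis and Chambers' as Chambers.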
Solution 5:[5]
How about:
\'?\w+([-']\w+)*\'?
I suppose these words shouldn't be matched:
something- or -something: starts or ends with -
some--thing or some'-thing: - or ' not followed by a word character
some'': two apostrophes
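The accept and reject cases can be checked with a short Python sketch. Testing each token with re.fullmatch is an assumption (it presumes the text has already been split into tokens), and is_word is a made-up helper name.

```python
import re

# Solution 5's pattern, copied unchanged.
PATTERN = re.compile(r"\'?\w+([-']\w+)*\'?")

# Words from the question that should be accepted.
accepted = ["Mary's", "High-school", "'tis", "Chambers'", "Qu'est-ce"]
# Words this answer says should be rejected.
rejected = ["something-", "-something", "some--thing", "some'-thing", "some''"]

def is_word(token):
    """Return True when the entire token matches the pattern."""
    return PATTERN.fullmatch(token) is not None

for token in accepted + rejected:
    print(f"{token!r}: {is_word(token)}")
```

Without fullmatch (for example with a plain search over running text), the pattern would still find the valid part of a rejected token, such as something inside something-.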
Solution 6:[6]
This worked for me:
([a-zA-Z]+'?-?[a-zA-Z]+(-?[a-zA-Z])?)|[a-zA-Z]
Solution 7:[7]
Use
([\w]+[']*[\w]*)|([']*[\w]+)
It will properly parse
"You've and we i've it' '98"
(supports ' in any place in the word, but a single ' is ignored).
If needed, \w could be replaced with [a-zA-Z] etc.
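A minimal Python sketch of that parse follows. Python is an assumption, and the lone trailing ' was added to the answer's sample string here to show that a single apostrophe is skipped.

```python
import re

# Solution 7's pattern, copied unchanged.
PATTERN = re.compile(r"([\w]+[']*[\w]*)|([']*[\w]+)")

# The answer's sample text, plus a lone apostrophe at the end (added here).
text = "You've and we i've it' '98 '"

# The pattern has two capturing groups, so take group(0) for the whole match.
tokens = [m.group(0) for m in PATTERN.finditer(text)]
print(tokens)
```

Using finditer with group(0) avoids the tuples that findall would return for a pattern with two capturing groups.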
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | OrderNChaos |
Solution 3 | Patrick |
Solution 4 | Dharman |
Solution 5 | |
Solution 6 | Cryswerton Silva |
Solution 7 | Laurel |