Regex: capturing capital word with nothing in front of it
I'm trying to match all proper nouns in some given text.
So far I've got (?<![.?!]\s|^)(?<!\“)[A-Z][a-z]+
which ignores capitalized words preceded by ., ? or ! and a space, as well as words that directly follow an opening quotation mark (“). Can be seen here.
But it doesn't catch capital words at the beginning of sentences. So given the text:
Alec, Prince, so Genoa and Lucca are now just family estates of the “What”. He said no. He, being the Prince.
It successfully catches Prince, Genoa, Lucca but not Alec.
So I'd like some help modifying it, if possible, to match any capitalized word with nothing behind it. (I'm not sure how to define "nothing".)
Solution 1:[1]
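You can put the “ as the second alternative in the lookbehind instead of ^, which asserts the start of the string. Inside the negative lookbehind, that ^ is what rejects a match at the very beginning of the text, which is why Alec was skipped.
Then you can omit the separate (?<!\“)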
(?<![.?!]\s|“)[A-Z][a-z]+
Explanation
(?<!  Negative lookbehind, assert that what is directly to the left of the current position is not
[.?!]\s  Match any of . ? ! followed by a whitespace char
|  Or
“  Match literally
)  Close lookbehind
[A-Z][a-z]+  Match an uppercase char A-Z and 1+ chars a-z
See a regex demo.
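To try this pattern on the sentence from the question, here is a minimal sketch in Python. Python is my choice here, not something the answer specifies. One caveat: Python's built-in `re` module only accepts lookbehinds whose alternatives all have the same width, so the single lookbehind `(?<![.?!]\s|“)` is split into two consecutive negative lookbehinds, which should reject exactly the same positions. The names `PROPER_NOUN`, `find_proper_nouns` and `sample` are made up for illustration.

```python
import re

# Assumption: Python's built-in `re`, which needs fixed-width lookbehinds.
# (?<![.?!]\s|“) mixes a 2-character and a 1-character alternative, so it is
# written as two separate negative lookbehinds; both must hold at the same
# position, which is equivalent to "neither alternative precedes the word".
PROPER_NOUN = re.compile(r"(?<![.?!]\s)(?<!“)[A-Z][a-z]+")


def find_proper_nouns(text: str) -> list[str]:
    """Return capitalized words that do not follow '. ', '? ', '! ' or “."""
    return PROPER_NOUN.findall(text)


sample = (
    "Alec, Prince, so Genoa and Lucca are now just family estates "
    "of the “What”. He said no. He, being the Prince."
)

print(find_proper_nouns(sample))
# ['Alec', 'Prince', 'Genoa', 'Lucca', 'Prince']
```

Alec is now matched because nothing precedes it, so neither lookbehind can find a forbidden prefix. Both occurrences of He and the quoted What are still skipped.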
Solution 2:[2]
The thing you're looking for is called a "word boundary", which is denoted as \b in a lot of regex languages.
Try \b[A-Z][a-z]*\b
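As a quick check of what the word-boundary pattern picks up, here is a small Python sketch run against the sentence from the question. Python and the names `WORD_BOUNDARY` and `sample` are assumptions for illustration. It suggests that \b alone matches every capitalized word, including the sentence-initial He and the quoted What that the question wanted to exclude.

```python
import re

# \b matches at any transition between a word character and a non-word
# character (or the start/end of the string); it does not look at which
# punctuation precedes the word.
WORD_BOUNDARY = re.compile(r"\b[A-Z][a-z]*\b")

sample = (
    "Alec, Prince, so Genoa and Lucca are now just family estates "
    "of the “What”. He said no. He, being the Prince."
)

print(WORD_BOUNDARY.findall(sample))
# ['Alec', 'Prince', 'Genoa', 'Lucca', 'What', 'He', 'He', 'Prince']
```

So this variant is simpler, but if the sentence-start and quote exclusions matter, the lookbehind conditions would still need to be added in front of it.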
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | The fourth bird |
Solution 2 | uckelman |