How do I write a RegEx that starts reading from behind?
I have a series of words I try to capture.
I have the following problem:
- The string ends with a fixed set of words
- It is not clearly defined how many words the string consists of. However, it should capture all words that start with an uppercase letter (German language). Therefore, the left anchor should be the first word starting with a lowercase letter.
Example (bold is what I try to capture):
I like **Apple Bananas And Cars**.
building houses **Might Be Salty + Hard** said Jessica.
This is the RegEx I have tried so far; it only works if the "non-capture" string does not include any uppercase words:
/(?:[a-zäöü]*)([\p{L} +().&]+[Cars|Hard])/gu
Solution 1:[1]
Use \p{Lu} for uppercase letters:
(?:[\p{Lu}+()&][\p{L}+()&]* )+(?:Cars|Hard)
See live demo (showing matching umlauted letters and ß).
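As a quick check of the pattern above, here is a minimal JavaScript sketch that runs it against the two example strings from the question. The helper name `extractCapitalizedRuns` and the third sample sentence are made up for illustration; the `u` flag is assumed because `\p{Lu}` and `\p{L}` need it in JavaScript.

```javascript
// Pattern from Solution 1 as a JavaScript literal. The `u` flag enables
// \p{Lu} / \p{L}; the `g` flag makes match() return every match.
const capitalizedRun = /(?:[\p{Lu}+()&][\p{L}+()&]* )+(?:Cars|Hard)/gu;

// Hypothetical helper: returns all matches, or an empty array if there are none.
function extractCapitalizedRuns(text) {
  return text.match(capitalizedRun) || [];
}

console.log(extractCapitalizedRuns("I like Apple Bananas And Cars."));
// → [ 'Apple Bananas And Cars' ]

console.log(extractCapitalizedRuns("building houses Might Be Salty + Hard said Jessica."));
// → [ 'Might Be Salty + Hard' ]

// Made-up sentence to show umlauts and ß being matched.
console.log(extractCapitalizedRuns("wir kaufen Äpfel Und Große Cars"));
// → [ 'Äpfel Und Große Cars' ]
```

The leading "I" in the first sentence is not part of the match because the lowercase word "like" interrupts the run before the ending word is reached.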
Solution 2:[2]
You might start the match with an uppercase character, allowing German uppercase chars as well, and then optionally repeat matching either words that start with an uppercase character or a "special" character.
Then end the match with an alternation matching either Hard or Cars.
(?<!\S)[A-ZÄÖÜß][a-zA-ZäöüßÄÖÜ?]*(?:\s+(?:[A-ZÄÖÜß][a-zA-ZäöüßÄÖÜ?]*|[+()&]))*\s+(?:Hard|Cars)\b
Explanation
(?<!\S)
Assert a whitespace boundary to the left to prevent starting the match after a non whitespace char
[A-ZÄÖÜß][a-zA-ZäöüßÄÖÜ?]*
Match a word that starts with an uppercase char
(?:
Non capture group to match as a whole part
\s+
Match 1+ whitespace chars
(?:
Non capture group
[A-ZÄÖÜß][a-zA-ZäöüßÄÖÜ?]*
Match a word that starts with uppercase
|
Or
[+()&]
Match one of the "special" chars
)
Close the non capture group
)*
Close the non capture group and optionally repeat it
\s+
Match 1+ whitespace chars
(?:Hard|Cars)
Match one of the alternatives
\b
A word boundary to prevent a partial word match
See a regex demo.
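The pieces described in the explanation can also be assembled step by step, which may make the pattern easier to adapt. This is only a sketch in JavaScript: the names `upperWord`, `special`, `ending`, and `findCapitalizedRuns` are made up for illustration, and it assumes an engine with lookbehind support (for example a recent Node.js).

```javascript
// Building blocks, named after the parts in the explanation (names are hypothetical).
const upperWord = String.raw`[A-ZÄÖÜß][a-zA-ZäöüßÄÖÜ?]*`; // word starting with uppercase
const special = String.raw`[+()&]`;                        // one "special" char
const ending = String.raw`(?:Hard|Cars)`;                  // fixed set of ending words

// Same pattern as in Solution 2, assembled from the parts above.
const pattern = new RegExp(
  String.raw`(?<!\S)${upperWord}(?:\s+(?:${upperWord}|${special}))*\s+${ending}\b`,
  "gu"
);

// Hypothetical helper: collect the full text of every match.
function findCapitalizedRuns(text) {
  return [...text.matchAll(pattern)].map((m) => m[0]);
}

console.log(findCapitalizedRuns("I like Apple Bananas And Cars."));
// → [ 'Apple Bananas And Cars' ]

console.log(findCapitalizedRuns("building houses Might Be Salty + Hard said Jessica."));
// → [ 'Might Be Salty + Hard' ]
```

To support other ending words, only the `ending` alternation would need to change.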
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | The fourth bird |