How do I write a RegEx that starts reading from the end of the string?

I have a series of words I am trying to capture.

I have the following problem:

  • The string ends with a fixed set of words
  • The number of words in the string is not fixed. However, the match should capture all words that start with an uppercase letter (the text is German). The left anchor should therefore be the first word starting with a lowercase letter when reading from the end.

Example (bold is what I try to capture):

  • I like Apple Bananas And Cars.

  • building houses Might Be Salty + Hard said Jessica.

This is the RegEx I have tried so far; it only works if the "non-capture" part of the string does not include any uppercase words: /(?:[a-zäöü]*)([\p{L} +().&]+[Cars|Hard])/gu



Solution 1:[1]

Use \p{Lu} for uppercase letters:

(?:[\p{Lu}+()&][\p{L}+()&]* )+(?:Cars|Hard)

See live demo (showing matching umlauted letters and ß).
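
As a quick check, here is a minimal JavaScript sketch that runs the pattern above against the two example sentences from the question. JavaScript is an assumption, based on the /gu flags in the question's attempt; the u flag is what makes \p{Lu} and \p{L} available. The third sample sentence and the variable names are made up for illustration.

```javascript
// Pattern from Solution 1; the "u" flag enables \p{...} property escapes.
const solution1Pattern = /(?:[\p{Lu}+()&][\p{L}+()&]* )+(?:Cars|Hard)/gu;

// The first two samples come from the question. The third is a made-up
// sentence with umlauts, added only for illustration.
const solution1Samples = [
  "I like Apple Bananas And Cars.",
  "building houses Might Be Salty + Hard said Jessica.",
  "wir kaufen Äpfel Über Cars",
];

// String.prototype.match with a global regex returns every match, or null.
const solution1Matches = solution1Samples.map(
  (sentence) => sentence.match(solution1Pattern) ?? []
);

for (const found of solution1Matches) {
  console.log(found);
}
```

With these inputs the matches are "Apple Bananas And Cars", "Might Be Salty + Hard" and "Äpfel Über Cars". The leading "I" in the first sentence is not included, because the lowercase word "like" interrupts the run of capitalised words before the fixed ending is reached.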

Solution 2:[2]

You might start the match with an uppercase character, allowing German uppercase chars as well, and then optionally repeat matching either words that start with an uppercase character or a "special" character.

Then end the match with an alternation matching either Hard or Cars.

(?<!\S)[A-ZÄÖÜß][a-zA-ZäöüßÄÖÜ?]*(?:\s+(?:[A-ZÄÖÜß][a-zA-ZäöüßÄÖÜ?]*|[+()&]))*\s+(?:Hard|Cars)\b

Explanation

  • (?<!\S) Assert a whitespace boundary to the left to prevent starting the match after a non whitespace char
  • [A-ZÄÖÜß][a-zA-ZäöüßÄÖÜ?]* Match a word that starts with an uppercase char
  • (?: Non capture group to match as a whole part
    • \s+ Match 1+ whitespace chars
    • (?: Non capture group
      • [A-ZÄÖÜß][a-zA-ZäöüßÄÖÜ?]* Match a word that starts with uppercase
      • | Or
      • [+()&] Match one of the "special" chars
    • ) Close the non capture group
  • )* Close the non capture group and optionally repeat it
  • \s+ Match 1+ whitespace chars
  • (?:Hard|Cars) Match one of the alternatives
  • \b A word boundary to prevent a partial word match

See a regex demo.
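
To make the breakdown above easier to follow, the same pattern can be assembled from named parts. This is only a sketch in JavaScript (assumed because the question uses /gu flags), and it needs an engine with lookbehind support. The names word, special, ending and solution2Pattern are hypothetical, and the character classes are copied exactly as they appear in this answer.

```javascript
// Building blocks, copied verbatim from the pattern in this answer.
// String.raw keeps backslashes literal, so \s and \b reach the RegExp intact.
const word = String.raw`[A-ZÄÖÜß][a-zA-ZäöüßÄÖÜ?]*`; // word starting with uppercase
const special = String.raw`[+()&]`;                   // one "special" char
const ending = String.raw`(?:Hard|Cars)`;             // the fixed final words

// (?<!\S)  whitespace boundary on the left
// word     first capitalised word
// (?:...)* further capitalised words or special chars, each after whitespace
// \s+ ending \b  the fixed ending, followed by a word boundary
const solution2Pattern = new RegExp(
  String.raw`(?<!\S)${word}(?:\s+(?:${word}|${special}))*\s+${ending}\b`,
  "g"
);

// Both sample sentences come from the question.
const solution2Samples = [
  "I like Apple Bananas And Cars.",
  "building houses Might Be Salty + Hard said Jessica.",
];

const solution2Matches = solution2Samples.map(
  (sentence) => sentence.match(solution2Pattern) ?? []
);

for (const found of solution2Matches) {
  console.log(found);
}
```

For the two sentences from the question this yields "Apple Bananas And Cars" and "Might Be Salty + Hard". No u flag is needed here, because the pattern lists the German letters explicitly instead of using \p{...} escapes.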

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 The fourth bird