How to extract all words of a camel-cased string with a regular expression?

Assume I have a string that consists of multiple words. These words aren't separated by spaces, but every word starts with a capital letter. This type of naming convention is usually called "camel case". Some examples:

  • ApplicationRecord
  • CamelCase
  • FirstNumberAfterACharacter

Now I want to split these strings into single words, so FirstNumberAfterACharacter becomes ["First", "Number", "After", "A", "Character"] for example.

Finding a regular expression that matches those strings is easy enough: ^([A-Z][a-z]*)+$. But when I try to get all matches, the repeated capture group only returns its last repetition:

irb(main):003:0> /^([A-Z][a-z]*)+$/.match('FirstNumberAfterACharacter').captures
=> ["Character"]

irb(main):004:0> 'FirstNumberAfterACharacter'.scan(/^([A-Z][a-z]*)+$/)
=> [["Character"]]

So how do I get all matches, not just the last one?
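A minimal sketch of why this happens, plus one possible workaround. It uses Ruby's MatchData directly and swaps ^/$ for \A/\z, because in Ruby ^ and $ match at line boundaries rather than only at the start and end of the string (that substitution is an adjustment, not part of the original question). The match? call assumes Ruby 2.4 or later.

```ruby
input = 'FirstNumberAfterACharacter'

# A repeated capture group has only one slot: every repetition of ([A-Z][a-z]*)
# overwrites it, so only the last iteration is kept.
m = /\A([A-Z][a-z]*)+\z/.match(input)
p m[0] # => "FirstNumberAfterACharacter"
p m[1] # => "Character"

# Possible workaround: use the anchored pattern only to validate the string,
# then let String#scan collect every occurrence of the unrepeated word pattern.
words = input.match?(/\A(?:[A-Z][a-z]*)+\z/) ? input.scan(/[A-Z][a-z]*/) : []
p words # => ["First", "Number", "After", "A", "Character"]
```

The key point is that scan applies the word pattern repeatedly across the string, whereas a single match with a quantified group only reports the final value of that group.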



Solution 1:[1]

I changed your regexp to a group (...) that starts with a single capital letter, [A-Z], followed by zero or more characters that are not capital letters, [^A-Z]*. Because the pattern is neither anchored nor repeated, scan returns every match:

'FirstNumberAfterACharacter'.scan(/([A-Z][^A-Z]*)/).flatten(1)
# => ["First", "Number", "After", "A", "Character"]
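A small variation, sketched here rather than taken from the answer: when the pattern has no capture group, String#scan returns the matched substrings themselves, so the flatten(1) call is not needed. The extra inputs ('Version2Update', 'camelCase') are made-up examples that show two consequences of [^A-Z]*.

```ruby
pattern = /[A-Z][^A-Z]*/

# No capture group, so scan yields plain strings instead of one-element arrays.
words = 'FirstNumberAfterACharacter'.scan(pattern)
p words # => ["First", "Number", "After", "A", "Character"]

# [^A-Z]* accepts any non-capital character, so digits stay attached
# to the preceding word.
digits = 'Version2Update'.scan(pattern)
p digits # => ["Version2", "Update"]

# Characters before the first capital letter are never matched.
leading = 'camelCase'.scan(pattern)
p leading # => ["Case"]
```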

Solution 2:[2]

You can use a regex that extracts any Unicode uppercase letter followed by any number of characters that are not uppercase letters:

'FirstNumberAfterACharacter'.scan(/\p{Lu}\P{Lu}*/)
# => ["First", "Number", "After", "A", "Character"]


Details:

  • \p{Lu} - any Unicode uppercase letter
  • \P{Lu}* - zero or more (*) characters other than Unicode uppercase letters.
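To illustrate the difference from an ASCII-only character class, here is a short comparison on a made-up input containing non-ASCII capitals ('ÜberStraßeÄrger' is only an example). It assumes the source file is UTF-8, which is Ruby's default since 2.0.

```ruby
unicode_re = /\p{Lu}\P{Lu}*/ # Unicode uppercase letter, then non-uppercase characters
ascii_re   = /[A-Z][^A-Z]*/  # ASCII-only version of the same idea

text = 'ÜberStraßeÄrger'

# Ü and Ä are uppercase letters (Lu), so they start new words.
unicode_words = text.scan(unicode_re)
p unicode_words # => ["Über", "Straße", "Ärger"]

# Ü and Ä are outside [A-Z]: "Über" is skipped and "Ärger" is glued to "Straße".
ascii_words = text.scan(ascii_re)
p ascii_words # => ["StraßeÄrger"]

# Either pattern splits a run of capitals into one-letter words.
acronym_words = 'HTTPServer'.scan(unicode_re)
p acronym_words # => ["H", "T", "T", "P", "Server"]
```

The last case matters for identifiers that contain acronyms; both answers treat every capital letter as the start of a new word.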

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Wiktor Stribiżew