How to extract all words of a camel cased string with a regular expression?
Assume I have a string that consists of multiple words. These words aren't separated by spaces, but every word starts with a capital letter. This type of naming convention is usually called "camel case". Some examples:
- ApplicationRecord
- CamelCase
- FirstNumberAfterACharacter
Now I want to split these strings into single words, so that, for example, FirstNumberAfterACharacter becomes ["First", "Number", "After", "A", "Character"].
Finding a regular expression that matches those strings is quite easy: ^([A-Z][a-z]*)+$
But if I try to get all matches, this regular expression will only return the last match:
irb(main):003:0> /^([A-Z][a-z]*)+$/.match('FirstNumberAfterACharacter').captures
=> ["Character"]
irb(main):004:0> 'FirstNumberAfterACharacter'.scan(/^([A-Z][a-z]*)+$/)
=> [["Character"]]
So how do I get all matches, not just the last one?
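The behaviour shown above can be reproduced in a minimal sketch. It assumes only what the irb session already shows: the pattern matches the whole string, but a capture group placed under a quantifier keeps only the text from its final repetition. The variable names are illustrative.

```ruby
# The group ([A-Z][a-z]*) is repeated by +, so it is overwritten on
# every repetition and ends up holding only the last word.
m = /^([A-Z][a-z]*)+$/.match('FirstNumberAfterACharacter')

whole_match = m[0]  # the entire string matched by the pattern
last_word   = m[1]  # only the final repetition of the group

puts whole_match  # FirstNumberAfterACharacter
puts last_word    # Character
```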
Solution 1:[1]
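I changed your regexp so that it starts with a group (...) that consists of a single capital letter, [A-Z]{1} (equivalent to plain [A-Z]), followed by zero or more characters that are not capital letters, [^A-Z]*. The anchors ^ and $ and the outer + are dropped, so scan can find each word as a separate match.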
'FirstNumberAfterACharacter'.scan(/([A-Z][^A-Z]*)/).flatten(1)
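As a small sketch of why flatten is used here: String#scan returns an array of arrays when the pattern contains a capture group, and a flat array of strings when it does not. Dropping the parentheses should therefore give the same result without flatten. The variable names below are illustrative.

```ruby
input = 'FirstNumberAfterACharacter'

# With a capture group, each match is wrapped in its own array.
grouped = input.scan(/([A-Z][^A-Z]*)/)
# => [["First"], ["Number"], ["After"], ["A"], ["Character"]]

# flatten(1) removes that one level of nesting.
flattened = grouped.flatten(1)

# Without a capture group, scan returns the matched strings directly.
ungrouped = input.scan(/[A-Z][^A-Z]*/)

p flattened  # ["First", "Number", "After", "A", "Character"]
p ungrouped  # ["First", "Number", "After", "A", "Character"]
```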
Solution 2:[2]
You can use a regex that extracts any Unicode uppercase letter followed by any number of non-uppercase characters:
'FirstNumberAfterACharacter'.scan(/\p{Lu}\P{Lu}*/)
# => ["First", "Number", "After", "A", "Character"]
See the Ruby online demo.
Details:
- \p{Lu} - any Unicode uppercase letter
- \P{Lu}* - zero or more (*) characters other than Unicode uppercase letters.
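To illustrate where the Unicode property class differs from an ASCII range, here is a short sketch. The sample strings are made up for this comparison and do not come from the question.

```ruby
# [A-Z] only knows ASCII capitals, so a word starting with "É" is missed.
ascii_only = 'ÉcoleNormale'.scan(/[A-Z][^A-Z]*/)

# \p{Lu} matches any Unicode uppercase letter, so both words are found.
unicode_aware = 'ÉcoleNormale'.scan(/\p{Lu}\P{Lu}*/)

# \P{Lu} matches anything that is not an uppercase letter, so digits
# stay attached to the word that precedes them.
with_digits = 'Version2Update'.scan(/\p{Lu}\P{Lu}*/)

p ascii_only     # ["Normale"]
p unicode_aware  # ["École", "Normale"]
p with_digits    # ["Version2", "Update"]
```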
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | Wiktor Stribiżew |