How can I use Regex to extract all words that are written in camel case

I am trying to extract all consecutive capitalized words from a given string and join them with no spacing in between.

E.g. The University Of Sydney => TheUniversityOfSydney, Regular Expression => RegularExpression, and This Is A Simple Variable => ThisIsASimpleVariable.

I started with this code, but it returns the capitalized words as a list of separate strings:

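import re
string = "I write a syntax of Regular Expression"
# Matches every capitalized word separately, including the standalone "I"
result = re.findall(r"\b[A-Z][a-z]*\b", string)
print(result)  # => ['I', 'Regular', 'Expression']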

I expect to get RegularExpression here.
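Joining that list directly does not help, because findall loses the information about which words were adjacent: "".join(result) gives "IRegularExpression". As a non-regex alternative (not taken from the question or the answer), consecutive capitalized words can be grouped with itertools.groupby. The helper name join_capitalized_runs is hypothetical, and the sketch assumes that words are separated by whitespace and that only runs of two or more words should be kept.

```python
from itertools import groupby

def join_capitalized_runs(text):
    # Hypothetical helper: join consecutive capitalized words, keeping only runs of 2+ words
    words = text.split()
    runs = []
    # groupby cuts the word list into consecutive blocks that share the same key,
    # here "does this word start with an uppercase letter?"
    for is_cap, group in groupby(words, key=lambda w: w[:1].isupper()):
        group = list(group)
        if is_cap and len(group) > 1:
            runs.append("".join(group))
    return runs

print(join_capitalized_runs("I write a syntax of Regular Expression"))  # ['RegularExpression']
print(join_capitalized_runs("This Is A Simple Variable"))  # ['ThisIsASimpleVariable']
```

Unlike a regex, this splits only on whitespace, so punctuation attached to a word (e.g. "Expression.") stays in the joined result.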



Solution 1:[1]

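You can match each run of two or more capitalized words with one regex and then strip the whitespace from every match: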

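import re
text = "I write a syntax of Regular Expression"
# A capitalized word followed by one or more whitespace-separated capitalized words
rx = r"\b[A-Z]\w*(?:\s+[A-Z]\w*)+"
# Remove the whitespace inside each match to get the camel-case form
result = ["".join(x.split()) for x in re.findall(rx, text)]
print(result) # => ['RegularExpression']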


The regex is explained in the answer to "How can I use Regex to abbreviate words that all start with a capital letter".
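For a self-contained breakdown, the same pattern can be written with re.VERBOSE, which ignores whitespace in the pattern and lets each part carry a comment. This is only a re-annotated copy of rx from the answer; the name rx_verbose is introduced here for illustration.

```python
import re

# Same pattern as rx in the answer, written with re.VERBOSE so each part can be commented
rx_verbose = re.compile(r"""
    \b            # word boundary: the match starts at the beginning of a word
    [A-Z]\w*      # first word: an uppercase letter followed by zero or more word characters
    (?:           # non-capturing group, repeated below
        \s+       # one or more whitespace characters between words
        [A-Z]\w*  # next word, also starting with an uppercase letter
    )+            # at least one more capitalized word is required
""", re.VERBOSE)

text = "I write a syntax of Regular Expression"
print(rx_verbose.findall(text))  # ['Regular Expression']
```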

Here, the regex is used with re.findall to extract all matches, and "".join(x.split()) is a post-processing step that removes all whitespace from each match.
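Applied to the examples from the question, the same findall-plus-join approach might look like this. The wrapper name camel_runs is hypothetical; the regex and the join step are the ones from the answer.

```python
import re

rx = r"\b[A-Z]\w*(?:\s+[A-Z]\w*)+"

def camel_runs(text):
    # Hypothetical helper: find each run of 2+ capitalized words,
    # then drop the whitespace inside the run
    return ["".join(m.split()) for m in re.findall(rx, text)]

for s in ["The University Of Sydney", "Regular Expression", "This Is A Simple Variable"]:
    print(camel_runs(s))
# ['TheUniversityOfSydney']
# ['RegularExpression']
# ['ThisIsASimpleVariable']

# Several separate runs in one string each become their own item
print(camel_runs("I met Ada Lovelace and Alan Turing"))  # ['AdaLovelace', 'AlanTuring']

# A lone capitalized word is skipped, since (?:...)+ needs at least one more
print(camel_runs("I study in Sydney"))  # []
```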

If you expect only one match per string, use re.search:

import re
text = "I write a syntax of Regular Expression"
rx = r"\b[A-Z]\w*(?:\s+[A-Z]\w*)+"
result = re.search(rx, text)  # first match only, or None if there is none
if result:
    print("".join(result.group().split()))  # => RegularExpression
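The re.search variant can be wrapped so that it always returns a value: the joined first run, or None when nothing matches. The function name first_camel_run is hypothetical; note that re.search stops at the first match, so later runs in the same string are ignored.

```python
import re

rx = r"\b[A-Z]\w*(?:\s+[A-Z]\w*)+"

def first_camel_run(text):
    # Hypothetical helper: join the first run of capitalized words, or return None
    m = re.search(rx, text)
    return "".join(m.group().split()) if m else None

print(first_camel_run("I write a syntax of Regular Expression"))  # RegularExpression
print(first_camel_run("I met Ada Lovelace and Alan Turing"))      # AdaLovelace
print(first_camel_run("no capitalized words here"))               # None
```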

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Wiktor Stribiżew