How can I use Regex to extract all words that are written in camel case
I am trying to extract all consecutive capitalized words in a given string and write them with no spacing in between.
E.g. The University Of Sydney => TheUniversityOfSydney, Regular Expression => RegularExpression, and This Is A Simple Variable => ThisIsASimpleVariable.
I started with this code, but it returns a list of separate words:
import re
string = "I write a syntax of Regular Expression"
result = re.findall(r"\b[A-Z][a-z]*\b", string)
print(result)
I expect to get RegularExpression here.
Solution 1:[1]
You need to use
import re
text = "I write a syntax of Regular Expression"
rx = r"\b[A-Z]\w*(?:\s+[A-Z]\w*)+"
result = ["".join(x.split()) for x in re.findall(rx, text)]
print(result) # => ['RegularExpression']
See the Python demo.
The regex is explained in "How can I use Regex to abbreviate words that all start with a capital letter".
In this case, the regex is used in re.findall to extract matches, and "".join(x.split()) is a post-process step to remove all whitespace from the found texts.
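For readers who do not want to follow the linked explanation, here is a rough breakdown of the same pattern written with re.VERBOSE so each part can carry a comment. The names CAPITALIZED_RUN and join_capitalized_runs are made up for this sketch and are not part of the original answer; the second test sentence is likewise an invented example built around the question's sample phrase.

```python
import re

# Same pattern as in the answer, spelled out with re.VERBOSE.
# CAPITALIZED_RUN and join_capitalized_runs are hypothetical names.
CAPITALIZED_RUN = re.compile(
    r"""
    \b[A-Z]\w*          # a word that starts with an uppercase ASCII letter
    (?:                 # followed by one or more repetitions of:
        \s+[A-Z]\w*     #   whitespace, then another capitalized word
    )+
    """,
    re.VERBOSE,
)


def join_capitalized_runs(text):
    """Return each run of two or more capitalized words with whitespace removed."""
    return ["".join(match.split()) for match in CAPITALIZED_RUN.findall(text)]


print(join_capitalized_runs("I write a syntax of Regular Expression"))
print(join_capitalized_runs("I study at The University Of Sydney"))
print(join_capitalized_runs("This Is A Simple Variable"))
```

Because the group must repeat at least once, a capitalized word standing alone (such as the leading "I" in the first sentence) is not matched; only runs of two or more capitalized words are returned.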
If you only expect one single match in each string, use re.search:
import re
text = "I write a syntax of Regular Expression"
rx = r"\b[A-Z]\w*(?:\s+[A-Z]\w*)+"
result = re.search(rx, text)
if result:
    print("".join(result.group().split()))  # => RegularExpression
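The `if result:` guard matters because re.search returns None when nothing matches. A minimal sketch of the same idea wrapped in a function, assuming you want None back in that case (the name first_capitalized_run is hypothetical, and the second input is an invented example):

```python
import re

# Same pattern as in the answer, precompiled once.
RX = re.compile(r"\b[A-Z]\w*(?:\s+[A-Z]\w*)+")


def first_capitalized_run(text):
    """Return the first run of capitalized words, joined, or None if there is none."""
    match = RX.search(text)
    if match is None:
        # re.search found no run of two or more capitalized words.
        return None
    return "".join(match.group().split())


print(first_capitalized_run("I write a syntax of Regular Expression"))
print(first_capitalized_run("no capitalized words here"))
```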
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Wiktor Stribiżew |