Python: rstrip or remove the end of a string by a pattern of characters
I'm trying to strip the end of the strings in this column. I've seen how to rstrip a specific character, or a set number of characters at the end of a string, but how do you do it based on a pattern?
I'd like to remove the end of each string in the 'team' column, starting where a lowercase letter is followed by an uppercase letter: everything from that uppercase letter onward should go. I would like the 'team' column below:
team pts/g
St. Louis RamsSt. Louis 32.875
Washington RedskinsWashington 27.6875
Minnesota VikingsMinnesota 24.9375
Indianapolis ColtsIndianapolis 26.4375
Oakland RaidersOakland 24.375
Carolina PanthersCarolina 26.3125
Jacksonville JaguarsJacksonville 24.75
Chicago BearsChicago 17.0
Green Bay PackersGreen Bay 22.3125
San Francisco 49ersSan Francisco 18.4375
Buffalo BillsBuffalo 20.0
to look like this:
team pts/g
St. Louis Rams 32.875
Washington Redskins 27.6875
Minnesota Vikings 24.9375
Indianapolis Colts 26.4375
Oakland Raiders 24.375
Carolina Panthers 26.3125
Jacksonville Jaguars 24.75
Chicago Bears 17.0
Green Bay Packers 22.3125
San Francisco 49ers 18.4375
Buffalo Bills 20.0
Solution 1:[1]
You can use re.sub(pattern, repl, string) for that.
Let's use this regular expression for matching:
([a-z])[A-Z].*?(  )
It matches a lowercase character ([a-z]), followed by an uppercase character [A-Z], then as few characters as possible .*? (non-greedy) until it hits two spaces (  ).
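To see exactly which characters the pattern covers, here is a small sketch that runs it on a single line with re.search. It assumes, as the explanation does, that the team cell and the pts/g value are separated by two spaces; the sample line is taken from the question's data with that spacing added.

```python
import re

# The pattern described above, with two literal spaces in the second group.
pattern = re.compile(r"([a-z])[A-Z].*?(  )")

# ASSUMPTION: two spaces separate the team cell from the pts/g value.
line = "St. Louis RamsSt. Louis  32.875"

m = pattern.search(line)
print(repr(m.group(0)))  # 'sSt. Louis  ' (the whole match)
print(repr(m.group(1)))  # 's' (group 1, the lowercase letter)
print(repr(m.group(2)))  # '  ' (group 2, the two spaces)
print(pattern.sub(r"\1\2", line))  # St. Louis Rams  32.875
```

The single space inside "St. Louis" is not enough to satisfy the second group, so the match runs on to the first pair of spaces; the non-greedy .*? keeps it from running past that pair in a padded, fixed-width layout.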
The lowercase character and the two spaces are each captured in a group, so they can be re-inserted using \1 for the first group and \2 for the second when using re.sub:
import re
new_text = re.sub(r"([a-z])[A-Z].*?(  )", r"\1\2", text)
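Putting it together, here is a self-contained sketch over the question's rows. The scraped table lost its original spacing, so the sample text is an assumption: each line is rebuilt with exactly two spaces between the team cell and pts/g.

```python
import re

# ASSUMPTION: each line is "<team cell>  <pts/g>" with exactly two spaces between.
text = "\n".join([
    "team  pts/g",
    "St. Louis RamsSt. Louis  32.875",
    "Washington RedskinsWashington  27.6875",
    "Minnesota VikingsMinnesota  24.9375",
    "Indianapolis ColtsIndianapolis  26.4375",
    "Oakland RaidersOakland  24.375",
    "Carolina PanthersCarolina  26.3125",
    "Jacksonville JaguarsJacksonville  24.75",
    "Chicago BearsChicago  17.0",
    "Green Bay PackersGreen Bay  22.3125",
    "San Francisco 49ersSan Francisco  18.4375",
    "Buffalo BillsBuffalo  20.0",
])

# Keep the lowercase letter (\1) and the two spaces (\2); drop the repeated city between them.
# '.' does not match a newline, so a match can never spill into the next line.
new_text = re.sub(r"([a-z])[A-Z].*?(  )", r"\1\2", text)
print(new_text)
```

The header line has no lowercase letter directly followed by an uppercase one, so it passes through unchanged.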
Output for your example:
team pts/g
St. Louis Rams 32.875
Washington Redskins 27.6875
Minnesota Vikings 24.9375
Indianapolis Colts 26.4375
Oakland Raiders 24.375
Carolina Panthers 26.3125
Jacksonville Jaguars 24.75
Chicago Bears 17.0
Green Bay Packers 22.3125
San Francisco 49ers 18.4375
Buffalo Bills 20.0
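If each team name is a separate string (for example one cell of the 'team' column) rather than a line of a larger text, there are no trailing spaces to stop at. A variant that may fit that case, swapped in here rather than taken from the answer, uses a lookbehind (?<=[a-z]) so the lowercase letter is not consumed, and cuts from the first such uppercase letter to the end of the string. The helper name strip_repeated_city is hypothetical.

```python
import re

# Lookbehind: an uppercase letter directly preceded by a lowercase letter,
# followed by everything up to the end of the string.
SUFFIX = re.compile(r"(?<=[a-z])[A-Z].*$")

def strip_repeated_city(name):
    """Hypothetical helper: drop everything from the first lowercase->uppercase boundary on."""
    return SUFFIX.sub("", name, count=1)

teams = [
    "St. Louis RamsSt. Louis",
    "Green Bay PackersGreen Bay",
    "San Francisco 49ersSan Francisco",
]
print([strip_repeated_city(t) for t in teams])
# ['St. Louis Rams', 'Green Bay Packers', 'San Francisco 49ers']
```

Like the rule in the question, this would cut a team name that legitimately contains a lowercase letter followed by an uppercase one (a name like "McSomething").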
This messes up the space alignment. That might not matter to you, but if you want to replace the removed characters with spaces, you can pass a function instead of a replacement string to re.sub; the function takes a Match object and returns a str:
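def replace_with_spaces(match):
    # group(1): lowercase letter to keep; group(2): text to blank out;
    # group(3): the two spaces that ended the match
    return match.group(1) + " " * len(match.group(2)) + match.group(3)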
Then use it like this (note that the part to be replaced is now in a regex group too):
new_text = re.sub(r"([a-z])([A-Z].*?)(  )", replace_with_spaces, text)
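A self-contained sketch of the function-based version, assuming a fixed-width layout: the rows are padded to a hypothetical width of 36 characters with str formatting so that pts/g starts in the same column on every line. Because the replacement returns as many characters as it removes, every line keeps its length and the column stays aligned.

```python
import re

def replace_with_spaces(match):
    # Keep group 1 (lowercase letter), blank out group 2 (repeated city) with the
    # same number of spaces, keep group 3 (the two spaces that ended the match).
    return match.group(1) + " " * len(match.group(2)) + match.group(3)

rows = [
    ("team", "pts/g"),
    ("St. Louis RamsSt. Louis", "32.875"),
    ("Washington RedskinsWashington", "27.6875"),
    ("Green Bay PackersGreen Bay", "22.3125"),
    ("San Francisco 49ersSan Francisco", "18.4375"),
]
# ASSUMPTION: fixed-width text, team cell left-justified in a 36-character column.
text = "\n".join(f"{team:<36}{pts}" for team, pts in rows)

new_text = re.sub(r"([a-z])([A-Z].*?)(  )", replace_with_spaces, text)
print(new_text)
```

The non-greedy group 2 matters here: with a greedy .* the match would run to the last pair of padding spaces, although the length-preserving replacement would still keep the column in place.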
This produces:
team pts/g
St. Louis Rams 32.875
Washington Redskins 27.6875
Minnesota Vikings 24.9375
Indianapolis Colts 26.4375
Oakland Raiders 24.375
Carolina Panthers 26.3125
Jacksonville Jaguars 24.75
Chicago Bears 17.0
Green Bay Packers 22.3125
San Francisco 49ers 18.4375
Buffalo Bills 20.0
Solution 2:[2]
Well, I don't think it's that easy, because the part to be removed can itself contain spaces between two words (for example "Green Bay"). For your problem only, I suggest removing the shortest ending that is also a beginning of the string. Hmm... not very easy to explain. Here is a little function and its test:
def smart_rstrip(s):
    # Cut the shortest prefix of s that also appears at the end of s
    for i in range(1, len(s)):
        if s.endswith(s[:i]):
            return s[:-i]
    return s
s = ['St. Louis RamsSt. Louis', 'Washington RedskinsWashingt...]
print('\n'.join(s))
print('\n'.join(map(smart_rstrip, s)))
Try it; I think you will get what you want.
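For reference, here is the same function as a self-contained Python 3 sketch (range instead of xrange, print() as a function), run on the full list of team names from the question, since the list in the answer above is truncated. The approach is the answer's; the comments describe what the loop does.

```python
def smart_rstrip(s):
    # Try prefixes from shortest to longest; the first prefix that is also a
    # suffix is treated as the repeated city name and cut from the end.
    for i in range(1, len(s)):
        if s.endswith(s[:i]):
            return s[:-i]
    # No prefix repeats at the end: return the string unchanged.
    return s

teams = [
    "St. Louis RamsSt. Louis",
    "Washington RedskinsWashington",
    "Minnesota VikingsMinnesota",
    "Indianapolis ColtsIndianapolis",
    "Oakland RaidersOakland",
    "Carolina PanthersCarolina",
    "Jacksonville JaguarsJacksonville",
    "Chicago BearsChicago",
    "Green Bay PackersGreen Bay",
    "San Francisco 49ersSan Francisco",
    "Buffalo BillsBuffalo",
]
stripped = [smart_rstrip(t) for t in teams]
print("\n".join(stripped))
```

Because the shortest match wins, a city name that itself starts and ends with the same text would be cut at that shorter repeat instead; none of the cities in this data does.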
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | Captain'Flam |