python rstrip or remove end of string by a pattern of characters

I'm trying to strip the end of the strings in this column. I've seen how to rstrip a specific character, or a set number of characters at the end of a string, but how do you do it based on a pattern?

I'd like to cut each string in the 'team' column at the point where a lowercase letter is immediately followed by an uppercase letter, removing everything from that uppercase letter onward. I would like the 'team' column below:

   team                              pts/g
St. Louis RamsSt. Louis             32.875
Washington RedskinsWashington       27.6875
Minnesota VikingsMinnesota          24.9375
Indianapolis ColtsIndianapolis      26.4375
Oakland RaidersOakland              24.375
Carolina PanthersCarolina           26.3125
Jacksonville JaguarsJacksonville    24.75
Chicago BearsChicago                17.0
Green Bay PackersGreen Bay          22.3125
San Francisco 49ersSan Francisco    18.4375
Buffalo BillsBuffalo                20.0

to look like this:

   team                              pts/g
St. Louis Rams                      32.875
Washington Redskins                 27.6875
Minnesota Vikings                   24.9375
Indianapolis Colts                  26.4375
Oakland Raiders                     24.375
Carolina Panthers                   26.3125
Jacksonville Jaguars                24.75
Chicago Bears                       17.0
Green Bay Packers                   22.3125
San Francisco 49ers                 18.4375
Buffalo Bills                       20.0


Solution 1:[1]

You can use re.sub(pattern, repl, string) for that.

Let's use this regular expression for matching:

([a-z])[A-Z].*?(  )

It matches a lowercase character ([a-z]), followed by an uppercase character [A-Z], and then as few characters as possible (the lazy .*?) until it hits two spaces, written as the group (  ). The lowercase character and the two spaces are each in a group, so they can be re-inserted with \1 for the first group and \2 for the second when using re.sub:

new_text = re.sub(r"([a-z])[A-Z].*?(  )", r"\1\2", text)

Output for your example:

   team                              pts/g
St. Louis Rams             32.875
Washington Redskins       27.6875
Minnesota Vikings          24.9375
Indianapolis Colts      26.4375
Oakland Raiders              24.375
Carolina Panthers           26.3125
Jacksonville Jaguars    24.75
Chicago Bears                17.0
Green Bay Packers          22.3125
San Francisco 49ers    18.4375
Buffalo Bills                20.0

This breaks the column alignment. That might not matter for you, but if you want to replace the removed characters with spaces, you can pass a function instead of a replacement string to re.sub. The function takes a Match object and returns a str:

def replace_with_spaces(match):
    return match.group(1) + " "*len(match.group(2)) + match.group(3)

And then use it like this (notice how I put the to-be-replaced part into a regex-group too):

new_text = re.sub(r"([a-z])([A-Z].*?)(  )", replace_with_spaces, text)

This produces:

   team                              pts/g
St. Louis Rams                      32.875
Washington Redskins                 27.6875
Minnesota Vikings                   24.9375
Indianapolis Colts                  26.4375
Oakland Raiders                     24.375
Carolina Panthers                   26.3125
Jacksonville Jaguars                24.75
Chicago Bears                       17.0
Green Bay Packers                   22.3125
San Francisco 49ers                 18.4375
Buffalo Bills                       20.0
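
For reference, here is a minimal end-to-end sketch of the space-preserving variant, together with a variant for the case where each team name is a separate value (for example, one cell of the column) and there are no trailing spaces to anchor on. The names `rows`, `text`, `teams`, and `strip_repeated_city`, as well as the lookbehind pattern, are assumptions made for illustration and are not part of the original answer.

```python
import re

# Two sample rows rebuilt in the question's fixed-width layout
# (team column padded to 36 characters, an assumption based on the example).
rows = [("Oakland RaidersOakland", "24.375"), ("Chicago BearsChicago", "17.0")]
text = "\n".join(name.ljust(36) + pts for name, pts in rows)

def replace_with_spaces(match):
    # Keep the lowercase letter, blank out the duplicated part,
    # and keep the two spaces that ended the match.
    return match.group(1) + " " * len(match.group(2)) + match.group(3)

aligned = re.sub(r"([a-z])([A-Z].*?)(  )", replace_with_spaces, text)

def strip_repeated_city(value):
    # Hypothetical helper for single values: drop everything from the first
    # uppercase letter that directly follows a lowercase letter.
    return re.sub(r"(?<=[a-z])[A-Z].*$", "", value)

teams = ["St. Louis RamsSt. Louis", "San Francisco 49ersSan Francisco"]
cleaned = [strip_repeated_city(t) for t in teams]

print(aligned)
print(cleaned)
```

The lookbehind version would also cut a name with an internal capital, such as "McDonald", so it is worth checking against the actual data first.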

Solution 2:[2]

Well, I don't think it's that easy, because spaces may separate two words that should both be removed. I suggest, for your problem only, removing the shortest ending that is also a beginning. Hmm... not very easy to explain. Here is a little function and its test:

def smart_rstrip(s):
    # Strip the shortest prefix of s that also appears at its end.
    for i in range(1, len(s)):
        if s.endswith(s[:i]):
            return s[:-i]
    return s


s = ['St. Louis RamsSt. Louis', 'Washington RedskinsWashingt...]
print('\n'.join(s))
print('\n'.join(map(smart_rstrip, s)))

Try it, I think you will get what you want...
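
As a quick check of this idea, here is a self-contained Python 3 sketch of the same function run against a few of the team names from the question. The list `teams` and the extra "Buffalo Bills" and "alpha" inputs are assumptions added for illustration.

```python
def smart_rstrip(s):
    # Try prefixes from shortest to longest and strip the first one
    # that also appears at the end of the string.
    for i in range(1, len(s)):
        if s.endswith(s[:i]):
            return s[:-i]
    return s

teams = [
    "St. Louis RamsSt. Louis",
    "Green Bay PackersGreen Bay",
    "San Francisco 49ersSan Francisco",
    "Buffalo Bills",  # no repeated prefix, so it is returned unchanged
]
results = [smart_rstrip(t) for t in teams]
print("\n".join(results))

# Caveat: a clean value whose first and last characters match loses a character.
print(smart_rstrip("alpha"))
```

Because the comparison is case-sensitive, the team names above are safe, but the last call shows why this approach fits only data where the repeated prefix is known to be present.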

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Captain'Flam