'Strict regex in Pandas replace
I need to write a strict regular expression
to replace certain values in my pandas
dataframe. This is an issue that was raised after solving the question that I posted here.
The issue is that the .replace(idsToReplace, regex=True)
is not strict. Therefore if the iDsToReplace are:
NY : New York
NYC : New York City
and the comment in which we are replacing the ID is:
My cat from NYC is large.
The resulting response is:
My cat from New York is large.
Is there a pythonic way within the pandas
replace
function to make the regular expression
stricter to match with NYC
and not NY
?
Solution 1:[1]
Add \b
for word boundaries
to each key of dict
:
d = {'UK': 'United Kingdom', 'LA': 'Los Angeles', 'NYC': 'New York City', 'NY' : 'New York'}
data = {'Categories': ['animal','plant','object'],
'Type': ['tree','dog','rock'],
'Comment': ['The NYC tree is very big', 'NY The cat from the UK is small',
'The rock was found in LA.']
}
d = {r'\b' + k + r'\b':v for k, v in d.items()}
df = pd.DataFrame(data)
df['commentTest'] = df['Comment'].replace(d, regex=True)
print (df)
Categories Comment Type \
0 animal The NYC tree is very big tree
1 plant NY The cat from the UK is small dog
2 object The rock was found in LA. rock
commentTest
0 The New York City tree is very big
1 New York The cat from the United Kingdom is small
2 The rock was found in Los Angeles.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |