Git Bash on Windows gives a different result than the terminal on CentOS for a regex [duplicate]
See the following `cleanCustomer.sh` file:

```bash
#!/bin/bash
customer=Reportçós
cleanedCustomer=${customer//[^a-zA-Z0-9 \-_.]/}
echo "$cleanedCustomer"
```
When I run it in Git Bash on Windows 11, it prints `Reports`.

When I run it in a terminal on CentOS, it prints `Reportçós`.
Does anybody know why `a-z` also matches accented letters such as `ç` and `ó` on CentOS but not on Windows?
How do I make sure that only English (ASCII) letters are kept on CentOS?
Solution 1:[1]
From the bash manual:
A pair of characters separated by a hyphen denotes a range expression; any character that falls between those two characters, inclusive, using the current locale’s collating sequence and character set, is matched. If the first character following the ‘[’ is a ‘!’ or a ‘^’ then any character not enclosed is matched. A ‘-’ may be matched by including it as the first or last character in the set.
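To see that range matching follows the collation locale, here is a small sketch. The helper name `in_ascii_range` is purely illustrative, and the sketch assumes `LC_ALL` is not set, because `LC_ALL` overrides `LC_COLLATE`. It forces C (code-point) collation inside a subshell and tests single characters against `[a-z]`:

```bash
#!/bin/bash
# Hypothetical helper: succeed if $1 is a single character inside [a-z]
# when the range is evaluated with C (code-point) collation.
# The subshell keeps the LC_COLLATE change from leaking into the caller.
in_ascii_range() {
  (
    LC_COLLATE=C
    [[ $1 == [a-z] ]]
  )
}

# Report the result for a plain letter, two accented letters and an uppercase letter
for ch in m ç ó Z; do
  if in_ascii_range "$ch"; then
    printf '%s: in [a-z]\n' "$ch"
  else
    printf '%s: not in [a-z]\n' "$ch"
  fi
done
```

With C collation only `m` falls inside the range; `ç`, `ó` and `Z` do not, because their code points lie outside `a` (0x61) to `z` (0x7A).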
Your Git Bash locale uses collation rules under which ranges like `a-z` don't include accented characters; your CentOS locale's rules do. You can address this by using a consistent locale such as `C` for collation. Also, put the `-` first or last in the set, as the manual says, so that it is unambiguously literal, and escape the backslash with another backslash if you want to match a literal one.
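```bash
#!/bin/bash
# Evaluate range expressions such as a-z in C (code-point) order
LC_COLLATE=C
customer=Reportçós
# '-' is last so it is literal; '\\' keeps literal backslashes
cleanedCustomer=${customer//[^a-zA-Z0-9 \\_.-]/}
printf "%s\n" "$cleanedCustomer"
```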
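If you would rather not depend on locale settings at all (for instance because `LC_ALL`, which takes precedence over `LC_COLLATE`, might be set in the environment the script runs in), another option is to avoid range expressions entirely and list the allowed characters explicitly. This is a sketch: `clean_customer` and `allowed` are illustrative names, and unlike the version above it does not keep backslashes.

```bash
#!/bin/bash
# Hypothetical helper: strip every character that is not an ASCII letter,
# digit, space, underscore, dot or hyphen. No range expressions are used,
# so the result does not depend on the collation locale.
clean_customer() {
  local allowed='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 _.-'
  # '-' is the last character of the set, so it is matched literally.
  printf '%s\n' "${1//[^$allowed]/}"
}

clean_customer 'Reportçós'
clean_customer 'my-report_v1.2'
```

The two calls should print `Reports` and `my-report_v1.2`. On bash 4.3 and later, `shopt -s globasciiranges` is a further option that makes range expressions behave as in the C locale; older versions such as 4.2 do not have that option.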
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |