Git Bash on Windows different result than terminal on CentOS for regex [duplicate]

Consider the following cleanCustomer.sh file:

#!/bin/bash
customer=Reportçós
cleanedCustomer=${customer//[^a-zA-Z0-9 \-_.]/}
echo $cleanedCustomer

When I run it on Windows 11 in Git Bash it prints Reports.
When I run it on CentOS in terminal it prints Reportçós.

Does anybody know why a-z is interpreted as matching all alphabetic characters (including accented ones) on CentOS but not on Windows?
How do I ensure that only English characters are matched on CentOS?



Solution 1:[1]

From the bash manual:

A pair of characters separated by a hyphen denotes a range expression; any character that falls between those two characters, inclusive, using the current locale’s collating sequence and character set, is matched. If the first character following the ‘[’ is a ‘!’ or a ‘^’ then any character not enclosed is matched. A ‘-’ may be matched by including it as the first or last character in the set.
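To see which behavior a given machine exhibits, a quick check along the following lines may help. This is only a sketch: matches_az is a hypothetical helper name, and the result of the second call depends on whatever locale the shell happens to be running under.

```shell
#!/bin/bash
# Hypothetical helper (not part of bash): reports whether its argument
# matches the range [a-z] under the locale currently in effect.
matches_az() {
  case "$1" in
    [a-z]) echo "match" ;;
    *)     echo "no match" ;;
  esac
}

# Show the locale-related variables that influence range matching.
echo "LC_ALL=${LC_ALL-unset} LC_COLLATE=${LC_COLLATE-unset} LANG=${LANG-unset}"

matches_az "ç"                 # result depends on the current locale
( LC_ALL=C; matches_az "ç" )   # under the C locale this prints: no match
```

Running the same script in Git Bash and on CentOS should make the difference between the two environments visible.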

Your Git Bash locale uses collation rules under which ranges like a-z do not match accented characters; your CentOS locale's rules do match them. You can address this by using a consistent locale, such as C, for collation. Also, your - is in the wrong spot: it needs to be first or last in the set, and the backslash needs to be escaped with another backslash to match a literal one.

#!/bin/bash
LC_COLLATE=C
customer=Reportçós
cleanedCustomer=${customer//[^a-zA-Z0-9 \\_.-]/}
printf "%s\n" "$cleanedCustomer"
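One caveat: if LC_ALL is already set in the environment, it takes precedence over LC_COLLATE. A possible variant, sketched below, sets LC_ALL=C inside a function so the override stays local to the cleaning step. The function name clean_customer is hypothetical, and the sketch assumes bash applies a local LC_ALL assignment for the duration of the function, which is how the common "local LC_ALL=C" idiom is generally understood to work.

```shell
#!/bin/bash
# Hypothetical helper: removes every character outside the ASCII allow-list
# (letters, digits, space, backslash, underscore, dot, hyphen).
clean_customer() {
  local LC_ALL=C   # assumption: takes effect for this function only
  local input=$1
  local cleaned=${input//[^a-zA-Z0-9 \\_.-]/}
  printf '%s\n' "$cleaned"
}

clean_customer "Reportçós"   # prints: Reports
```

Under the C locale the accented characters are treated as raw bytes outside the ASCII ranges, so they are removed regardless of how the surrounding system is configured.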

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1