PHP: Check for characters in the Latin script plus spaces and numbers
I am new to regex and I have been going round and round on this problem.
The question "PHP: Check alphabetic characters from any latin-based language?" gives the brilliant regex to check for any characters in the Latin script, which is part of what I need:
^\p{Latin}+$
It also provides a working example at https://regex101.com/r/I5b2mC/1
If I use the regex in PHP by using
echo preg_match('/^\p{Latin}+$/', $testString);
and $testString contains only Latin letters, the output will be 1. If there are any non-Latin letters, the output will be 0. Brilliant.
To add numbers, I tried ^\p{Latin}+[[:alnum:]]*$ but that allows either characters in the Latin script or unaccented letters and numbers (letters without a grave, acute, cedilla, umlaut, etc.), as [[:alnum:]] is equivalent to [a-zA-Z0-9].
If you combine any numbers with characters in the Latin script, echo preg_match('/^\p{Latin}+[[:alnum:]]*$/', $testString); returns 0. A string consisting only of numbers returns 0 too. This can be confirmed by editing the expression at https://regex101.com/r/I5b2mC/1
How do I edit the expression in echo preg_match('/^\p{Latin}+$/', $testString); to output 1 if $testString contains only characters in the Latin script, numbers and/or spaces? For example, I want 1 to be output if $testString is Café ßüs 459.
Solution 1:[1]
There are at least two things to change:
- Add the u flag to support chars other than ASCII (/^\p{Latin}+$/ => /^\p{Latin}+$/u)
- Create a character class for letters, digits and whitespace patterns (/^\p{Latin}+$/u => /^[\p{Latin}]+$/u)
- Then add the digit and whitespace patterns. If you need to support any Unicode digits, add \d. If you need to support only ASCII digits, add 0-9.
Thus, you can use
preg_match('/^[\p{Latin}\s0-9]+$/u', $testString) // ASCII only digits
preg_match('/^[\p{Latin}\s\d]+$/u', $testString) // Any digits
Also, \s with the u flag will match any Unicode whitespace chars.
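As a minimal sketch of how the two patterns above might be exercised, assuming PHP 7 or later and UTF-8 encoded input. The helper names isLatinText and isLatinTextAnyDigits are hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper: Latin-script letters, whitespace and ASCII digits only.
function isLatinText(string $s): bool
{
    // preg_match returns 1 on a match, 0 on no match, false on error
    // (for example, invalid UTF-8 in the subject when the u flag is set).
    return preg_match('/^[\p{Latin}\s0-9]+$/u', $s) === 1;
}

// Hypothetical helper: same idea, but \d accepts any Unicode decimal digit under the u flag.
function isLatinTextAnyDigits(string $s): bool
{
    return preg_match('/^[\p{Latin}\s\d]+$/u', $s) === 1;
}

var_dump(isLatinText('Café ßüs 459'));  // bool(true)
var_dump(isLatinText('Кафе 459'));      // bool(false): Cyrillic letters
var_dump(isLatinText('459'));           // bool(true): digits alone are accepted

// "Café" followed by Arabic-Indic digits, written with escapes for clarity.
$arabicIndic = "Caf\u{E9} \u{664}\u{665}\u{669}";
var_dump(isLatinText($arabicIndic));          // bool(false)
var_dump(isLatinTextAnyDigits($arabicIndic)); // bool(true)
```

One caveat worth keeping in mind: accented letters stored in decomposed form (a base letter plus a combining mark) would likely be rejected, since combining marks do not belong to the Latin script. Normalizing the input first, for example with Normalizer::normalize from the intl extension, may be needed in that case.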
Solution 2:[2]
More generally, it is possible to reject any string containing letters that are not Latin (without having to list, one by one, the characters or groups of characters you want to allow):
$re = '~ ^ (?! .* [^\PL\p{Latin}] ) .+ $ ~mux';
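To illustrate how this pattern differs from a whitelist, here is a small sketch under the same assumptions (PHP 7 or later, UTF-8 input). The function name hasNoForeignLetters and the sample strings are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical wrapper around the pattern above: a non-empty line passes
// as long as it contains no letter from a script other than Latin.
function hasNoForeignLetters(string $s): bool
{
    // [^\PL\p{Latin}] matches a character that is a letter and is not Latin.
    $re = '~ ^ (?! .* [^\PL\p{Latin}] ) .+ $ ~mux';
    return preg_match($re, $s) === 1;
}

// Hypothetical sample inputs.
$samples = ['Café ßüs 459', 'Price: 459 €!', 'Café Кафе', ''];
foreach ($samples as $s) {
    printf("%-16s => %s\n", $s, hasNoForeignLetters($s) ? 'accepted' : 'rejected');
}
```

The first two samples are accepted, because punctuation and currency symbols are not letters and so are never checked against a script. The Cyrillic sample and the empty string are rejected.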
If you want strings with at least one Latin letter (and no letters from other alphabets), you can use a script run to build your pattern:
$re = '~ ^ [^\pL\r\n]* (?= \p{Latin} ) (*sr: .+ ) $ ~mux';
These two solutions may be more flexible. Obviously it all depends on the goal.
More about script runs here.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Wiktor Stribiżew |
Solution 2 |