Regular expression to match Unicode string with only letters
Problem
How can I validate that a string input consists only of letters? With ASCII I could match with the regular expression [a-zA-Z] or [[:alpha:]], but neither matches a non-ASCII letter such as "ó":
$ php -r 'print_r(preg_match("/[a-zA-Z]/", "ó"));'
0
$ php -r 'print_r(preg_match("/[[:alpha:]]/", "ó"));'
0
Solution
Use the \pL escape sequence together with the 'u' (Unicode) modifier:
$ # Match hyphen (-), Unicode letter (\pL) or ampersand (&)
$ php -r 'print_r(preg_match("/^[-\pL&]+$/u", "-fóó&"));'
1
$ # The digit 1 will not match
$ php -r 'print_r(preg_match("/^[-\pL&]+$/u", "-fóó&1"));'
0
See also: http://www.php.net/manual/en/regexp.reference.unicode.php
NOTE: PCRE needs to be compiled with "--enable-unicode-properties".
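The one-liners above can be wrapped in a small reusable check. This is only a minimal sketch: the function name is_all_letters is hypothetical, and it uses \z instead of $ as the end anchor (an assumption about what "only letters" should mean, since $ would also accept a trailing newline).

```php
<?php
// Hypothetical helper (not part of the original note): returns true
// when $input is non-empty and consists solely of Unicode letters.
function is_all_letters($input)
{
    // \pL matches any Unicode letter; the "u" modifier makes PCRE
    // treat pattern and subject as UTF-8. \z anchors at the absolute
    // end of the subject, so "abc\n" is rejected.
    $result = preg_match('/^\pL+\z/u', $input);

    // preg_match() returns 1 on a match, 0 on no match, and false on
    // error (for example when the subject is not valid UTF-8).
    return $result === 1;
}

var_dump(is_all_letters("fóó"));        // letters only
var_dump(is_all_letters("fóó1"));       // contains a digit
var_dump(is_all_letters("f\xF3\xF3"));  // ISO-8859-1 bytes, not valid UTF-8
```

Comparing against 1 with === treats the error case (false) the same as "no match", which is usually what input validation wants. Note that a letter written as a base character plus a combining accent (for example "o" followed by U+0301) contains a mark, not a letter, so it would be rejected by this pattern.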
Journal
20140612
Tried using the [:alpha:] character class together with the 'u' (Unicode) modifier:
$ php -r 'print_r(preg_match("/[[:alpha:]]/u", "ó"));'
1
But on some computers this returns 0 instead of 1, and I don't know why. Maybe it is related to this change described on http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php: "Five and six octet UTF-8 sequences are regarded as invalid since PHP 5.3.4 (resp. PCRE 7.3 2007-08-28); formerly those have been regarded as valid UTF-8."
On http://www.php.net//manual/en/regexp.reference.character-classes.php it says: "In UTF-8 mode, characters with values greater than 128 do not match any of the POSIX character classes."
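One way to narrow down why machines disagree is to run the same checks on each of them and compare the results alongside the PCRE library version. The sketch below is only a diagnostic aid, not an explanation of the cause; the labels in the array are made up for illustration.

```php
<?php
// Version of the PCRE library this PHP build is linked against.
echo "PCRE: ", PCRE_VERSION, "\n";

// The two approaches from this page, run side by side.
// The array keys are illustrative labels only.
$checks = array(
    'POSIX class with /u'      => '/[[:alpha:]]/u',
    'Unicode property with /u' => '/\pL/u',
);

// "ó" written as explicit UTF-8 bytes, so the result does not depend
// on the encoding of this source file.
$subject = "\xC3\xB3";

foreach ($checks as $label => $pattern) {
    // @ hides the warning emitted when a pattern fails to compile
    // (e.g. no Unicode property support); preg_match() then returns false.
    $result = @preg_match($pattern, $subject);
    echo $label, ": ", var_export($result, true), "\n";
}
```

If the two machines print different PCRE versions and different results for the POSIX class line, the library build is a likely suspect; the \pL line returning false would point at missing Unicode property support instead.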
Tried ctype_alpha, but to no avail (using var_dump here, because print_r prints nothing for false):
$ php -r 'var_dump(ctype_alpha("fóóbar"));'
bool(false)
- http://stackoverflow.com/questions/961573/utf-8-isalpha-in-php
- Forum question with the \p{L} solution