Regular expression to match Unicode string with only letters

From FVue

Problem

How can I validate that a string input consists only of letters? With ASCII I could match with the regular expression [a-zA-Z] or [[:alpha:]], but for an accented letter such as "ó" both unexpectedly report no match:

$ php -r 'print_r(preg_match("/[a-zA-Z]/", "ó"));'
0
$ php -r 'print_r(preg_match("/[[:alpha:]]/", "ó"));'
0
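
The following is a minimal sketch of why both patterns fail, assuming the script file and the input are encoded as UTF-8: without the 'u' modifier the pattern is applied to individual bytes, and "ó" is stored as two bytes, neither of which is an ASCII letter.

```php
<?php
// Assumption: this source file is saved as UTF-8.
$s = "ó";

// strlen() counts bytes, not characters: "ó" occupies two bytes in UTF-8.
echo strlen($s), "\n";    // 2
echo bin2hex($s), "\n";   // c3b3

// Without the 'u' modifier the subject is treated as a byte string,
// and neither byte (0xC3, 0xB3) falls in the range a-z or A-Z.
echo preg_match('/[a-zA-Z]/', $s), "\n";   // 0
```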

Solution

Use the \pL escape sequence together with the 'u' (Unicode) modifier:

$ # Match hyphen (-), Unicode letter (\pL) or ampersand (&)
$ php -r 'print_r(preg_match("/^[-\pL&]+$/u", "-fóó&"));'
1
$ # The digit 1 will not match
$ php -r 'print_r(preg_match("/^[-\pL&]+$/u", "-fóó&1"));'
0

See also: http://www.php.net/manual/en/regexp.reference.unicode.php

NOTE: PCRE needs to be compiled with Unicode property support ("--enable-unicode-properties") for \pL to work.
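
For use inside a script rather than on the command line, the pattern can be wrapped in a small function. This is only a sketch: the function name is_letters_only is hypothetical, the type declarations assume PHP 7 or later, and the set of allowed characters (letters, hyphen, ampersand) is simply copied from the example above.

```php
<?php
/**
 * Hypothetical helper (the name is illustrative): returns true if $input is
 * non-empty and consists only of Unicode letters, hyphens and ampersands.
 */
function is_letters_only(string $input): bool
{
    // preg_match() returns 1 on a match, 0 on no match and false on error,
    // for example when $input is not valid UTF-8 under the 'u' modifier.
    // Comparing with === 1 treats both 0 and false as "not valid".
    return preg_match('/^[-\pL&]+$/u', $input) === 1;
}

var_dump(is_letters_only("-fóó&"));   // bool(true)
var_dump(is_letters_only("-fóó&1"));  // bool(false)
var_dump(is_letters_only("\xC3"));    // bool(false): a lone 0xC3 byte is invalid UTF-8
```

Note that "$" also matches just before a trailing newline, so "fóó\n" would pass this check; adding the 'D' modifier (pattern "/^[-\pL&]+$/uD") makes "$" match only at the very end of the string.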

Journal

20140612

Tried using the [:alpha:] character class together with the 'u' (Unicode) modifier:

$ php -r 'print_r(preg_match("/[[:alpha:]]/u", "ó"));'
1

But on some computers this returns 0 instead of 1, and I don't know why. Maybe it is related to this change described on http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php: "Five and six octet UTF-8 sequences are regarded as invalid since PHP 5.3.4 (resp. PCRE 7.3 2007-08-28); formerly those have been regarded as valid UTF-8."

On http://www.php.net//manual/en/regexp.reference.character-classes.php it says: "In UTF-8 mode, characters with values greater than 128 do not match any of the POSIX character classes."
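
To compare machines that behave differently, a short diagnostic along the following lines might help. It is a sketch, not a definitive test: it prints the PCRE version that PHP reports and then tries both the \pL escape and the POSIX class under the 'u' modifier. The assumption that a build without Unicode property support makes preg_match() return false for \pL is mine and is not confirmed in this journal.

```php
<?php
// Which PCRE library version does this PHP build report?
echo "PCRE version: ", PCRE_VERSION, "\n";

// The '@' suppresses the warning PHP emits when a pattern fails to compile.
// Assumption: without Unicode property support, \pL fails to compile
// and preg_match() returns false.
$result = @preg_match('/^\pL$/u', "ó");

if ($result === false) {
    echo "\\pL with /u is not usable on this machine\n";
} else {
    echo "\\pL with /u returned: ", $result, "\n";
}

// For comparison: the POSIX class under /u, which gave different
// results on different machines in the journal entry above.
$posix = preg_match('/^[[:alpha:]]$/u', "ó");
echo "[[:alpha:]] with /u returned: ", var_export($posix, true), "\n";
```

Running this on a machine that returns 1 and on one that returns 0 shows whether the two differ in PCRE version.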

Tried ctype_alpha, but to no avail:

$ php -r 'print_r(ctype_alpha("fóóbar"));'
0

See also: http://stackoverflow.com/questions/961573/utf-8-isalpha-in-php (forum question with the \p{L} solution)