Comparing Unicode Characters in PHP in WordPress

I am unable to compare two Unicode characters that, to my mind, should be exactly the same. I suspect that they are somehow encoded differently, but I don’t know how to convert them to the same encoding.

The characters I want to compare are from the Myanmar Unicode block. I’m running WordPress on PHP 5 and am trying to write a custom plugin to handle Myanmar Unicode. All my files are encoded in UTF-8, but I don’t know what encoding WordPress uses internally.

Here is what I’m doing:

function myFunction( $inputText ) {
    $outputText = '';
    $inputTextArray = str_split($inputText);
    foreach($inputTextArray as $char) {
        if ($char == "က") // U+1000, a character from the Myanmar Unicode block
            $outputText .= $char;
    }
    return $outputText;
}
add_filter( 'the_content', 'myFunction');

At this stage, the function is only supposed to return က wherever it appears in the content. However, it never returns anything but an empty string, even when က is clearly present in the post content. If I change the character to any Latin character, the function works as expected.

So my question is: how do I encode these characters (either $char or "က") so that they compare equal when $char contains this character?

1 comment

str_split is not Unicode-aware. It splits a string into single bytes, so a multibyte character is broken into its individual bytes and none of the pieces can match the full character. Use either the multibyte string functions or preg_split with the /u modifier:

$inputTextArray = preg_split("//u", $inputText, -1, PREG_SPLIT_NO_EMPTY);

http://codepad.viper-7.com/ErFwcy
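
To make the difference concrete, here is a minimal sketch comparing the two split functions. It assumes the input is valid UTF-8; the variable names are illustrative, and U+1000 is written as explicit bytes so the result does not depend on how the source file is saved.

```php
<?php
// U+1000 (MYANMAR LETTER KA) as explicit UTF-8 bytes: E1 80 80.
$ka   = "\xE1\x80\x80";
$text = "a" . $ka . "b";

// str_split works on bytes: 1 + 3 + 1 = 5 pieces.
$bytes = str_split($text);

// preg_split with the /u modifier works on code points: 3 pieces.
$chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);

echo count($bytes), "\n";        // 5
echo count($chars), "\n";        // 3
echo bin2hex($bytes[1]), "\n";   // e1 (only the first byte of the character)
var_dump($chars[1] === $ka);     // bool(true)
```

Each element produced by str_split holds only one third of the character, which is why the comparison in the question can never succeed.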

With the multibyte function mb_substr_count you can also shorten your code:

function myFunction( $inputText ) {
    return str_repeat("က", mb_substr_count($inputText, "က", 'UTF-8'));
}

Or, using a regular expression:

preg_match_all("/က/u", $text, $match);
$output = implode("", $match[0]);
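
For completeness, the loop from the question can also be kept as it is, with only the split step changed. This is a sketch under stated assumptions: the function name keep_myanmar_ka is hypothetical (the question calls it myFunction), the content is assumed to be UTF-8, and the filter is only registered when WordPress is loaded.

```php
<?php
// Hypothetical name; the question's version is called myFunction.
function keep_myanmar_ka($inputText) {
    $target     = "\xE1\x80\x80"; // U+1000 as UTF-8 bytes
    $outputText = '';

    // Split into code points instead of bytes.
    $chars = preg_split('//u', $inputText, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return ''; // preg_split fails on invalid UTF-8 when /u is used
    }

    foreach ($chars as $char) {
        if ($char === $target) {
            $outputText .= $char;
        }
    }
    return $outputText;
}

// Register the filter only inside WordPress, so the file also runs standalone.
if (function_exists('add_filter')) {
    add_filter('the_content', 'keep_myanmar_ka');
}
```

The strict comparison (===) avoids PHP's loose type juggling; once both sides are complete UTF-8 characters, no further re-encoding is needed for them to compare equal.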
