I am unable to compare two Unicode characters that, to my mind, should be exactly the same. I suspect that they are somehow encoded differently, but I don't know how to change them to the same encoding.
The characters I want to compare are from the Myanmar Unicode block. I'm running WordPress on PHP 5 and am trying to write a custom plugin to handle Myanmar Unicode. All my files are encoded in UTF-8, but I don't know what WordPress does.
Here is what I’m doing:
    function myFunction( $inputText ) {
        $outputText = '';
        $inputTextArray = str_split($inputText);
        foreach($inputTextArray as $char) {
            if ($char == "က") // U+1000, a character from the Myanmar Unicode block
                $outputText .= $char;
        }
        return $outputText;
    }
    add_filter( 'the_content', 'myFunction');
At this stage in working things out, the function is supposed to return only က wherever it appears in the content. However, it never returns anything but an empty string, even when က is clearly present in the post content. If I change the character to any Latin character, the function works as expected.
So, my question is: how do I encode these characters (either $char or "က") so that they compare equal when $char contains this character?
str_split is not Unicode-aware. It splits a multibyte character into its individual bytes, so no single element can ever equal the whole character. Try using either the multi-byte string functions or preg_split with the /u switch: http://codepad.viper-7.com/ErFwcy
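As a rough sketch of the preg_split approach (the function name keepKa and the byte-escaped literal are my own choices for illustration, not taken from the linked paste), something like this should behave the way the question expects, assuming the content is valid UTF-8:

```php
<?php
// U+1000 MYANMAR LETTER KA, written as its three UTF-8 bytes so the
// example does not depend on how this source file is saved.
$ka = "\xE1\x80\x80";

// str_split() cuts the string into bytes, so none of the pieces equals $ka.
$bytes = str_split($ka); // three one-byte strings

// Hypothetical replacement for myFunction(): split on code point
// boundaries with the /u modifier instead of on bytes.
function keepKa($inputText) {
    $ka = "\xE1\x80\x80";
    $outputText = '';
    $chars = preg_split('//u', $inputText, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        // preg_split() fails with /u when the input is not valid UTF-8.
        return '';
    }
    foreach ($chars as $char) {
        if ($char === $ka) {
            $outputText .= $char;
        }
    }
    return $outputText;
}
```

It would be hooked up the same way as before, e.g. add_filter('the_content', 'keepKa').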
You can also shorten your code with the multi-byte function mb_substr_count, which counts how many times the character occurs in the content.
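A minimal sketch of the mb_substr_count idea, assuming the mbstring extension is loaded and the content is UTF-8 (keepKaByCount is a made-up name for illustration): count the occurrences, then repeat the character that many times.

```php
<?php
// Hypothetical helper: returns one U+1000 for every U+1000 in the input.
function keepKaByCount($inputText) {
    $ka = "\xE1\x80\x80"; // UTF-8 bytes of U+1000 MYANMAR LETTER KA
    // Count occurrences of the character, treating the input as UTF-8.
    $count = mb_substr_count($inputText, $ka, 'UTF-8');
    // Rebuild the output from the count instead of looping over characters.
    return str_repeat($ka, $count);
}
```

This avoids splitting the content at all, which should matter for long posts.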
Or use a regular expression with the /u modifier to match the character directly.
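One way the regular-expression variant might look (a sketch only; keepKaByRegex is a hypothetical name, and the pattern assumes PCRE's \x{...} code point syntax together with /u):

```php
<?php
// Hypothetical helper: collects every U+1000 in the input via a regex.
function keepKaByRegex($inputText) {
    // \x{1000} matches the code point U+1000 when the /u modifier is set.
    $result = preg_match_all('/\x{1000}/u', $inputText, $matches);
    if ($result === false) {
        // Matching fails with /u when the input is not valid UTF-8.
        return '';
    }
    // $matches[0] holds every full match; join them into one string.
    return implode('', $matches[0]);
}
```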