Sorting å, ä, ö – they get mixed up with a / o

Using WordPress and PHP, I’m creating a list of tags on a site, sorted by their first character:

<?php
$letters = range( 'a','z' );
array_push( $letters, 'å', 'ä', 'ö' );
foreach ( $letters as $index=>$letter ) : ?>
<h3><?php echo $letter; ?></h3>
<ul>
<?php
$tags = get_tags( array('name__like' => $letter, 'hide_empty' => 0) );
foreach ( $tags as $tag ) :
?>
<li><a href="/tag/<?php echo $tag->slug; ?>/"><?php echo $tag->name; ?></a></li>
<?php endforeach; ?>
</ul>
<?php endforeach; ?>

The output works, except that my Swedish characters å, ä, ö are also included under a / o and vice versa, as if PHP can’t distinguish them (even when manually pushing them into the array as their own entries). Ideas?
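
One possible explanation, offered as an assumption rather than a confirmed diagnosis: `name__like` is turned into an SQL `LIKE` query, and several MySQL collations commonly used with WordPress (for example `utf8_general_ci`) treat å and ä as equal to a, and ö as equal to o, so the mixing would happen in the database before PHP sees the result. A way around that is to fetch the tags once and group them by first character in PHP. A minimal sketch, where `group_by_first_letter()` is a hypothetical helper and plain strings stand in for tag names:

```php
<?php
// Hypothetical helper: groups names by their lowercased first character.
// The comparison happens in PHP on UTF-8 characters, so "å" and "a" stay separate.
function group_by_first_letter( array $names ) {
    $groups = array();
    foreach ( $names as $name ) {
        $first = mb_strtolower( mb_substr( $name, 0, 1, 'UTF-8' ), 'UTF-8' );
        $groups[ $first ][] = $name;
    }
    return $groups;
}

$groups = group_by_first_letter( array( 'Apple', 'Åland', 'Öl', 'Orange', 'ärta' ) );
// $groups['a'] holds only "Apple"; $groups['å'] holds only "Åland".
```

In the template, the input would come from a single `get_tags( array( 'hide_empty' => 0 ) )` call (grouping the tag objects by `$tag->name`), and the loop over `$letters` would read from `$groups[ $letter ]` instead of running one query per letter.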


1 comment

  1. I’ve been battling the same problem. You can use setlocale() together with asort($array, SORT_LOCALE_STRING). That puts the Å, Ä, Ö characters at the end. Unfortunately, they end up in the wrong order: PHP sorts them ä, å, ö, whereas in the Nordic languages (at least in the Swedish alphabet) the order is å, ä, ö. After a lot of googling I could not solve this with PHP’s native functionality, so I wrote my own “patched” sort function, which I thought I would share.
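
    For reference, the locale-based approach described above looks roughly like the sketch below. The locale name `sv_SE.UTF-8` is an assumption: it has to be installed on the server, and the resulting order depends on the system's locale data.

    ```php
    <?php
    $words = array( 'ö', 'a', 'ä', 'z', 'å' );

    // setlocale() returns false when none of the requested locales is available.
    $locale = setlocale( LC_COLLATE, 'sv_SE.UTF-8', 'sv_SE.utf8' );

    if ( $locale !== false ) {
        // SORT_LOCALE_STRING compares strings according to the current locale.
        asort( $words, SORT_LOCALE_STRING );
    } else {
        // Fallback: plain byte-wise string comparison.
        asort( $words, SORT_STRING );
    }

    // asort() keeps the original keys; array_values() reindexes the result.
    $sorted = array_values( $words );
    ```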

    I am by no means an expert in writing code optimized for computing speed, so I am sure there are several improvements that can be made to this code. I welcome you to point them out!

    Note that this function sorts an array based on the whole strings, not just the first character. I believe that is the most useful way to do it; if you wish to ignore all characters after the first, I’m sure you can modify it. It is also quite easy to adapt it to other characters and other sort orders. Enjoy!

    function sort_nordic(&$array) {
      uasort($array, 'nordic_cmp');
    }
    function nordic_cmp($a, $b) {
      // If å, ä and ö are all missing from the first string, just use PHP's native function.
      // The /u modifier makes the pattern match whole UTF-8 characters instead of single bytes.
      if (preg_match('/[åäöÅÄÖ]/u', $a) == 0) {
        return strcoll($a, $b);
      }
      // If å, ä and ö are all missing from the second string, also use PHP's native function
      if (preg_match('/[åäöÅÄÖ]/u', $b) == 0) {
        return strcoll($a, $b);
      }
    
      // Arriving here, both strings contain characters we have to look out for.
      // First, create arrays from the strings so we can easily loop over the characters.
      // The comparison is made in lowercase.
      $a_arr = preg_split('//u', mb_strtolower($a), -1, PREG_SPLIT_NO_EMPTY);
      $b_arr = preg_split('//u', mb_strtolower($b), -1, PREG_SPLIT_NO_EMPTY);
    
      // Get the length of the shorter string
      $end = min(count($a_arr), count($b_arr));
    
      // Loop over the characters and compare them in pairs
      for ($i = 0; $i < $end; $i++) {
        // Check if either character is one that we have to correct for
        $a_special = mb_stripos("åäö", $a_arr[$i]) !== false;
        $b_special = mb_stripos("åäö", $b_arr[$i]) !== false;
        if ($a_special || $b_special) {
          // Computes the corrected return value. The first character "-" is just a
          // nonsense placeholder so that "å" gives 1, "ä" gives 2 and "ö" gives 3.
          // Any other character gives 0, so it sorts before å, ä and ö.
          $r = (int) mb_stripos("-åäö", $a_arr[$i]) - (int) mb_stripos("-åäö", $b_arr[$i]);
          if ($r != 0) {
            return $r;
          }
        } else {
          // If neither character needs correcting,
          // PHP's native comparison works fine
          $r = strcoll($a_arr[$i], $b_arr[$i]);
          if ($r != 0) {
            return $r;
          }
        }
      }
      // Fallback: if nothing has been returned so far, the strings are
      // identical up to the length of the shorter string.
      // Then let the lengths of the strings determine the order.
      return mb_strlen($a) - mb_strlen($b);
    }
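
    If the `intl` extension is available, its `Collator` class may make the hand-written comparison unnecessary, since it applies the collation rules of a given locale. A minimal sketch, assuming `intl` is installed and that the ICU data for `sv_SE` orders the letters the Swedish way (å, ä, ö after z):

    ```php
    <?php
    $words = array( 'öl', 'apa', 'ärta', 'zebra', 'åka' );

    if ( class_exists( 'Collator' ) ) {
        // Collator::sort() sorts in place using the locale's collation rules.
        $collator = new Collator( 'sv_SE' );
        $collator->sort( $words );
    } else {
        // Without intl, fall back to a plain byte-wise sort.
        sort( $words, SORT_STRING );
    }
    ```

    When the array keys need to be preserved, as with `uasort()` in `sort_nordic()`, `Collator::asort()` can be used in place of `Collator::sort()`.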