sanitize_title_with_dashes (see code below for reference) is the function WordPress uses to format “pretty” URLs. However, contrary to the function’s comment header, it allows much more than alphanumeric characters, underscores (_) and dashes (-). It also lets through signs such as °.
How would I go about really allowing only alphanumeric characters and dashes?
/**
* Sanitizes title, replacing whitespace with dashes.
*
* Limits the output to alphanumeric characters, underscore (_) and dash (-).
* Whitespace becomes a dash.
*
* @since 1.2.0
*
* @param string $title The title to be sanitized.
* @return string The sanitized title.
*/
function sanitize_title_with_dashes($title) {
    $title = strip_tags($title);
    // Preserve escaped octets.
    $title = preg_replace('|%([a-fA-F0-9][a-fA-F0-9])|', '---$1---', $title);
    // Remove percent signs that are not part of an octet.
    $title = str_replace('%', '', $title);
    // Restore octets.
    $title = preg_replace('|---([a-fA-F0-9][a-fA-F0-9])---|', '%$1', $title);
    $title = remove_accents($title);
    if (seems_utf8($title)) {
        if (function_exists('mb_strtolower')) {
            $title = mb_strtolower($title, 'UTF-8');
        }
        $title = utf8_uri_encode($title, 200);
    }
    $title = strtolower($title);
    $title = preg_replace('/&.+?;/', '', $title); // kill entities
    $title = str_replace('.', '-', $title);
    $title = preg_replace('/[^%a-z0-9 _-]/', '', $title);
    $title = preg_replace('/\s+/', '-', $title); // whitespace becomes a dash
    $title = preg_replace('|-+|', '-', $title);
    $title = trim($title, '-');
    return $title;
}
Consider this function as a rough placeholder. It has more flaws than you might imagine … 🙂
There are many plugins to improve the conversion for different languages and needs. You may take a look at my plugin Germanix to see how this could be done.
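If you only need the strict variant from the question, a small filter may be enough. The reason signs like ° survive is that utf8_uri_encode() turns them into percent-encoded octets (for example %c2%b0), and the character class in the function above explicitly keeps %. The sketch below runs after the core function and strips those octets plus everything else that is not a-z, 0-9 or a dash. The function name my_strict_slug and the priority 20 are my own choices for illustration, not anything WordPress defines, and I am assuming underscores should become dashes.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress): reduce a slug to
 * lowercase a-z, 0-9 and dashes only.
 *
 * @param string $slug Slug as produced by sanitize_title_with_dashes().
 * @return string Slug containing only [a-z0-9-].
 */
function my_strict_slug($slug) {
    $slug = strtolower($slug);
    // Drop percent-encoded octets such as %c2%b0 (the encoded degree sign).
    $slug = preg_replace('/%[a-f0-9]{2}/', '', $slug);
    // Turn every remaining run of disallowed characters (underscores,
    // spaces, stray percent signs, ...) into a single dash.
    $slug = preg_replace('/[^a-z0-9-]+/', '-', $slug);
    // Collapse repeated dashes and trim them from both ends.
    $slug = preg_replace('/-+/', '-', $slug);
    return trim($slug, '-');
}

// Register only when running inside WordPress. Priority 20 is an assumption:
// it places this after the core sanitize_title_with_dashes filter.
if (function_exists('add_filter')) {
    add_filter('sanitize_title', 'my_strict_slug', 20);
}
```

Be aware that a title written entirely in a non-Latin script would end up as an empty slug this way, which is one reason language-specific plugins transliterate first instead of just deleting.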