Is sanitize_title enough to generate post slugs?

I want to generate slugs for some strings without going through the WordPress slug generation flow, so I want to know which functions WordPress calls to produce a clean slug. I tried sanitize_title(), but it leaves %c2 %a0 in the result.

  1. sanitize_title() seems to be the only one you need.

    In wp-includes/default-filters.php line 211 you will find:

    add_filter( 'sanitize_title', 'sanitize_title_with_dashes', 10, 3);
    

    This means that calling sanitize_title() (in its default 'save' context) will first replace accented characters with their non-accented equivalents via remove_accents(), then apply the sanitize_title filter, thus calling sanitize_title_with_dashes().

    As @JHoffmann pointed out, calling sanitize_title_with_dashes() on its own will not replace accented characters.
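    To illustrate why the order matters, here is a toy, standalone PHP sketch that does not load WordPress. toy_remove_accents() and toy_slugify() are hypothetical, heavily simplified stand-ins for remove_accents() and sanitize_title_with_dashes(): the accent map covers only four characters, lowercasing is ASCII-only, and the real core functions do considerably more.

    ```php
    <?php
    // Hypothetical, heavily simplified stand-in for remove_accents():
    // only the characters listed here are transliterated.
    function toy_remove_accents(string $s): string
    {
        return strtr($s, ['é' => 'e', 'è' => 'e', 'ä' => 'a', 'ç' => 'c']);
    }

    // Hypothetical, simplified stand-in for sanitize_title_with_dashes():
    // it percent-encodes non-ASCII bytes instead of transliterating them.
    function toy_slugify(string $s): string
    {
        $s = strtolower($s); // ASCII-only lowercasing in this sketch
        // Percent-encode every byte outside ASCII, e.g. "é" becomes "%c3%a9".
        $s = preg_replace_callback(
            '/[\x80-\xff]/',
            fn(array $m): string => sprintf('%%%02x', ord($m[0])),
            $s
        );
        // Drop everything except letters, digits, %, spaces, underscores and dashes.
        $s = preg_replace('/[^%a-z0-9 _-]/', '', $s);
        // Whitespace becomes a dash; runs of dashes collapse to one.
        $s = preg_replace('/\s+/', '-', $s);
        $s = preg_replace('/-+/', '-', $s);
        return trim($s, '-');
    }

    // Order used by sanitize_title(): remove accents first, then slugify.
    echo toy_slugify(toy_remove_accents('Café crème')), "\n"; // cafe-creme
    // Slugifying directly keeps the accents as percent-encoded UTF-8 bytes.
    echo toy_slugify('Café crème'), "\n"; // caf%c3%a9-cr%c3%a8me
    ```

    The second line mirrors what the answers describe for sanitize_title_with_dashes(): accented characters end up encoded rather than replaced.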

    Well, there is already an answer, but I wanted to expand on it a bit, so here are my findings:

    If we look at wp_insert_post(), we see that $post_name is sanitized using sanitize_title() (see wp-includes/post.php).

    Inside sanitize_title(), the sanitize_title filter is applied. This is interesting, since the default filters hook sanitize_title_with_dashes() into this filter (see wp-includes/default-filters.php).

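    <?php
    // The exact output depends on the site locale; this result matches a German locale such as de_DE.
    echo sanitize_title( 'Â+ÄÖßáèäç' ); // aaeoessaeaec
    ?>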
    

    I tried sanitize_title() but it leaves %c2 %a0 in result.

    This sounds strange. It would help to know the input value, but judging by wp_insert_post(), sanitize_title() seems to be enough.
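    One likely explanation (an assumption, since the input is unknown): %c2%a0 is the percent-encoded UTF-8 form of a non-breaking space (U+00A0), for example from text copied out of a web page. The standalone sketch below confirms the encoding and replaces such spaces with ordinary spaces before the string is handed to a slug function; normalize_nbsp() is a hypothetical helper, not a WordPress function.

    ```php
    <?php
    // A non-breaking space is U+00A0; in UTF-8 it is the two bytes C2 A0.
    $nbsp = "\u{00A0}";

    // Percent-encoding those bytes yields exactly the "%c2 %a0" seen in the slug.
    echo strtolower(rawurlencode($nbsp)), "\n"; // %c2%a0

    // Hypothetical helper: replace non-breaking spaces with ordinary spaces
    // so the slug function sees normal word boundaries.
    function normalize_nbsp(string $s): string
    {
        return str_replace("\u{00A0}", ' ', $s);
    }

    echo normalize_nbsp("Hello\u{00A0}world"), "\n"; // Hello world
    ```

    Inside WordPress you would then pass the normalized string on, e.g. sanitize_title( normalize_nbsp( $title ) ).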

  3. In addition to websupporter’s great answer, I found the following:

    What you need depends on your usage.

    The documentation for sanitize_title() says:

    accents are removed (accented characters are replaced with non-accented equivalents)

    …and the documentation for sanitize_title_with_dashes() says:

    Note that it does not replace special accented characters

    So, with this example string: Â+Ä Ö %%% ßá %20 oo %pp + -_^^#@!**()=[]|/'"<>?``~ èäç

    sanitize_title() result:

    aa-o-sa-%20-oo-pp-_-eac

    As you can see, it has replaced accented characters with their non-accented equivalents and removed all other non-alphanumeric characters apart from dashes, underscores and a % followed by a number (%20). The % in %pp, which is followed by letters, was removed, presumably because only a % followed by two hexadecimal digits is treated as an already-encoded octet. Inserting %c3 into the string confirms this: it is not stripped, because %c3 is a valid percent-encoded octet.
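    The % handling can be reproduced in isolation. As far as I can tell from the core source, sanitize_title_with_dashes() temporarily protects every % followed by two hexadecimal digits, strips all remaining % signs, and then restores the protected octets. The sketch below mirrors only that step; keep_percent_octets() is a hypothetical name, not a WordPress function.

    ```php
    <?php
    // Hypothetical helper mirroring the octet-preserving step of sanitize_title_with_dashes().
    function keep_percent_octets(string $s): string
    {
        // Temporarily protect "%" followed by two hex digits (an already-encoded octet).
        $s = preg_replace('|%([a-fA-F0-9][a-fA-F0-9])|', '---$1---', $s);
        // Remove every remaining "%".
        $s = str_replace('%', '', $s);
        // Restore the protected octets.
        return preg_replace('|---([a-fA-F0-9][a-fA-F0-9])---|', '%$1', $s);
    }

    echo keep_percent_octets('a%%%b %20 %pp %c3'), "\n"; // ab %20 pp %c3
    ```

    This is why %20 and %c3 survive in both functions' output while the % in %pp disappears.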

    sanitize_title_with_dashes result:

    %c3%a2%c3%a4-%c3%b6-%c3%9f%c3%a1-%20-oo-pp-_-%c3%a8%c3%a4%c3%a7

    So, as you can see, it hasn’t removed the accented characters but percent-encoded their (lowercased) UTF-8 bytes.

    Now let’s look at a string with no accented characters to see how they both behave…

    Example string: %%% building %20 oo %pp + -_^^#@!**()=[]|/'"<>?``~'

    sanitize_title() result:

    building-%20-oo-pp-_

    sanitize_title_with_dashes result:

    building-%20-oo-pp-_

    As you can see, the results are identical. So it appears the only difference between them is that sanitize_title_with_dashes() encodes accented characters, whilst sanitize_title() replaces them.