Using htaccess to rewrite special characters for WordPress url aliases

I recently migrated a fairly large site (~6,000 posts) from Drupal to WordPress. As part of the process, I migrated the Drupal-created url aliases to WordPress for SEO and link retention purposes.

An example of a url alias that Drupal created that worked great in Drupal:

  • /stories/will-this-be-another-la-niña-year

That url in WordPress returns a 404. However, this works:

  • /stories/will-this-be-another-la-nina-year

It seems, then, that my best bet is to write a generic rewrite rule in htaccess that maps international characters to their plain English (ASCII) equivalents before the url is passed to WordPress.

Any idea how I might do this?
Thanks a lot for whatever help you can give.
Matt.

2 comments

  1. It seems like there might be a better way to do this within WordPress. You may want to do a quick browse through the WordPress Trac tickets; there may be a patch or temporary fix for the problem. But if you need to go with an htaccess/redirect method, you can either use a RewriteMap to sanitize and redirect if needed, or explicitly redirect on non-ASCII characters.

    A RewriteMap requires access to either the server or vhost config to set up the map. It could be as simple as a list of /stories/will-this-be-another-la-niña-year URIs mapped to http://yourdomain.com/stories/will-this-be-another-la-nina-year (the all-ASCII URL; the http:// is significant because it tells mod_rewrite to redirect the browser). Or you can write a script to look for non-ASCII characters and replace them with the appropriate ASCII character.

    Text mapping:

    RewriteMap sanitize txt:/path/to/uri_mapping.txt
    

    Script mapping:

    RewriteMap sanitize prg:/path/to/sanitize_script.php
    

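    Then in your htaccess file, you can invoke this mapping like this. These rules need to go above the WordPress rules, since you want the URI sanitized before WordPress gets hold of it. In an htaccess file the leading slash is stripped before the pattern is matched, so the lookup adds it back, and the condition skips any URI that has no entry in the map:

    RewriteCond ${sanitize:/$1|NOTFOUND} !=NOTFOUND
    RewriteRule ^(.*)$ ${sanitize:/$1} [R=301,L]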
    

    If you don’t have access to the server/vhost config, you’ll have to enumerate the possibilities in your htaccess file, again putting these rules above the WordPress rules:

    # replace ñ
    RewriteRule ^(.*)ñ(.*)$ /$1n$2 [R=301,L]
    
    # replace ú
    RewriteRule ^(.*)ú(.*)$ /$1u$2 [R=301,L]
    

    etc.
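
For the text mapping, the map file does not have to be written by hand if you can export the old Drupal aliases. The following is only a rough sketch in Python, not something from the original setup: the function names `to_ascii` and `build_map_lines` and the sample alias list are made up for illustration, and it assumes the aliases are available as a list of paths with a leading slash. It strips accents with Unicode decomposition and writes one "old path, new URL" pair per line, which is the layout a txt: map expects.

```python
import unicodedata


def to_ascii(text: str) -> str:
    """Strip diacritics: decompose each character, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_map_lines(aliases, base_url):
    """Return 'old-path new-url' lines for aliases that contain accented letters."""
    lines = []
    for alias in aliases:
        folded = to_ascii(alias)
        # Skip aliases that are already ASCII, and aliases that still contain
        # characters with no decomposition (for example "ø"); the latter
        # would need a hand-written entry.
        if folded == alias or not folded.isascii():
            continue
        lines.append(f"{alias} {base_url}{folded}")
    return lines


if __name__ == "__main__":
    # Hypothetical sample input; a real run would read the exported alias list.
    sample = [
        "/stories/will-this-be-another-la-niña-year",
        "/stories/already-plain-ascii",
    ]
    for line in build_map_lines(sample, "http://yourdomain.com"):
        print(line)
```

Redirecting the output into the file named in the RewriteMap line (uri_mapping.txt above) gives a map with one entry per accented alias. Aliases that contain spaces would break the two-column layout, so this assumes there are none.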

  2. I just added the following lines at the beginning of my .htaccess file and it works:

    RewriteRule ^(.*)é(.*)$ /$1e$2 [R=301,L]
    RewriteRule ^(.*)è(.*)$ /$1e$2 [R=301,L]
    RewriteRule ^(.*)ê(.*)$ /$1e$2 [R=301,L]
    RewriteRule ^(.*)î(.*)$ /$1i$2 [R=301,L]
    RewriteRule ^(.*)ô(.*)$ /$1o$2 [R=301,L]
    RewriteRule ^(.*)û(.*)$ /$1u$2 [R=301,L]
    RewriteRule ^(.*)â(.*)$ /$1a$2 [R=301,L]
    RewriteRule ^(.*)à(.*)$ /$1a$2 [R=301,L]
    RewriteRule ^(.*)ï(.*)$ /$1i$2 [R=301,L]
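
If the list of characters gets long, rules of this shape can be generated instead of typed. This is a hedged sketch in Python rather than part of either answer above: `ascii_base` and `rules_for` are hypothetical helper names, and it assumes every character you pass in is an accented Latin letter that decomposes to a plain ASCII letter (anything else raises an error so it can be handled by hand).

```python
import unicodedata


def ascii_base(char: str) -> str:
    """Return the character with its diacritics removed."""
    decomposed = unicodedata.normalize("NFKD", char)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def rules_for(chars: str):
    """Build one 301 RewriteRule line per accented character."""
    rules = []
    for char in chars:
        base = ascii_base(char)
        # Refuse characters that do not reduce to a different ASCII letter,
        # since a rule that maps a character to itself would redirect forever.
        if base == char or not base.isascii():
            raise ValueError(f"no ASCII base letter for {char!r}")
        rules.append(f"RewriteRule ^(.*){char}(.*)$ /$1{base}$2 [R=301,L]")
    return rules


if __name__ == "__main__":
    for rule in rules_for("éèêîôûâàï"):
        print(rule)
```

Run as a script, it prints the nine rules listed above in the same order. The output presumably needs to be saved as UTF-8, like the rest of the .htaccess file, so that the accented characters in the patterns match the request path.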