Special characters encoding in image filenames after server migration

I’ve migrated a WordPress website from a HostGator shared host to an Ubuntu DigitalOcean LAMP stack.

The trouble started when I exported image files whose names contain special characters, for example operários_tarsila-1024x640.jpg.


When WordPress tries to reach the file, it displays an error. I’ve found the cause:

I can see via Inspect Element that WordPress tries to call: http://mywebsite.com/wp-content/uploads/2013/02/oper%C3%A1rios_tarsila-1024x640.jpg and the server returns a 404 error.

However, if I type this URL in the browser: http://mywebsite.com/wp-content/uploads/2013/02/opera%CC%81rios_tarsila-1024x640.jpg it works and the image is displayed.

So it seems that the difference in how á is encoded, %C3%A1 (the single precomposed character á, U+00E1) versus a + %CC%81 (the letter a followed by the combining acute accent, U+0301), is what stops WordPress from displaying my images.

So now my server holds thousands of accented image filenames stored as base character + combining accent, while WordPress requests them using a single precomposed accented character.
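The two URLs can be reproduced outside WordPress. Here is a minimal Python sketch (standard library only; the variable names are illustrative and not part of the site) showing that the same visible filename percent-encodes differently depending on its Unicode normalization form:

```python
import unicodedata
from urllib.parse import quote

name = "operários_tarsila-1024x640.jpg"

# NFC: "á" is the single precomposed code point U+00E1.
nfc = unicodedata.normalize("NFC", name)
# NFD: "á" is "a" followed by the combining acute accent U+0301.
nfd = unicodedata.normalize("NFD", name)

print(quote(nfc))  # oper%C3%A1rios_tarsila-1024x640.jpg
print(quote(nfd))  # opera%CC%81rios_tarsila-1024x640.jpg
print(nfc == nfd)  # False: the two names are different byte sequences
```

A typical Linux filesystem compares names byte for byte, so a request for one form does not find a file stored under the other.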

Is there a way to bulk-rename them all in bash using a mapping table? Or a way to make Apache aware of these differences so that it serves the right file when this kind of mismatch happens?


7 comments

  1. Apparently the problem is how the backup is decompressed on the new server.

    There are 2 ways to fix this:

    1. Rename the files manually to names without accents, then update the file names in the database to match. (This is crazy and can be dangerous; back up the database first.)

    2. Upload the files with FileZilla, but set it to force UTF-8 as the charset encoding.

    File > Site Manager > {YOUR SITE} > Charset tab > Force UTF-8

  2. We had the same problem: Mac + FileZilla + special characters in SK language.

    Switching to another FTP client (Cyberduck in our case) fixed it.

    It seems to be a problem with FileZilla's filename encoding. Forcing UTF-8 encoding (in FileZilla's host settings) didn’t help.

  3. So, just to touch upon this issue and a solution that worked for me… I also migrated a WordPress site and found that all images with special characters in their filename produced a 404 after migration.

    I ended up having to do the manual file renaming and edits to the database via phpMyAdmin. It was arduous and I definitely recommend backing up your database first.

    In my case, I had a ton of media attachments that used the special character © in their filename.

    First, I renamed the files locally by removing the character. I used 1-4a Rename: I searched for the character in the filenames and replaced it with nothing (not even a space). Then I removed all the old files from the /wp-content/uploads/ folder and replaced them with the new files.

    Next, I went into my database to update the table values. Media attachments have info stored in both the wp_posts and wp_postmeta tables. Below is the SQL I ran to update both:

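    -- Remove the character from the attachment URLs stored in wp_posts.
    UPDATE wp_posts SET guid = REPLACE(guid, '©', '');

    -- Remove it from postmeta values that end in an image extension.
    UPDATE wp_postmeta SET meta_value = REPLACE(meta_value, '©', '')
    WHERE LOWER(RIGHT(meta_value, 5)) = '.jpeg' OR
    LOWER(RIGHT(meta_value, 4)) IN ('.jpg', '.gif', '.png');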
    

    Again, this replaces the character with nothing, not even a space.

    I had to use the WP plugin Regenerate Thumbnails to get all of the thumbnails and the various attachment sizes to update, but that did the trick.

    I really appreciate everyone’s efforts on this post and this post to help me figure it out! Hope this helps someone!

  4. Have you tried setting the same encoding in the PHP script, MySQL and HTML?

    PHP : http://php.net/manual/en/function.mb-internal-encoding.php

    MySQL : http://php.net/manual/en/function.mysql-set-charset.php

    HTML : <meta http-equiv="content-type" content="text/html; charset=utf-8" />

    This looks like a charset mismatch between these layers.

    If that does not work, you will have to use a small script to rename all your pictures, using a function like:

    function wd_remove_accents($str, $charset = 'utf-8')
    {
        // Turn accented characters into named entities, e.g. 'á' becomes '&aacute;'.
        $str = htmlentities($str, ENT_NOQUOTES, $charset);

        // Keep only the base letter of each accented-character entity.
        $str = preg_replace('#&([A-Za-z])(?:acute|cedil|caron|circ|grave|orn|ring|slash|th|tilde|uml);#', '\1', $str);
        $str = preg_replace('#&([A-Za-z]{2})(?:lig);#', '\1', $str); // ligatures, e.g. '&oelig;'
        $str = preg_replace('#&[^;]+;#', '', $str); // drop any remaining entities

        return $str;
    }
    

    Source : http://www.weirdog.com/blog/php/supprimer-les-accents-des-caracteres-accentues.html
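For comparison, here is a minimal Python sketch of the same accent-stripping idea, done through Unicode decomposition instead of HTML entities. The helper name `remove_accents` is hypothetical. The sketch assumes that dropping combining marks is acceptable for your filenames, and it handles both the precomposed and the decomposed spelling:

```python
import unicodedata


def remove_accents(text: str) -> str:
    """Return text with combining marks removed (hypothetical helper)."""
    # Decompose first so that a precomposed "á" becomes "a" + U+0301.
    decomposed = unicodedata.normalize("NFD", text)
    # Combining marks have a non-zero canonical combining class.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(remove_accents("oper\u00e1rios_tarsila.jpg"))   # operarios_tarsila.jpg
print(remove_accents("opera\u0301rios_tarsila.jpg"))  # operarios_tarsila.jpg
```

Characters that have no canonical decomposition, such as ø or œ, pass through unchanged, so this is not a drop-in replacement for the ligature handling in the PHP function.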

  5. We just had a similar problem with French characters in our WordPress deployment, and our solution was to upload the files with FileZilla from a PC instead of FileZilla from a Mac.

    When I uploaded from Mac OS X to the CentOS server, the files would only load when requested in the a+%CC%81 format.

    When I uploaded the files from the PC, Apache found the files in the %C3%A1 format, which is how WordPress had them encoded.
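If re-uploading is not practical, the files already on the server can be renamed from the decomposed form to the precomposed form that WordPress requests. Below is a minimal Python sketch under some assumptions: it runs on the Linux server itself, the function name `rename_to_nfc` and the `wp-content/uploads` path are illustrative, and the default is a dry run that only reports what would change.

```python
import os
import unicodedata


def rename_to_nfc(root: str, dry_run: bool = True) -> list:
    """Rename entries under root whose names are not NFC; return (old, new) pairs."""
    renames = []
    # Walk bottom-up so that files are renamed before their parent directories.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            nfc_name = unicodedata.normalize("NFC", name)
            if nfc_name == name:
                continue  # already in the precomposed form
            old = os.path.join(dirpath, name)
            new = os.path.join(dirpath, nfc_name)
            if os.path.exists(new):
                continue  # both spellings exist; leave them for manual review
            renames.append((old, new))
            if not dry_run:
                os.rename(old, new)
    return renames


if __name__ == "__main__":
    # Assumed path; adjust to the real uploads directory.
    for old, new in rename_to_nfc("wp-content/uploads", dry_run=True):
        print(old, "->", new)
```

Review the dry-run output first, then call the function again with `dry_run=False`. The database is left untouched, which suits the case where WordPress already stores the precomposed names.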

  6. If you have WP-CLI, run this Bash script. You must change the wp_ table prefix if your installation uses a different one.
    It only modifies the file names that are NOT in FORM_D (NFD) form.
    Back up your DB in case something goes wrong.

    #!/bin/bash
    # PHP snippet executed inside WordPress by WP-CLI; the Normalizer class needs PHP's intl extension.
    normalizeWP_PHP_Script=$(cat <<'PHP'
    global $wpdb;
    $rows = $wpdb->get_results( "SELECT * FROM wp_postmeta WHERE meta_key = '_wp_attached_file'" );
    foreach ( $rows as $row ) {
        $postId   = $row->post_id;
        $filePath = $row->meta_value;
        // Only touch paths that are not already in NFD.
        if ( ! normalizer_is_normalized( $filePath, Normalizer::FORM_D ) ) {
            $filename_nfd = Normalizer::normalize( $filePath, Normalizer::FORM_D );
            echo $filename_nfd . " | ";
            // Restrict the update to the attachment-path row of this post.
            $wpdb->query( $wpdb->prepare(
                "UPDATE wp_postmeta SET meta_value = %s WHERE post_id = %d AND meta_key = '_wp_attached_file'",
                $filename_nfd,
                $postId
            ) );
        }
    }
    PHP
    )
    wp eval "$normalizeWP_PHP_Script"
    echo " - Uploads-url normalized --nfd"
    
