Browser cannot load a file whose name contains special characters

I have an image whose filename is Chu Thái.jpg. When I upload it to the media library, the file on the server is renamed to Chu-Thái.jpg, but the path of the image is not the same as the filename: http://bem.vn/httq/wp-content/uploads/sites/2/2013/10/Chu-Thái.jpg

As a result, when I copy the URL into the browser, it says the file was not found on this server.

The requested URL /wp-head/wp-content/uploads/sites/2/2013/10/Chu-Thái-150x150.jpg was not found on this server.

Is the problem caused by WordPress or by my hosting?

1 comment

  1. The problem is that you should not upload files with special characters in their names. In a plugin of mine, I handle this with the sanitize_file_name filter.

    I ended up pulling and adapting three functions from toscho's Germanix plugin to do a full cleanup of uploaded filenames and avoid this kind of error:

    add_filter( 'sanitize_file_name', 't5_sanitize_filename', 10 );
    
    /**
     * Clean up uploaded file names
     * 
     * Sanitization test done with the filename:
     * ÄäÆæÀàÁáÂâÃãÅåªₐāĆćÇçÐđÈèÉéÊêËëₑƒğĞÌìÍíÎîÏïīıÑñⁿÒòÓóÔôÕõØøₒÖöŒœßŠšşŞ™ÙùÚúÛûÜüÝýÿŽž¢€‰№$℃°C℉°F⁰¹²³⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉±×₊₌⁼⁻₋–—‑․‥…‧.png
     * @author toscho
     * @url    https://github.com/toscho/Germanix-WordPress-Plugin
     */
    function t5_sanitize_filename( $filename )
    {
        $filename    = html_entity_decode( $filename, ENT_QUOTES, 'utf-8' );
        $filename    = t5_translit( $filename );
        $filename    = t5_lower_ascii( $filename );
        $filename    = t5_remove_doubles( $filename );
        return $filename;
    }
    
    /**
     * Converts uppercase characters to lowercase and removes the rest.
     * https://github.com/toscho/Germanix-WordPress-Plugin
     *
     * @param  string $str Input string
     * @return string
     */
    function t5_lower_ascii( $str )
    {
        $str     = strtolower( $str );
        // Keep a-z, digits, dots and hyphens. Leave underscores too, otherwise
        // the taxonomy tag cloud in the backend won’t work anymore.
        $regex   = array(
            'pattern'        => '~([^a-z\d_.-])~'
            , 'replacement'  => ''
        );
        return preg_replace( $regex['pattern'], $regex['replacement'], $str );
    }
    
    /**
     * Reduces repeated meta characters (-=+.) to one.
     * https://github.com/toscho/Germanix-WordPress-Plugin
     *
     * @uses   apply_filters( 'germanix_remove_doubles_regex' )
     * @param  string $str Input string
     * @return string
     */
    function t5_remove_doubles( $str )
    {
        $regex = apply_filters(
                'germanix_remove_doubles_regex'
                , array(
                    'pattern'        => '~([=+.-])\1+~'
                    // Single quotes: "\1" in double quotes is an octal escape, not a backreference.
                    , 'replacement'  => '\1'
                )
        );
        return preg_replace( $regex['pattern'], $regex['replacement'], $str );
    }    
    
    /**
     * Replaces non ASCII chars.
     * https://github.com/toscho/Germanix-WordPress-Plugin
     *
     * wp-includes/formatting.php#L531 is unfortunately completely inappropriate.
     * Modified version of Heiko Rabe’s code.
     *
     * @author Heiko Rabe http://code-styling.de
     * @link   http://www.code-styling.de/?p=574
     * @param  string $str
     * @return string
     */
    function t5_translit( $str )
    {
        $utf8 = array(
            'Ä'  => 'Ae'
            , 'ä'    => 'ae'
            , 'Æ'    => 'Ae'
            , 'æ'    => 'ae'
            , 'À'    => 'A'
            , 'à'    => 'a'
            , 'Á'    => 'A'
            , 'á'    => 'a'
            , 'Â'    => 'A'
            , 'â'    => 'a'
            , 'Ã'    => 'A'
            , 'ã'    => 'a'
            , 'Å'    => 'A'
            , 'å'    => 'a'
            , 'ª'    => 'a'
            , 'ₐ'    => 'a'
            , 'ā'    => 'a'
            , 'Ć'    => 'C'
            , 'ć'    => 'c'
            , 'Ç'    => 'C'
            , 'ç'    => 'c'
            , 'Ð'    => 'D'
            , 'đ'    => 'd'
            , 'È'    => 'E'
            , 'è'    => 'e'
            , 'É'    => 'E'
            , 'é'    => 'e'
            , 'Ê'    => 'E'
            , 'ê'    => 'e'
            , 'Ë'    => 'E'
            , 'ë'    => 'e'
            , 'ₑ'    => 'e'
            , 'ƒ'    => 'f'
            , 'ğ'    => 'g'
            , 'Ğ'    => 'G'
            , 'Ì'    => 'I'
            , 'ì'    => 'i'
            , 'Í'    => 'I'
            , 'í'    => 'i'
            , 'Î'    => 'I'
            , 'î'    => 'i'
            , 'Ï'    => 'Ii'
            , 'ï'    => 'ii'
            , 'ī'    => 'i'
            , 'ı'    => 'i'
            , 'İ'    => 'I' // turkish, correct?
            , 'Ñ'    => 'N'
            , 'ñ'    => 'n'
            , 'ⁿ'    => 'n'
            , 'Ò'    => 'O'
            , 'ò'    => 'o'
            , 'Ó'    => 'O'
            , 'ó'    => 'o'
            , 'Ô'    => 'O'
            , 'ô'    => 'o'
            , 'Õ'    => 'O'
            , 'õ'    => 'o'
            , 'Ø'    => 'O'
            , 'ø'    => 'o'
            , 'ₒ'    => 'o'
            , 'Ö'    => 'Oe'
            , 'ö'    => 'oe'
            , 'Œ'    => 'Oe'
            , 'œ'    => 'oe'
            , 'ß'    => 'ss'
            , 'Š'    => 'S'
            , 'š'    => 's'
            , 'ş'    => 's'
            , 'Ş'    => 'S'
            , '™'    => 'TM'
            , 'Ù'    => 'U'
            , 'ù'    => 'u'
            , 'Ú'    => 'U'
            , 'ú'    => 'u'
            , 'Û'    => 'U'
            , 'û'    => 'u'
            , 'Ü'    => 'Ue'
            , 'ü'    => 'ue'
            , 'Ý'    => 'Y'
            , 'ý'    => 'y'
            , 'ÿ'    => 'y'
            , 'Ž'    => 'Z'
            , 'ž'    => 'z'
            // misc
            , '¢'    => 'Cent'
            , '€'    => 'Euro'
            , '‰'    => 'promille'
            , '№'    => 'Nr'
            , '$'    => 'Dollar'
            , '℃'    => 'Grad Celsius'
            , '°C' => 'Grad Celsius'
            , '℉'    => 'Grad Fahrenheit'
            , '°F' => 'Grad Fahrenheit'
            // Superscripts
            , '⁰'    => '0'
            , '¹'    => '1'
            , '²'    => '2'
            , '³'    => '3'
            , '⁴'    => '4'
            , '⁵'    => '5'
            , '⁶'    => '6'
            , '⁷'    => '7'
            , '⁸'    => '8'
            , '⁹'    => '9'
            // Subscripts
            , '₀'    => '0'
            , '₁'    => '1'
            , '₂'    => '2'
            , '₃'    => '3'
            , '₄'    => '4'
            , '₅'    => '5'
            , '₆'    => '6'
            , '₇'    => '7'
            , '₈'    => '8'
            , '₉'    => '9'
            // Operators, punctuation
            , '±'    => 'plusminus'
            , '×'    => 'x'
            , '₊'    => 'plus'
            , '₌'    => '='
            , '⁼'    => '='
            , '⁻'    => '-' // sup minus
            , '₋'    => '-' // sub minus
            , '–'    => '-' // ndash
            , '—'    => '-' // mdash
            , '‑'    => '-' // non breaking hyphen
            , '․'    => '.' // one dot leader
            , '‥'    => '..'  // two dot leader
            , '…'    => '...'  // ellipsis
            , '‧'    => '.' // hyphenation point
            , "\xC2\xA0" => '-'   // nobreak space (U+00A0 as UTF-8 bytes)
            , ' '    => '-'   // normal space
        );
    
        $str = strtr( $str, $utf8 );
        return trim( $str, '-' );
    }
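
To see roughly what the filter above would do to the filename from the question, here is a minimal stand-alone sketch that runs without WordPress. The function name demo_sanitize_filename() and its four-entry map are hypothetical stand-ins for the three t5_* functions and the full table, and the sketch assumes the file is saved as UTF-8 with precomposed characters.

```php
<?php
// Hypothetical stand-alone sketch: demo_sanitize_filename() is not part of
// WordPress or Germanix. The map is a tiny subset of the full table above.
function demo_sanitize_filename( $filename )
{
    // 1. Transliterate: swap known non-ASCII characters and spaces.
    $map = array(
        'á' => 'a',
        'Ä' => 'Ae',
        'ß' => 'ss',
        ' ' => '-',
    );
    $filename = trim( strtr( $filename, $map ), '-' );

    // 2. Lowercase, then drop everything except a-z, digits, "_", "." and "-".
    $filename = preg_replace( '~[^a-z\d_.-]~', '', strtolower( $filename ) );

    // 3. Collapse runs of the same meta character ("--", "...") to one.
    return preg_replace( '~([=+.-])\1+~', '\1', $filename );
}

echo demo_sanitize_filename( 'Chu Thái.jpg' ), "\n";         // chu-thai.jpg
echo demo_sanitize_filename( 'Ärger  mit Maß...png' ), "\n"; // aerger-mit-mass.png
```

With a name reduced to plain ASCII like this, the filename on disk and the path in the URL consist of the same bytes, so there is nothing left for the browser or the server to encode differently.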