How to convert a WordPress UTF-8 database containing latin1 characters in MySQL

I’m working with an old WordPress database. In the backend, some posts contain characters like:

ü

But they display on the front end as:

Read More
ü

Changing define('DB_CHARSET', 'utf8'); in wp-config.php solves this particular issue, however for future compatibility (see: https://make.wordpress.org/core/2015/04/02/the-utf8mb4-upgrade/), I believe it’s a bad idea to leave the database as is and would like to convert it. Entering a ü directly into the PHP template works without changing anything. So this is clearly a mySQL encoding issue.

I’ve taken the following steps so far:

  1. Run the following commands from PHPMyAdmin:

ALTER DATABASE MyDb CHARACTER SET utf8;

and

ALTER TABLE wp_posts CHARACTER SET utf8;

and

UPDATE wp_posts SET post_content = convert(cast(convert(post_content using latin1) as binary) using utf8) (taken from here: MySQL – Convert latin1 characters on a UTF8 table into UTF8)

I’ve also used this tool: https://interconnectit.com/products/search-and-replace-for-wordpress-databases/ to convert all tables to utf-8 mb4 encoding.

alter table wp_posts change post_content post_content LONGTEXT CHARACTER SET latin1;
alter table wp_posts change post_content post_content LONGBLOB;
alter table wp_posts change post_content post_content LONGTEXT CHARACTER SET utf8;
  1. I’ve also tried this approach:
    http://alexking.org/blog/2008/03/06/mysql-latin1-utf8-conversion

  2. Completely emptied wp_posts and tried entering a new post.

With all these approaches, when creating a new post and entering an ü character, it still shows up as ü.

This thread here also clearly describes the issues I’m having:
https://wordpress.org/support/topic/trouble-converting-database-containing-special-characters-to-utf-8

Any recommendations or insights from here are much appreciated as I’m completely stuck!

Thanks

Related posts

2 comments

  1. The database and table declarations are merely defaults for future table and column declarations, respectively.

    ALTER TABLE CHANGE ... changes the declaration without transforming the encoding. That is, this may have made things worse.

    First, let’s verify what you have. Please provide SHOW CREATE TABLE for a table of interest. And provide this for some data in it that should have an accented character such as ü:

    SELECT col, HEX(col) FROM tbl WHERE ...
    

    If the HEX is FC, you have latin1 encoding. If the column says CHARACTER SET latin1, then they are consistent. (All latin1 encodings are 1 byte.)

    If the HEX is C3BC, you have utf8 encoding. If the column says CHARACTER SET utf8 (or utf8mb4), then they are consistent. (utf8 encodings for accented characters are usually 2 bytes.)

    If the encoding and the charset are not consistent, then panic. You have a mess that needs cleaning up.

    When “entering” data, two things are important: (1) the encoding of the character, and (2) what the client thinks the encoding is. The encoding depends on where you got the character (how you typed it, pasted it, etc). You tell the client to tell mysqld what the encoding is in various ways. define('DB_CHARSET', 'utf8'); is needed for WordPress; something else is needed for phpmyadmin.

    As long as you configure the client correctly, it does not matter whether the column’s CHARACTER SET is the same. Conversion will automatically occur. (However, it is best to be consistent — utf8 (or utf8mb4) at all stages.)

    ALTER TABLE tbl CONVERT TO CHARACTER SET utf8mb4; will actively change the CHARACTER SET and the bytes in all the columns of that table. This will work whether the data is currently latin1 or utf8, but it must be consistent with the declaration (as mentioned above).

  2. Please use the code for convert your WordPress database. It is enough to change the appropriate sections in the code for charet…

    <?php
    error_reporting(E_ALL);
    $dbname = "YOUR-DATABASE-NAME";
    mysql_connect("localhost", "YOUR-DATABASE-USERNAME", "YOUR-DATABASE-PASSWORD") or die(mysql_error());
    mysql_select_db("$dbname");
    mysql_query("SET NAMES 'utf8';") or die(mysql_error());
    
    $query = "SHOW TABLES";
    $result = mysql_query($query) or die(mysql_error());
    while ($data = mysql_fetch_assoc($result)) {
    
    	$table = $data["Tables_in_$dbname"];
    	$query = "alter table $table convert to character set utf8 collate utf8_turkish_ci";
    	mysql_query($query) or die(mysql_error());
    	echo "<b>$table</b><br>";
    
    	$query = "SHOW COLUMNS FROM $table";
    	$result_2 = mysql_query($query) or die(mysql_error());
    	while ($columns = mysql_fetch_assoc($result_2)) {
    
    		if (
    				(stripos($columns['Type'], 'varchar')!==false)
    				||
    				(stripos($columns['Type'], 'text')!==false)
    		) {
    			$query = "ALTER TABLE $table MODIFY {$columns['Field']} {$columns['Type']} CHARACTER SET utf8 COLLATE utf8_turkish_ci";
    			mysql_query($query) or die(mysql_error());
    			echo "&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{$columns['Field']}<br>";
    		}
    	}
    }
    
    echo "<hr><h1>Done!</h1>";

Comments are closed.