Our project consists of:
- multiple sites (production, test, local development);
- databases migrated between them by multiple methods (phpMyAdmin, Navicat, BackupBuddy).
The issue we are having is that while the original production site seems to work fine, the rest of the installations are constantly plagued by text encoding issues.
The original site uses latin MySQL tables, but WP is configured to serve pages as UTF-8, which I was told (in our chat) is already problematic. The rest of the sites (whose databases mostly mirror the original production site) display issues such as:
- broken characters (correctable by tweaking WP encoding settings);
- broken characters (not correctable by tweaking WP encoding settings);
- site working fine, but feeding broken characters to external libraries.
Since I have been trying to untangle this for a while and there isn't much information on diagnosing encoding issues in WP, my questions are the following:
- How can I reliably diagnose whether a site has encoding configuration issues, even if it might not display them under normal circumstances?
- Which rules should be formulated, documented and enforced to prevent encoding issues during migration?
So, after about a year (on and off!), I have hopefully managed to get a fix on the encoding issue.
Why it breaks
In my experience, encoding issues like this are mostly caused by miscommunication when moving data around: one component writes text in one encoding and the next one reads it assuming another.
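To make that miscommunication concrete, here is a small illustration in Python (used only for demonstration; the sample string "café" is hypothetical). It shows one component writing UTF-8 bytes, another reading those bytes as latin-1, and the garbled result being saved again as UTF-8, which is how double-encoded characters come about. Python's `latin-1` codec is an assumed stand-in for MySQL's `latin1`, which is actually closer to Windows cp1252; for a character like "é" the two agree.

```python
# Illustration only: UTF-8 bytes written by one component,
# read back as latin-1 by another.


def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then (wrongly) decode those bytes as latin-1."""
    return text.encode("utf-8").decode("latin-1")


def double_encode(text: str) -> bytes:
    """Store the misread text again as UTF-8: this is double encoding."""
    return misread_as_latin1(text).encode("utf-8")


if __name__ == "__main__":
    original = "café"
    print(misread_as_latin1(original))  # cafÃ©
    print(double_encode(original))      # b'caf\xc3\x83\xc2\xa9'
```

Each misread turns one non-ASCII character into two, so every extra hop through a mismatched component makes the damage grow.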
Preemptive measures
The earliest point at which you can screw up database encoding in WP is when creating the database, before you have even downloaded the WP archive to install.
Do not rely on defaults; make sure that all components use the same encoding (such as UTF-8) internally, when talking to each other, and when talking to visitors. This goes well beyond WP and involves MySQL configuration, possibly with some kicks for Apache and PHP on top.
See WordPress Database Charset and Collation Configuration
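One way to check "do not rely on defaults" is to compare what MySQL reports for `SHOW VARIABLES LIKE 'character_set%';` against the charset you have standardized on. The Python helper below is a hypothetical sketch: `find_charset_mismatches` and `RELEVANT` are names made up for this example, the default of `utf8mb4` is an assumption, and the sample values are invented. On the WP side, `DB_CHARSET` and `DB_COLLATE` in `wp-config.php` should agree with whatever you pick.

```python
from typing import Dict

# Connection- and storage-level variables reported by
#   SHOW VARIABLES LIKE 'character_set%';
# character_set_system and character_set_filesystem are skipped on purpose:
# they do not describe how your content is stored or transferred.
RELEVANT = (
    "character_set_client",
    "character_set_connection",
    "character_set_results",
    "character_set_database",
    "character_set_server",
)


def find_charset_mismatches(variables: Dict[str, str], expected: str = "utf8mb4") -> Dict[str, str]:
    """Return {variable: actual} for each relevant variable that differs from `expected`.

    `expected` defaults to utf8mb4 as an assumption; use whatever charset
    you have standardized on. Variables missing from the input are ignored.
    """
    mismatches: Dict[str, str] = {}
    for name in RELEVANT:
        actual = variables.get(name)
        if actual is not None and actual.lower() != expected.lower():
            mismatches[name] = actual
    return mismatches


if __name__ == "__main__":
    # Hypothetical output pasted from a test install.
    sample = {
        "character_set_client": "utf8mb4",
        "character_set_connection": "utf8mb4",
        "character_set_results": "utf8mb4",
        "character_set_database": "latin1",
        "character_set_server": "latin1",
        "character_set_system": "utf8mb3",
    }
    print(find_charset_mismatches(sample))
    # {'character_set_database': 'latin1', 'character_set_server': 'latin1'}
```

The client, connection and results values are per session, so they are most meaningful when queried through the same connection path the site uses. These variables also say nothing about individual tables and columns, whose charsets can be checked separately (for example via `information_schema.TABLES`).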
Fixing
When things are thoroughly broken, you are in for a ton of pain figuring out what is wrong and how to get it back to normal.
I found `mb_detect_encoding()` highly useful. It's not a magic wand, but in strict mode a `false` return from it is a good signal that things are not normal.
On the WP-specific front, `$wpdb` has encoding-related properties (such as `$wpdb->charset` and `$wpdb->collate`).
When you have a reason, guess or idea of what is wrong, drag the data to a safe place and try to convert it so that it is meaningfully normalized; see:
After doing a little searching on this problem, it's my understanding that the data is actually encoded in UTF-8 but is being handled as latin. You just need to trick MySQL into reading it correctly with a little juggling.
Try this:
This should trick MySQL into reading the data correctly. Apparently, if you try to change the encoding when you export, you will get double-encoded characters, since the data was already encoded in UTF-8.
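The same "juggling" can be sketched at the string level. The Python below is a swapped-in illustration, not the MySQL procedure the quote describes: `is_valid_utf8` plays a role similar in spirit to a strict `mb_detect_encoding()` check, and `undo_latin1_misread` reverses one layer of "UTF-8 bytes read as latin-1". Both function names are hypothetical, and Python's `latin-1` codec is again assumed as a stand-in for MySQL's `latin1`.

```python
from typing import Optional

# Assumption: Python's "latin-1" codec stands in for MySQL's latin1, which is
# really closer to Windows cp1252; mojibake containing characters such as
# "€" will not round-trip through latin-1 and will come back as None.


def is_valid_utf8(raw: bytes) -> bool:
    """Strict UTF-8 validity check, similar in spirit to strict mb_detect_encoding()."""
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def undo_latin1_misread(text: str) -> Optional[str]:
    """Undo one layer of "UTF-8 bytes decoded as latin-1".

    Re-encodes the text as latin-1 to recover the original bytes, then decodes
    those bytes as UTF-8. Returns None when the text cannot be the result of
    such a misread (non-latin-1 characters, or bytes that are not valid UTF-8).
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None


if __name__ == "__main__":
    print(undo_latin1_misread("cafÃ©"))            # café
    print(undo_latin1_misread("café"))             # None (already correct)
    print(is_valid_utf8(b"caf\xe9"))               # False (latin-1 bytes)
    print(is_valid_utf8("café".encode("utf-8")))   # True
```

Returning `None` instead of guessing keeps already-correct values from being mangled by a second pass. As the answer advises, run anything like this on a copy of the data and spot-check the results, since genuine latin-1 text can occasionally form valid UTF-8 by accident.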