I’m writing a WordPress plugin that needs to be able to write and read complex data encoded as JSON which can contain UTF-8 encoded text. I’ve had problems reading the file (I get PHP parse errors), but I now suspect that this is because the data is not actually encoded as UTF-8 (as I expected) but as HTML-encoded entities.
The functions that open the output buffer and write into it look like this. Am I missing something?
public function createUTFOutput($filename, $json)
{
    // Tells the browser to expect a json file and bring up the save dialog in the browser
    header('Pragma: public');
    header('Expires: 0');
    header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
    header('Cache-Control: private', false);
    if ($json)
        header('Content-Type: text/plain; charset=utf-8');
    else
        header('Content-Type: text/csv; charset=utf-8');
    header('Content-Disposition: attachment; filename="'.$filename.'";');
    // This opens up the output buffer as a "file"
    $fp = fopen('php://output', 'w');
    // Hack to write as UTF-8 format
    fwrite($fp, pack("CCC",0xef,0xbb,0xbf));
    return $fp;
} // createUTFOutput()
// PURPOSE: Write out data about Attribute $the_att to file $fp
public function write_att_data($fp, $the_att)
{
    // Create header to indicate Attribute record
    fwrite($fp, '{"type": "Attribute", "att-id": "'.$the_att->id.'", '."\n");
    fwrite($fp, '"att-privacy": "'.$the_att->privacy."\", \n");
    fwrite($fp, '"att-def": '.$the_att->meta_def.", \n");
    fwrite($fp, '"att-range": '.$the_att->meta_range.", \n");
    fwrite($fp, '"att-legend": '.$the_att->meta_legend."\n}");
} // write_att_data()
Is some other setting necessary so that the text is written as UTF-8 characters for a file, rather than as HTML encoded characters as though it were being displayed on a screen? Could it alternatively be that it is the input process that is somehow converting UTF-8 characters into HTML-encoded characters? When I look at the MIME-type of the files stored on my Mac, they do look correct.
Never write your own serialization function. Your code will inevitably generate invalid JSON.
JSON, by specification, is UTF-8. I’d imagine if you simply used PHP’s built-in json_encode(), everything would be fine.
Your encoded entity problem is due to WordPress’s built-in functionality. I don’t know how to override it off the top of my head, but it’s been done before.
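As a rough sketch of what the json_encode() route could look like for the Attribute record in the question: the function name encode_att_record() and the sample data are made up for illustration, and it assumes the meta_def, meta_range and meta_legend fields already hold JSON text (which is how the original code appears to treat them).

```php
<?php
// Hypothetical stand-in for the question's write_att_data(): build the record
// as an array and let json_encode() take care of quoting and escaping.
function encode_att_record(object $the_att): string
{
    $record = [
        'type'        => 'Attribute',
        'att-id'      => $the_att->id,
        'att-privacy' => $the_att->privacy,
        // Assumption: the meta_* fields contain JSON text, so decode them
        // first; otherwise they would be double-encoded as plain strings.
        'att-def'     => json_decode($the_att->meta_def),
        'att-range'   => json_decode($the_att->meta_range),
        'att-legend'  => json_decode($the_att->meta_legend),
    ];

    // JSON_UNESCAPED_UNICODE keeps non-ASCII text as raw UTF-8 instead of \uXXXX.
    return json_encode(
        $record,
        JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR
    );
}

// Made-up sample data, only to show the call.
$att = (object) [
    'id'          => 'städer',
    'privacy'     => 'o',
    'meta_def'    => '{"l":"Städer"}',
    'meta_range'  => '[]',
    'meta_legend' => '[]',
];

$json = encode_att_record($att);
echo $json, "\n";
```

The resulting string can be passed to fwrite() as-is. This sketch writes no byte order mark: RFC 8259 tells producers not to add one, and a leading BOM may be enough on its own to make json_decode() report a syntax error when the file is read back.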
For those facing the same issues, there are major complications because of unpredictable and silent conversions of encodings, etc. This blog entry was very helpful to me: https://www.stefan-wallin.se/utf-8-issues-in-wordpress-with-update_post_meta-and-json_encode/
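One silent conversion that is often reported in this area (whether it is exactly what the linked post covers is an assumption on my part) is that WordPress unslashes values on their way into post meta, which eats the backslashes json_encode() produces. The snippet below only simulates that with plain stripslashes() and addslashes(); it calls no WordPress functions, and the sample data is invented.

```php
<?php
// Plain-PHP simulation of slash stripping on save; no WordPress code is used.
$data = ['title' => "Sm\u{f6}rg\u{e5}sbord \"deluxe\""];

// Default json_encode(): non-ASCII becomes \uXXXX and quotes become \".
$escaped = json_encode($data);

// Stand-in for the unslashing step applied when the value is stored.
$stored = stripslashes($escaped);

// The stripped text is no longer valid JSON, so decoding fails.
var_dump(json_decode($stored));

// Slashing before the save lets the unslash step restore the original text.
$roundTrip = stripslashes(addslashes($escaped));
var_dump($roundTrip === $escaped);
```

In a real plugin the usual counterpart to the addslashes() call here would be wp_slash() applied to the JSON string before handing it to update_post_meta(), but check that against your WordPress version.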
The utf8_encode() function might help.
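Bear in mind that utf8_encode() only converts ISO-8859-1 bytes to UTF-8 (and is deprecated as of PHP 8.2); it does nothing about HTML entities. If the text really has ended up as entities, html_entity_decode() is the function that turns them back into characters. A minimal sketch, with a hypothetical helper name and made-up input:

```php
<?php
// Hypothetical helper: convert HTML entities back into UTF-8 characters.
function entities_to_utf8(string $text): string
{
    // ENT_QUOTES also decodes &#039; and &quot;; ENT_HTML5 enables the full
    // HTML5 named-entity table; the third argument sets the output charset.
    return html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Named, decimal and hexadecimal entities are all handled.
$decoded = entities_to_utf8('Caf&eacute; &#8211; &#x4E2D;&#25991;');
echo $decoded, "\n";
```

Decoding like this is best done once, at the point where the data is read, so that the same string is never decoded twice.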