Write UTF-8 JSON file from PHP code in WordPress plugin

I’m writing a WordPress plugin that needs to write and read complex data encoded as JSON, which can contain UTF-8 text. I’ve had problems reading the file back (I get PHP parse errors), and I now suspect the cause is that the data is not actually written as UTF-8, as I expected, but as HTML-encoded entities.

The functions that open the output buffer and write into it look like this. Am I missing something?

public function createUTFOutput($filename, $json)
{
    // Tells the browser to expect a json file and bring up the save dialog in the browser
    header('Pragma: public');
    header('Expires: 0');
    header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
    header('Cache-Control: private', false);

    if ($json)
        header('Content-Type: text/plain; charset=utf-8');
    else
        header('Content-Type: text/csv; charset=utf-8');
    header('Content-Disposition: attachment; filename="'.$filename.'";');

    // This opens up the output buffer as a "file"
    $fp = fopen('php://output', 'w');

    // Hack to write as UTF-8 format
    fwrite($fp, pack("CCC",0xef,0xbb,0xbf));
    return $fp;
} // createUTFOutput()

// PURPOSE: Write out data about Attribute $the_att to file $fp
public function write_att_data($fp, $the_att)
{
    // Create header to indicate Attribute record
    fwrite($fp, '{"type": "Attribute", "att-id": "'.$the_att->id.'", '."\n");
    fwrite($fp, '"att-privacy": "'.$the_att->privacy."\", \n");
    fwrite($fp, '"att-def": '.$the_att->meta_def.", \n");
    fwrite($fp, '"att-range": '.$the_att->meta_range.", \n");
    fwrite($fp, '"att-legend": '.$the_att->meta_legend."\n}");
} // write_att_data()

Is some other setting necessary so that the text is written to the file as UTF-8 characters, rather than as HTML-encoded characters as though it were being displayed on a screen? Alternatively, could it be the input process that is somehow converting UTF-8 characters into HTML-encoded characters? When I look at the MIME type of the files stored on my Mac, they do look correct.

3 comments

  1. Never write your own serialization function. Your code will inevitably generate invalid JSON.

    JSON, by specification, is UTF-8. I’d imagine if you simply used PHP’s built-in json_encode(), everything would be fine.

    Your encoded entity problem is due to WordPress’s built-in functionality. I don’t know how to override it off the top of my head, but it’s been done before.
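
For illustration, here is a minimal sketch of the suggestion above, not a tested drop-in for the plugin. It assumes that `meta_def`, `meta_range` and `meta_legend` hold JSON text (the question writes them out unquoted) and that any HTML entities sit inside string values. The names `att_to_json()` and `decode_entities_deep()` are hypothetical and are not WordPress functions; only `json_decode()`, `json_encode()` and `html_entity_decode()` are PHP built-ins.

```php
<?php
// Hypothetical helper: turn HTML entities back into UTF-8 characters in every
// string found inside a value returned by json_decode().
function decode_entities_deep($value)
{
    if (is_string($value)) {
        return html_entity_decode($value, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    }
    if (is_array($value)) {
        return array_map('decode_entities_deep', $value);
    }
    if (is_object($value)) {
        foreach (get_object_vars($value) as $key => $item) {
            $value->$key = decode_entities_deep($item);
        }
    }
    return $value;
}

// Hypothetical replacement for the hand-built string in write_att_data():
// assemble a PHP array and let json_encode() handle quoting and escaping.
function att_to_json($the_att)
{
    $record = array(
        'type'        => 'Attribute',
        'att-id'      => decode_entities_deep($the_att->id),
        'att-privacy' => decode_entities_deep($the_att->privacy),
        // Assumption: these three fields contain JSON text.
        // json_decode() returns null if one of them does not parse.
        'att-def'     => decode_entities_deep(json_decode($the_att->meta_def)),
        'att-range'   => decode_entities_deep(json_decode($the_att->meta_range)),
        'att-legend'  => decode_entities_deep(json_decode($the_att->meta_legend)),
    );

    // JSON_UNESCAPED_UNICODE keeps non-ASCII characters as literal UTF-8
    // instead of \uXXXX escapes; both forms are valid JSON.
    return json_encode($record, JSON_UNESCAPED_UNICODE);
}
```

With something like this in place, the five `fwrite()` calls in `write_att_data()` could collapse to a single `fwrite($fp, att_to_json($the_att));`. Whether the entity decoding is needed at all depends on where the entities are introduced, which the question leaves open.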
