Encoding issues in Drupal when importing from WordPress

I am currently moving blog posts from WordPress to Drupal. However, after the move,
some of the text is not being displayed correctly.

WordPress is displaying:
When it hasn’t (html code is <h2>When it hasn’t</h2>)

Drupal is displaying:
When it hasn’t (html code is <h2>When it hasn’t</h2>)

In the WordPress and Drupal databases the value is correct. The source is the same.
<h2>When it hasn’t</h2>

I did a search and found many options. None of them helped.
Below are the ones I have done and checked.

1) I double-checked that UTF-8 is the character encoding in Drupal and WordPress.
I also made a simple test.php file to check that nothing else was getting in the way,
and it still did not display correctly.

2) I made sure that UTF-8 is used when we take a mysqldump and upload it
to Drupal.

3) I also made sure the .php file is in utf-8 when saved.

4) I tried every encoding option available in Chrome and none of them
displayed it correctly.

5) I also used php functions to recode it but they did not work.

$value2="<h2>When it hasn’t</h2>";

$out = recode_string('..utf-8', $value2);
//output - When it hasnt

$out2= mb_convert_encoding($value2,'UTF-8', "UTF-8");
// output  - When it hasn’t


$out3= @iconv('UTF-8', 'utf-8', $value2);
// output - When it hasn’t

I have run out of options now and I am stuck. Please help.

1 comment

  1. You say the text in both databases is correct, but actually this doesn’t mean much: to view the content of a record you must use some client, and quite a few transformations may happen depending on how the text is rendered so you can read it.

    So only two things matter:

    1. the encoding of the column
    2. the encoding of the HTML page returned by Drupal
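
One way to take the client out of the picture is to look at the raw bytes instead of the rendered characters. Here is a minimal sketch, assuming the mbstring extension is loaded; `describe_bytes` is a hypothetical helper name, not something Drupal or WordPress provides:

```php
<?php
// Hypothetical helper: show the raw bytes of a string as hex and say
// whether those bytes form valid UTF-8 (requires the mbstring extension).
function describe_bytes(string $s): string
{
    $hex = strtoupper(implode(' ', str_split(bin2hex($s), 2)));
    $validity = mb_check_encoding($s, 'UTF-8') ? 'valid UTF-8' : 'not valid UTF-8';
    return $hex . ' (' . $validity . ')';
}

$asUtf8   = "\xE2\x80\x99"; // U+2019 encoded as UTF-8
$asCp1252 = "\x92";         // the same character encoded as CP1252

echo describe_bytes($asUtf8), "\n";   // E2 80 99 (valid UTF-8)
echo describe_bytes($asCp1252), "\n"; // 92 (not valid UTF-8)
```

Running the same check on the value your script actually receives from the database tells you which encoding you are really holding, whatever the browser or the database client shows.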

    Since your page outputs ’ (the bytes 0xE2 0x80 0x99 read as CP1252) for ’ (Unicode U+2019, which is 0xE2 0x80 0x99 in UTF-8), I guess the column is indeed UTF-8; however, something between the database and the browser thinks the text is CP1252. This is what you have to check:

    • If using MySQL, the connection encoding must be UTF-8 so that what you have in your PHP script is UTF-8 text. You can use SET NAMES 'utf8' (MySQL charset names have no hyphen; utf8mb4 covers the full Unicode range). Note that if you don’t need the Unicode set, you can even use CP1252 (called latin1 in MySQL): the only important thing is that you know the encoding, since PHP strings are just byte arrays.
    • Explicitly define the response encoding in the HTTP Content-Type header. I mean, configure Drupal to call header('Content-Type: text/html; charset=utf-8');
    • If the HTTP response encoding is different from the one used for the text retrieved from the db, transcode the query result accordingly.
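
The diagnosis and the three checks above can be sketched in plain PHP, for example in a standalone script like your test.php. This is only an illustration under assumptions: the function names are hypothetical, it uses mysqli directly rather than Drupal's database layer, and it needs the mbstring extension.

```php
<?php
// Sketch only: function names are hypothetical, not part of Drupal.

// Reproduce the symptom: UTF-8 bytes misread as CP1252, then re-encoded as UTF-8.
function misread_as_cp1252(string $utf8Text): string
{
    return mb_convert_encoding($utf8Text, 'UTF-8', 'Windows-1252');
}

// First check: make the MySQL connection hand UTF-8 text to PHP.
function open_utf8_connection(string $host, string $user, string $pass, string $db): mysqli
{
    $conn = new mysqli($host, $user, $pass, $db);
    // Like SET NAMES utf8mb4, and it also tells the client library.
    $conn->set_charset('utf8mb4');
    return $conn;
}

// Third check: transcode only when the text and the response encodings differ.
function transcode_for_response(string $text, string $from, string $to): string
{
    if (strcasecmp($from, $to) === 0) {
        return $text;
    }
    return mb_convert_encoding($text, $to, $from);
}

// Second check: declare the response encoding before any output is sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

$quote = "\xE2\x80\x99"; // U+2019 as UTF-8
echo bin2hex(misread_as_cp1252($quote)), "\n";                               // c3a2e282ace284a2
echo bin2hex(transcode_for_response($quote, 'UTF-8', 'Windows-1252')), "\n"; // 92
```

The first hex string is the UTF-8 encoding of ’, which is exactly the garbage you see on the page. Note also that converting from UTF-8 to UTF-8, as in your attempts, cannot change anything: a conversion only helps when the source and target encodings really differ.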