How to escape characters in the_excerpt() from the WordPress loop

I have a blog that’s not part of my main site’s script. I’ve pulled in the blog title, thumbnail picture, and excerpt just fine on the frontpage. However, if the blog text has apostrophes or the like the kicked out text gets screwed up on the frontend, resulting in strange characters in place of the apostrophes. I’ve searched high and low on how to fix this, but have come up empty.

It appears I need to use the esc_html() function, but I’m not sure how to do that with the_excerpt. I’m definitely not a PHP guy.

I have this code:

<?php while (have_posts()): the_post(); ?>
<h3><?php the_title(); ?></h3>
<img style="float: left; padding: 13px 20px 0px 0px;" src="<?php echo catch_that_image(); ?>" width="100">
<?php the_excerpt(); ?>
<p><a href="<?php the_permalink(); ?>">Read more...</a></p>
<?php endwhile; ?>

Does anyone know how to fix this so the excerpt’s text isn’t replacing apostrophes and quotes with strange characters?

3 comments

the_excerpt directly outputs content. Instead, use get_the_excerpt to return the content and put it in a variable, or give it to esc_html.

<?php echo esc_html(get_the_excerpt()); ?>

For most WordPress functions of the format the_X(), get_the_X() also exists.

That said, the_excerpt() normally produces valid output on its own, so you may want to verify that the character sets used by your site and by WordPress match. That would mean your site’s pages should be served as UTF-8.
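As a rough illustration of what escaping does and does not change, the sketch below uses PHP's built-in htmlspecialchars() as a stand-in for esc_html(). That substitution is an assumption made so the snippet runs outside WordPress; the two functions are not identical. The sample string is made up.

```php
<?php
// Sketch only: htmlspecialchars() stands in for WordPress's esc_html() here,
// on the assumption that both escape the same HTML-special characters.
$excerpt = "It\u{2019}s a \u{201C}test\u{201D} & <b>more</b>";

$escaped = htmlspecialchars($excerpt, ENT_QUOTES, 'UTF-8');

// The markup characters (&, <, >) are escaped; the curly quotes are left as-is.
echo $escaped, "\n";
```

If the curly quotes are what shows up garbled, escaping leaves them exactly as they were, which is why the page encoding is the more likely culprit.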

What that generally means is that you’re serving the page as something other than UTF-8, while WordPress is replacing certain characters with other versions of themselves (“smart quotes”) in UTF-8. It can also happen when you copy and paste text into a post with an odd encoding. Microsoft Word is a common source of that, since it likes to translate all your quotes into curly ones in a proprietary encoding. If you did copy and paste, try pasting into somewhere with no formatting first (Notepad is good) and re-copy from there. Otherwise, it’s WordPress that needs to be fixed. Note that it isn’t esc_html that you will want here regardless – that will just give you a different set of problems.
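To make the mismatch concrete, here is a small sketch that reproduces the effect outside WordPress. It assumes the mbstring extension is available and uses Windows-1252 as the "wrong" charset purely as an example; your page may be declaring something else.

```php
<?php
// Sketch only: requires the mbstring extension.
// These are the UTF-8 bytes for the right single quotation mark (U+2019).
$curly_apostrophe = "\xE2\x80\x99";

// Treat those three bytes as Windows-1252 text and convert them to UTF-8,
// which is roughly what a browser does when the page declares the wrong charset.
$misread = mb_convert_encoding($curly_apostrophe, 'UTF-8', 'Windows-1252');

// One character has become three unrelated ones.
echo "It" . $misread . "s\n";
```

The single apostrophe comes out as three separate characters, which matches the kind of "strange characters" described in the question.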

There are two approaches here: the first, and best, is to make sure you’re serving out pages with the right encoding on them. You can check that using the debugging features of your browser (“Page Info”, or the network panels of Firebug or the Web Inspector), or you can easily do it online by running your page through the W3 validator. It will tell you the encoding on the results page. If it isn’t UTF-8, there is your problem. You can probably fix that with PHP directly if you’re generating the page yourself like it sounds:

<?php header("Content-type: text/html; charset=UTF-8");?>

right at the top of the page. WordPress’s library code will usually send that header itself, so if you’re seeing this problem the header may not be taking effect for you. Some servers are set up not to allow overriding the charset that way, in which case the fix depends on which server you’re using. For Apache, see the documentation for AddDefaultCharset for one way of doing it.
If you can’t override the server’s setting, or you don’t want your pages served as UTF-8, you need option two.
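If you would rather check the declared charset from a script than from the browser, something along these lines might do. The helper name is made up for this example and is not part of PHP or WordPress; it only parses a Content-Type value you already have in hand.

```php
<?php
// Hypothetical helper (not part of PHP or WordPress): pull the charset
// out of a Content-Type header value, or return null if none is declared.
function local_charset_from_content_type(string $value): ?string {
    if (preg_match('/charset\s*=\s*"?([A-Za-z0-9._:-]+)"?/i', $value, $m)) {
        return strtoupper($m[1]);
    }
    return null;
}

echo local_charset_from_content_type('text/html; charset=utf-8'), "\n";
```

The header value itself could be copied from the browser's network panel or fetched with PHP's get_headers(). If the helper returns null, the response declares no charset at all and the browser is left to guess.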

If the first option isn’t available, or you’d rather not do it, then there is another. We can force WordPress not to make those replacements at all. WordPress’s wptexturize() function does that conversion, which is applied as part of the filter chain for the_excerpt. You can remove it from the filter chain with this PHP code:

<?php remove_filter( "the_excerpt", "wptexturize"); ?>

You can put that into functions.php in your theme or into a basic plugin, but there’s also a plugin available which does it for you: wpuntexturize will disable smart character conversion everywhere. That does make things look a little uglier, but it should make everything work.

It’s also possible to apply just a small reverse transformation for your problem characters using WordPress’s filters system. Something like this:

function local_fix_quotes($in) {
    $in = str_replace("“", '"', $in);
    $in = str_replace("”", '"', $in);
    $in = str_replace("’", "'", $in);
    $in = str_replace("‘", "'", $in);
    $in = str_replace("–", "-", $in);
    $in = str_replace("…", "...", $in);
    $in = str_replace("\xC2\xA0", ' ', $in);
    $in = str_replace("\xE2\x80\x99", "'", $in);
    $in = str_replace("\xE2\x80\x9C", '"', $in);
    $in = str_replace("\xE2\x80\x9D", '"', $in);
    $in = str_replace("\xE2\x80\x93", '-', $in);
    return $in;
}
add_filter( "the_excerpt", "local_fix_quotes" );

in a plugin or functions.php will untransform smart quotes and dashes into their plain-ASCII equivalents.
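The same reverse transformation can also be written as a single lookup table, which may be easier to extend. This is only a sketch: the function name is invented for the example, and the \u{...} escapes assume PHP 7 or later.

```php
<?php
// Sketch of the same idea with one lookup table; the function name is
// made up for this example. The \u{...} escapes produce UTF-8 characters.
function local_plain_punctuation(string $in): string {
    static $map = [
        "\u{201C}" => '"',   // left double quote
        "\u{201D}" => '"',   // right double quote
        "\u{2018}" => "'",   // left single quote
        "\u{2019}" => "'",   // right single quote / apostrophe
        "\u{2013}" => '-',   // en dash
        "\u{2026}" => '...', // ellipsis
        "\u{00A0}" => ' ',   // non-breaking space
    ];
    return strtr($in, $map);
}

echo local_plain_punctuation("It\u{2019}s \u{201C}fine\u{201D}\u{2026}"), "\n";
```

Inside WordPress it would be registered the same way as the function above, with add_filter( "the_excerpt", "local_plain_punctuation" ).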

Any of these code options can go straight into your PHP page if you’re generating it outside of WordPress, so long as they come after you include the WordPress library code. When you do that, you have to require wp-blog-header right at the top of the page, before any other code output. If the page has already begun output WordPress won’t be able to change the encoding that’s already been sent. The top of the page might look like this:

<?php 
define('WP_USE_THEMES', false);
require_once('./wp-blog-header.php');
remove_filter( "the_excerpt", "wptexturize");
?>
<html>

If wp-blog-header wasn’t required at the top of the page, make sure it’s there first and see whether the problem resolves itself. If your page deliberately has a different encoding it doesn’t matter so much, and you’re left with disabling the character replacement regardless. For pages generated within WordPress, the plugin approach is best in that case.

Are you sure you’re using the same character encoding in each page’s header?

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

And are you calling wp-blog-header.php on the non-WordPress pages?

<?php require('/the/path/to/your/wp-blog-header.php'); ?>

re: http://codex.wordpress.org/Integrating_WordPress_with_Your_Website
