Non-UTF characters break RSS feed

I have a nasty issue that occurs every so often: a guest blogger inadvertently puts a non-UTF character or an unbalanced HTML tag into a post, which then breaks the RSS feed, so FeedBurner cannot send it to subscribers via email.

Is there a technical way to avoid this kind of issue?

Thanks!

2 comments

  1. Copy this code into a PHP file, put the file in your plugins folder, and then activate it in the WordPress backend. I hope this helps, but I have not tested it; I wrote it from scratch.

    <?php
    /**
     * Plugin Name: Non-UTF characters in RSS feed
     * Plugin URI:  http://wordpress.stackexchange.com/questions/37845/non-utf-characters-break-rss-feed/
     * Description: Filter content for unicode characters
     * Version:  1.0.0
     * Author:      Frank Bültge
     * Author URI:  http://bueltge.de
     * License:     GPLv3
     */
    
    // This file is not called from WordPress. We don't like that.
    ! defined( 'ABSPATH' ) and exit;
    
    // Attach the filter to every feed-related hook.
    foreach ( array( 'the_content_rss', 'the_excerpt_rss', 'the_title_rss', 'comment_text_rss' ) as $filter )
        add_filter( $filter, 'filter_non_utf8_chars', 0 );
    
    function filter_non_utf8_chars( $content ) {
    
        return htmlentities2( $content );
    }
    ?>
    
  2. If this only happens in the RSS content area, try filtering out the characters with the the_content_rss filter:

    add_filter('the_content_rss', 'filter_non_utf8_chars', 0, 1);
    function filter_non_utf8_chars($content){
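        // Keep only ASCII 0x20-0x7F and line feeds. Note that this also
        // removes valid non-ASCII characters such as ü.
        $content = preg_replace('/[^\x20-\x7F\x0A]/', '', $content);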
        return $content;
    }
    

    Here are the filters related to RSS feeds:

    1. the_excerpt_rss
    2. the_content_rss
    3. the_title_rss
    4. comment_text_rss

    Hope this helps!
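
An ASCII-only filter like the one in the second comment would also remove legitimate characters such as ü. As a rough sketch of a gentler alternative, the function below keeps every well-formed UTF-8 sequence and drops only stray bytes and low control characters. The name feed_safe_text is hypothetical and not part of WordPress, and the function_exists() guard is an assumption added so the file also runs outside WordPress. It does not cover every restriction XML places on characters, and it has not been tested against FeedBurner.

```php
<?php
/**
 * Hypothetical helper, not part of WordPress: keep only well-formed UTF-8
 * sequences plus tab, LF and CR. Valid non-ASCII text such as "ü" survives.
 */
function feed_safe_text( $content ) {
    $valid = '/[\x09\x0A\x0D\x20-\x7E]+'          // printable ASCII, tab, LF, CR
        . '|[\xC2-\xDF][\x80-\xBF]'               // 2-byte sequences
        . '|\xE0[\xA0-\xBF][\x80-\xBF]'           // 3-byte, no overlong forms
        . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'    // 3-byte
        . '|\xED[\x80-\x9F][\x80-\xBF]'           // 3-byte, no surrogates
        . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'        // 4-byte, no overlong forms
        . '|[\xF1-\xF3][\x80-\xBF]{3}'            // 4-byte
        . '|\xF4[\x80-\x8F][\x80-\xBF]{2}/';      // 4-byte, up to U+10FFFF

    // Collect every valid run and join the runs back together; anything
    // that did not match (stray bytes, control characters) is dropped.
    preg_match_all( $valid, (string) $content, $matches );
    return implode( '', $matches[0] );
}

// Inside WordPress, register the helper on the four feed filters listed above.
if ( function_exists( 'add_filter' ) ) {
    foreach ( array( 'the_content_rss', 'the_excerpt_rss', 'the_title_rss', 'comment_text_rss' ) as $hook ) {
        add_filter( $hook, 'feed_safe_text', 0 );
    }
}
```

This only addresses the character problem. For the unbalanced tags mentioned in the question, WordPress ships force_balance_tags(), which could probably be hooked onto the same filters; check how it behaves on your WordPress version before relying on it.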