Removing any and all inline styles from the_content()

For one of my current projects, I had to transfer blogposts from an old WordPress site to my project.

Things went smoothly until I’ve seen that all the posts were copy pasted from Word, leaving this before pretty much every paragraph:

Read More
<span style="font-size: medium; font-family: georgia,palatino;">

And at some places things like these:

<p style="text-align: justify;">
<p style="text-align: justify;"><span style="font-size: medium; font-family: georgia,palatino;"><strong><span style="color: #000000;">

So because I don’t have the 40 hours (even less the patience) to just go into every post (there’s about 100) and remove those unwanted tags, I’m looking for a filter that would just remove all style (except maybe if it contains text-decoration:underline) elements before outputting the_content()

Is there such a thing?

Related posts

Leave a Reply

4 comments

  1. If we want to remove all inline styles, then just simply need to add the following code in functions.php.

    add_filter('the_content', function( $content ){
        //--Remove all inline styles--
        $content = preg_replace('/ style=("|')(.*?)("|')/','',$content);
        return $content;
    }, 20);
    
  2. Just add this to your functions.php.

    Note: This filter works at the time of saving/updating the post.

    add_filter( 'wp_insert_post_data' , 'filter_post_data' , '99', 2 );
    
    function filter_post_data( $data , $postarr ) {
    
        $content = $data['post_content'];
    
        $content = preg_replace('#<p.*?>(.*?)</p>#i', '<p>1</p>', $content);
        $content = preg_replace('#<span.*?>(.*?)</span>#i', '<span>1</span>', $content);
        $content = preg_replace('#<ol.*?>(.*?)</ol>#i', '<ol>1</ol>', $content);
        $content = preg_replace('#<ul.*?>(.*?)</ul>#i', '<ul>1</ul>', $content);
        $content = preg_replace('#<li.*?>(.*?)</li>#i', '<li>1</li>', $content);
    
        $data['post_content'] = $content;
    
        return $data;
    }
    

    Note: This filter works at the time when function the_content() is executed.

    add_filter( 'the_content', 'the_content_filter', 20 );
    
    function the_content_filter( $content ) {
        $content = preg_replace('#<p.*?>(.*?)</p>#i', '<p>1</p>', $content);
        $content = preg_replace('#<span.*?>(.*?)</span>#i', '<span>1</span>', $content);
        $content = preg_replace('#<ol.*?>(.*?)</ol>#i', '<ol>1</ol>', $content);
        $content = preg_replace('#<ul.*?>(.*?)</ul>#i', '<ul>1</ul>', $content);
        $content = preg_replace('#<li.*?>(.*?)</li>#i', '<li>1</li>', $content);
        return $content;
    }
    
  3. I tried the method above with the saving/updating but didn’t worked for me so I went from another approach. I exported the whole wp_posts table, opened it in Sublime and did a regex replace. I used style="*.*?" to find all cases and replaced them with emptyness. Then droped the old table’s content and imported the new one.

    If any one try this method – please make sure you have a clear back up in case there are some other post types in the wp_post table and the things got bit messy.