Better way to remove HTML syntax from all content

I have a bunch of articles imported from a very old Joomla 1.0 installation. The content of these posts has a lot of unwanted inline HTML. I was able to clean all of it with something like this (I put this in a template, then opened it):

<?php 
$tochange = get_posts('post_type=post&numberposts=-1');    
foreach ($tochange as $post):
    setup_postdata($post);

    $changed = array();
    $changed['ID'] = $post->ID;
    $changed['post_content'] = strip_tags($post->post_content, '<img><a>');
    print_r($post->ID); 
    echo '<br />';
    $out = wp_update_post($changed);
    echo 'changed:'.$out.'<br />';

    unset($changed);

endforeach;
?>

But it seems a little bloated, considering that it loops over every post.
Any suggestions?

2 comments

  1. If you just need to change the post content, you can avoid the overhead of get_posts/WP_Query by directly querying the database:

    global $wpdb;
    
    $results = $wpdb->get_results("SELECT ID, post_content FROM {$wpdb->posts}");
    
    $total = count($results); 
    $changed = 0;
    
    foreach($results as $entry){
    
      $new_content = strip_tags($entry->post_content, '<img><a>');
    
      if($entry->post_content !== $new_content){
    
        // Note: no closing parenthesis inside the SQL string; the ID is an integer, so %d.
        $wpdb->query($wpdb->prepare(
                     "UPDATE {$wpdb->posts} SET post_content = %s WHERE ID = %d", 
                        $new_content, $entry->ID));
    
        $changed++;
      }
    
    }
    
    printf("Changed %d out of %d posts", $changed, $total);
    

    (Back up the database first.)
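
    Before running the update for real, it can help to see which rows would actually change. Here is a minimal dry-run sketch in plain PHP, assuming the rows have already been fetched into an array of objects with `ID` and `post_content` properties. The function name `find_changed_posts` and the sample rows are made up for illustration, and nothing is written to the database:

    ```php
    <?php
    /**
     * Return the rows whose content would change after strip_tags().
     * Hypothetical helper for a dry run; it does not touch the database.
     *
     * @param object[] $rows    Objects with ID and post_content properties.
     * @param string   $allowed Tags to keep, in strip_tags() format.
     * @return array<int, string> Map of post ID => cleaned content.
     */
    function find_changed_posts(array $rows, string $allowed = '<img><a>'): array
    {
        $changed = [];
        foreach ($rows as $row) {
            $clean = strip_tags($row->post_content, $allowed);
            // Only report rows where stripping actually altered the content.
            if ($clean !== $row->post_content) {
                $changed[(int) $row->ID] = $clean;
            }
        }
        return $changed;
    }

    // Sample rows standing in for the result of $wpdb->get_results().
    $rows = [
        (object) ['ID' => 1, 'post_content' => '<p style="margin:0">Hello <a href="/x">link</a></p>'],
        (object) ['ID' => 2, 'post_content' => 'Already clean'],
    ];

    $preview = find_changed_posts($rows);
    print_r($preview);
    ```

    Keep in mind that strip_tags() leaves allowed tags exactly as they are, attributes included, so an inline `style` attribute on an `<a>` or `<img>` tag would survive this cleanup.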

  2. To change every post, you need to loop through every post. Assuming your code checks out, which it seems to at a cursory glance, that’s more or less how I’d do it.

    You may be able to do it with a MySQL query, or export your wp_posts table to an SQL file, run a find/replace on the file, then re-import the table. But, I cannot stress this enough, BACK UP YOUR DB FIRST.

    In hindsight, the latter option would be quicker, as the PHP approach may load your server for a bit.

    HeidiSQL can export the table and add all the statements needed to re-import it, with DROP TABLE IF EXISTS and CREATE TABLE IF NOT EXISTS clauses, so that you're a simple CTRL+H operation away from pretty formatting.
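
    The find/replace idea can also be written as a regular expression that removes one specific tag while keeping the text inside it. This is a rough sketch, assuming the unwanted tag is `<font>` (an example, not taken from the question) and that no attribute value contains a `>` character. `remove_tag` is a hypothetical helper name:

    ```php
    <?php
    /**
     * Remove every opening and closing occurrence of one tag, keeping inner text.
     * Hypothetical helper; a regex is not a full HTML parser.
     */
    function remove_tag(string $html, string $tag): string
    {
        // Matches <tag ...> and </tag>, case-insensitively. The \b stops
        // "font" from also matching longer names such as "fontsize".
        $pattern = '#</?' . preg_quote($tag, '#') . '\b[^>]*>#i';

        // preg_replace() returns null on error; fall back to the input then.
        return preg_replace($pattern, '', $html) ?? $html;
    }

    echo remove_tag('<font color="red">Hi</font> <b>there</b>', 'font'), PHP_EOL;
    // prints: Hi <b>there</b>
    ```

    Running a pattern like this on individual field values is probably safer than on a raw SQL dump, where quotes inside the content are escaped.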