What’s the proper way to find and remove duplicate images from posts and the media library?

I just exported a largish WP blog from MediaTemple to PHPFog.

I used the standard WordPress export and import plugins.

For some unknown reason all of my media assets have been duplicated. I now have twice as many images per post.

If an original file was called “Lot-44-Warrens.jpg”, it now has a duplicate called “Lot-44-Warrens1.jpg”. Both files are attached to the same post.

I now have many duplicate images across 250+ posts.

So my question is how do I remove said duplicates from the media library and from the posts?

I tried to search the media library with “*1.jpg”, but it didn’t work.

I'm looking for a neat solution that doesn’t involve removing each dupe manually.

Perhaps there is a MySQL query I can run to remove the dupes from the library and the posts?

The site in question is: http://igrealty.phpfogapp.com/ .

4 comments

  1. Combining the two answers on this page, I found that this worked:

    // Fetch every post; the args array is passed to WP_Query only once.
    $loop = new WP_Query(array(
      'post_type' => 'post',
      'posts_per_page' => -1
    ));
    
    while ($loop->have_posts()) {
      $loop->the_post();
      // Get all image attachments of the current post (get_posts() returns only 5 by default).
      $attachments = get_posts(array(
        'order' => 'ASC',
        'post_type' => 'attachment',
        'post_parent' => get_the_ID(),
        'post_mime_type' => 'image',
        'numberposts' => -1
      ));
      foreach ($attachments as $img_post) {
        // Delete attachments whose URL ends in 1.jpg, 1.gif or 1.png.
        if (preg_match('/1\.(jpg|gif|png)$/i', $img_post->guid)) {
          wp_delete_attachment($img_post->ID);
        }
      }
    }
    wp_reset_postdata();
    
  2. Use a run-once script to clean it up. Just an outline, no code:

    1. Get all posts. See get_posts( array ( 'numberposts' => -1 ) )
    2. For each post get all attachments. See get_children( array ( 'post_parent' => $post->ID, 'post_type' => 'attachment', 'numberposts' => -1 ) )
    3. For each attachment get the attachment URL. See wp_get_attachment_url()
    4. If you find the attachment URL in the parent post’s content ($post->post_content):
      • If there is another attachment URL with the same file name plus the 1, and
      • both are part of the post content, then
      • remove the second image from the post content first, then
      • use wp_delete_attachment() to delete the physical file. This removes all metadata and all associations in other posts too. It is the best way to remove attached files (imho).

    This may take a while. Test it on a local copy of your site. Maybe you should run the process in steps of 50 posts ('numberposts' => 50).
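
    The matching rule in step 4 could be sketched roughly like this. It is plain PHP so it can be tried outside WordPress; the function name find_suffixed_duplicates() is made up for illustration, and it assumes you have already collected the attachment URLs for one post with wp_get_attachment_url(). Note that it would also flag a legitimate "photo1.jpg" if "photo.jpg" is attached to the same post, so review the result before deleting anything.

    ```php
    <?php
    // Hypothetical helper (not a WordPress function): given a post's content
    // and the URLs of its attachments, return the URLs that look like
    // "name1.ext" copies of another attachment "name.ext".
    function find_suffixed_duplicates(string $content, array $urls): array
    {
        $duplicates = array();
        foreach ($urls as $url) {
            // Match "name1.ext" and capture "name" and "ext".
            if (!preg_match('/^(.*)1\.(jpe?g|gif|png)$/i', $url, $m)) {
                continue;
            }
            $original = $m[1] . '.' . $m[2];
            // Flag the copy only when the un-suffixed original is attached
            // too and both URLs appear in the post content.
            if (in_array($original, $urls, true)
                && strpos($content, $url) !== false
                && strpos($content, $original) !== false) {
                $duplicates[] = $url;
            }
        }
        return $duplicates;
    }
    ```

    In the run-once script, each URL returned here would be removed from $post->post_content first, and the matching attachment ID would then be passed to wp_delete_attachment().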

  3. This script will grab all of the attachments in the database and compare the files to one another through their MD5 hashes. If it finds a duplicate that has a 1 at the end of the file name, it will remove that image:

    require('wp-load.php');
    
    global $wpdb;
    
    // Fetch every attachment; $wpdb->posts already includes the table prefix.
    $img_posts = $wpdb->get_results("SELECT * FROM {$wpdb->posts} WHERE post_type = 'attachment'");
    
    $img_md5s = array();
    
    foreach ($img_posts as $img_post) {
      // Hash the local file instead of downloading it through its URL (guid).
      $file = get_attached_file($img_post->ID);
      if (!$file || !file_exists($file)) {
        continue;
      }
      $single_img_md5 = md5_file($file);
    
      // Delete only if an identical file was already seen and the name ends in 1.jpg, 1.gif or 1.png.
      if (in_array($single_img_md5, $img_md5s) && preg_match('/1\.(jpg|gif|png)$/i', $file)) {
        wp_delete_attachment($img_post->ID);
      } else {
        $img_md5s[] = $single_img_md5;
      }
    }
    

    Just place it in a file in your WordPress root directory (next to wp-load.php).

  4. I learnt a valuable lesson yesterday: if an application does not provide you with adequate functions for finding and removing assets from a database, you’re trying to find duplicates across multiple, often unique, fields, and you’re unsure how to create complex MySQL queries, then the best bet is to go back to basics.

    In the end, I exported the tables with dupes into Excel and filtered them by creating my own “hash” of significant fields (doing this in MySQL was complex and crashed the server a few times). I then pruned the data set down to a list of IDs that I was absolutely sure I wanted to remove, and built a much simpler MySQL query to delete each row by ID.

    This method worked great because I was able to take things slowly and consider each Excel filter I applied. This way I was much more confident I was deleting the correct records. I also have an accurate record in Excel of exactly what I did delete.
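
    For anyone who would rather script the same idea than use Excel, here is a minimal sketch of the “hash of significant fields” step. It is plain PHP working on rows you have already exported as arrays; the function name duplicate_row_ids() and the choice of fields are assumptions for illustration, not what I actually ran. It only produces the list of IDs, so you can still review it before deleting anything.

    ```php
    <?php
    // Hypothetical helper: return the IDs of rows whose chosen fields repeat
    // an earlier row. The first row of each group is kept.
    function duplicate_row_ids(array $rows, array $fields, string $idField = 'ID'): array
    {
        $seen = array();
        $toDelete = array();
        foreach ($rows as $row) {
            // Build the "hash" from the significant fields only.
            $parts = array();
            foreach ($fields as $field) {
                $parts[] = isset($row[$field]) ? (string) $row[$field] : '';
            }
            $hash = md5(json_encode($parts));
            if (isset($seen[$hash])) {
                $toDelete[] = $row[$idField]; // repeat of an earlier row
            } else {
                $seen[$hash] = $row[$idField]; // first occurrence, keep it
            }
        }
        return $toDelete;
    }
    ```

    The returned IDs are the ones to go into the simple delete-by-ID query, after checking them by eye.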