Right off the bat, let me say that this is not the same question as the one I posted the other day (that one was about duplicate posts when trying to update a post using wp_insert_post). In that question, the duplicate post I was referring to was being created as a revision, and setting the revision option appropriately in wp-config.php took care of it. With this question, however, two (sometimes three or four) identical posts are created, where none is a revision of the other. No matter what I have done to avoid this, it just fails. Sometimes I get 100 dups, sometimes 500.
This is all part of a migration routine I wrote, where a while loop goes through 15,000 records from an MS SQL table and converts each record into a post_data array to be fed into the wp_insert_post API. Since I'm dealing with 15,000 records in MS SQL, I should end up with 15,000 posts in a freshly installed wp_posts table, right? In my case, I sometimes get 16,000, sometimes 290,000. It's never the same; every run is different. I'm almost at the point of rewriting the code. But if the culprit is wp_insert_post itself, or some internal WP process or WordPress cache that I do not know of, or the MySQL server config, or something else along those lines, even the new approach won't work.
After some research, I've come to learn I'm not alone on this. But from what I read, I could not wrap my head around it.
As I said above, I create the posts by going through a while loop.
Since the process takes a long time, I had to build a custom time-out module. As the code goes through the iterations of the while loop, it records its progress in a time_out table. A watchdog page running in an iframe keeps an eye on the time_out table's progress, so even if the bottom frame (the one that handles the MS SQL to wp_posts process) times out, I know from which record to restart the process.
This way I do not have to sit in front of the computer and keep clicking on a 'continue' link.
This trick works on all of my implementations that get cut off by timeouts. Eventually the whole table gets processed, and the watchdog iframe stops refreshing when the number of records to be processed equals the number of records that have been processed in the last instance. So the time-out module works perfectly, except...
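To make the resume idea concrete, here is a minimal sketch of a checkpointed loop. It is not my actual module: the checkpoint file stands in for the time_out table, and the names load_checkpoint, save_checkpoint and run_batch are hypothetical, used only for illustration. The $budget parameter simulates a request that gets cut off after a fixed number of records.

```php
<?php
// Hypothetical stand-in for the time_out table: a file holding the next offset to process.
function load_checkpoint(string $file): int {
    return is_file($file) ? (int) file_get_contents($file) : 0;
}

function save_checkpoint(string $file, int $offset): void {
    // LOCK_EX avoids a half-written checkpoint file.
    file_put_contents($file, (string) $offset, LOCK_EX);
}

/**
 * Process records starting at the saved checkpoint and stop after $budget
 * records, which simulates a request that times out.
 * Returns how many records this run handled.
 */
function run_batch(array $records, string $file, int $budget, callable $handle): int {
    $start = load_checkpoint($file);
    $done  = 0;
    for ($i = $start; $i < count($records) && $done < $budget; $i++) {
        $handle($records[$i]);
        save_checkpoint($file, $i + 1); // mark progress only after the record is handled
        $done++;
    }
    return $done;
}

// Usage: repeated "requests", each cut off after 2 records, until nothing is left.
$file    = tempnam(sys_get_temp_dir(), 'ckpt');
$seen    = [];
$records = ['a', 'b', 'c', 'd', 'e'];
do {
    $n = run_batch($records, $file, 2, function ($r) use (&$seen) { $seen[] = $r; });
} while ($n > 0);
unlink($file);
```

Note that the sketch assumes only one run is active at any given time; it does nothing to stop two overlapping runs from reading the same checkpoint.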
Only when the destination table is wp_posts and the wp_insert_post API is involved do I get a problem.
After the first time-out, things start going haywire.
From time to time, wp_insert_post ends up firing two or three times, causing the same record to be inserted multiple times.
To remedy the situation, I've tried 3 different techniques, including the use of custom fields, but no luck. I also tried inserting a unique key into the post itself to minimize the database activity. I found this better because it avoids the involvement of custom fields and yet achieves the same goal. I put the source_key (which is something like --{$db_name}.{$table_name}.{$recid_value}--) in the post, and at insert time I just check for the existence of that key (using the LIKE operator) to see if that post was added previously. But even that fails: wp_insert_post surprisingly still creates double records, and up until now I simply cannot pin down where the problem occurs.
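As a rough illustration of the key technique, the sketch below builds the key, appends it to the content as an HTML comment, and checks for it the way the LIKE '%--key--%' lookup would. The helper names (build_source_key, tag_content, content_has_key) are made up for this example, and the check runs on a plain string rather than against wp_posts.

```php
<?php
// Build the source key, e.g. "services.media.1223".
function build_source_key(string $db, string $table, int $recid): string {
    return "{$db}.{$table}.{$recid}";
}

// Append the key as an HTML comment so it stays invisible on the rendered page.
function tag_content(string $content, string $key): string {
    return $content . "<!--{$key}-->";
}

// In-memory equivalent of: post_content LIKE '%--{$key}--%'
// The "<!--" and "-->" delimiters supply the leading and trailing "--".
function content_has_key(string $content, string $key): bool {
    return strpos($content, "--{$key}--") !== false;
}

$key     = build_source_key('services', 'media', 1223);
$content = tag_content('Some page content', $key);
$found   = content_has_key($content, $key);                // same record
$other   = content_has_key($content, 'services.media.12'); // a different record with a shared prefix
```

One difference from the real query: in SQL LIKE, the characters _ and % act as wildcards, so a key containing an underscore could match more loosely in the database than it does with strpos here.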
I have read this: "Prevent duplicate posts in wp_insert_post using custom fields". But my techniques were simply alternatives to it.
Here is a cut-down version of how I do it:
while ($row = mysql_fetch_assoc($RS)) :
    $source_recid = $row['source_recid'];
    // source_recid is something like db.table.tablerecid
    // example: services.media.1223
    $post_data['post_content'] = $row['some_field_thats_got_page_content'];
    // append the source_recid to the post content as an HTML comment, for uniqueness purposes
    $post_data['post_content'] = $post_data['post_content'] . "<!--{$source_recid}-->";
    // do the other stuff... ( shortened here for clarity )
    ....
    $post_data['post_status'] = 'publish';
    Insert_or_UpdatePost($dbh, $post_data, $source_recid, $post_id);
    if ($post_id == 0):
        // log the error and move on
        continue;
    endif;
endwhile;
function Insert_or_UpdatePost($dbh, $post_data, $source_recid, &$post_id) {
    // This function first looks up the --db.table.tablerecid-- sequence
    // across the post_content field of all wp_posts rows
    // to determine whether this record has already been INSERTed,
    // and depending on the outcome, it does an insert or an update
    // and returns the $post_id (by reference) accordingly.
    // If the function determines there is no such record,
    // then and only then does it run wp_insert_post.
    // If the function determines that there is a post like that,
    // it retrieves the post_id
    // and switches to wp_update_post instead.
    // With this approach, technically speaking,
    // it should be impossible to run wp_insert_post on an existing post,
    // and yet I still get duplicate posts...
    // here we go:
    $post_id_probed = blp_sql_getdbvals($dbh, "select id from wp_posts where post_content LIKE '%--{$source_recid}--%'");
    if (blp_isnumber($post_id_probed)):
        $post_id = $post_id_probed;
        $post_data['ID'] = $post_id;
        $post_id = wp_update_post( $post_data );
        if ($post_id == 0):
            // add error
            return FALSE;
        else:
            update_post_meta($post_id, "wpcf-blp-migration-date", blp_now('mysql'));
            return TRUE;
        endif;
    endif;
    // If we make it to this part, it means only one thing:
    // there is no post for the db.table.tablerecid yet,
    // so do the insert.
    $post_id = wp_insert_post( $post_data );
    if ($post_id == 0):
        // add error
        return FALSE;
    else:
        // add_post_meta($post_id, "wpcf-blp-migration-source", $source_recid, TRUE);
        // no need for that anymore
        return TRUE;
    endif;
}
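To see which source records ended up inserted more than once, something along these lines could help. It is only a sketch: find_duplicate_keys is a hypothetical helper, and it works on an array of post_content strings (assumed to have been fetched from wp_posts beforehand) rather than querying the database itself. It relies on the HTML comment marker format used above.

```php
<?php
/**
 * Given post_content strings, return the source keys that appear in more
 * than one post, mapped to the number of posts carrying each key.
 */
function find_duplicate_keys(array $contents): array {
    $counts = [];
    foreach ($contents as $content) {
        // Marker format from the migration code: <!--db.table.recid-->
        if (preg_match('/<!--([\w.]+)-->/', $content, $m)) {
            $counts[$m[1]] = ($counts[$m[1]] ?? 0) + 1;
        }
    }
    // Keep only keys seen more than once; array keys are preserved.
    return array_filter($counts, fn($n) => $n > 1);
}

// Sample data, made up for illustration.
$contents = [
    'Page one<!--services.media.1223-->',
    'Page one<!--services.media.1223-->',
    'Page two<!--services.media.1224-->',
];
$dups = find_duplicate_keys($contents);
```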
Say, if no match is found for tablerecid, insert the post; else add post meta.