How to get all posts (in chunks) via XML-RPC?

I would like to retrieve all posts of a blog via the XML-RPC API of WordPress.

The methods blogger.getRecentPosts and metaWeblog.getRecentPosts should, in theory, return all posts when given a sufficiently high value for the number of posts (or -1).

However, this does not work for very large blogs or very weak servers that cannot hold the whole blog in memory. In that case, these methods either return nothing at all or write an error into the response XML.

A solution would be to retrieve smaller chunks of, say, 50 posts at a time and put them together on the receiving side. For this to work, one would need to specify an offset for the posts to get. I was not able to find a way to specify such an offset in the documented API.

Is there any way to make this work, either by specifying an offset or by using methods other than those mentioned above?

I’m not looking for a description on how to write a plugin or modify WordPress itself in any way. I can do that, sure, but I’m talking about authorized retrieval of data of arbitrary WordPress blogs.

Edit: I’ve opened a Trac ticket at WordPress with a suggested solution: http://core.trac.wordpress.org/ticket/16316

5 comments

  1. Let me apologize for the initial question I had about your motives. I see a lot of “how can I remotely retrieve all posts from another blog” questions and immediately assume there is nefarious intent because, 9 times out of 10, there is. That said, your purposes seem very straightforward and respectable.

    Currently, there is no way to “chunk” the XML return of any of the three requests you’ve mentioned. When I got up this morning, though, I saw you’ve proposed this as a feature enhancement through Trac. This definitely won’t make it into WordPress 3.1, so you’ll likely be waiting a few months (or longer) before any submitted patches make it into core. But this is a good start.

    In the meantime, remember that the XML-RPC API is extensible. While there is no way to receive “chunks” in the existing API, you can always add your own method. This is actually the best way to get a patch into core: create your own method, make sure it works, and submit the patch back to Trac.

    My guess is that your method would be very similar to metaWeblog.getRecentPosts, but would be named a bit better … perhaps wp.getPagedPosts. You could accept all the same parameters, but add one: “pagenumber”. This way you could set the request to return 50 posts at a time and progressively walk through the collection.

    To add your method, you hook into the xmlrpc_methods filter:

    function xml_add_method( $methods ) {
        $methods['wp.getPagedPosts'] = 'wp_getPagedPosts';
        return $methods;
    }
    add_filter( 'xmlrpc_methods', 'xml_add_method');
    

    Then add your callback function:

    function wp_getPagedPosts( $args ) {
        // A plugin callback runs outside the server class, so $this is not available here.
        // Assumption: xmlrpc.php exposes the server instance as the global $wp_xmlrpc_server.
        global $wp_xmlrpc_server;
    
        // $wp_xmlrpc_server->escape( $args );    //<-- Native XML-RPC methods call this to sanitize passed arrays for the database.
    
        $blog_ID  = (int) $args[0];
        $username = $args[1];
        $password = $args[2];
        if ( isset( $args[3] ) )
            $query = array( 'numberposts' => absint( $args[3] ) );
        else
            $query = array();
    
        if ( ! $user = $wp_xmlrpc_server->login( $username, $password ) )
            return $wp_xmlrpc_server->error;
    
        do_action( 'xmlrpc_call', 'wp.getPagedPosts' );
    
        //... get a list of posts and generate your XML-RPC return ...
    
    }
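
    The “pagenumber” parameter suggested above still has to be turned into something the post query understands. A minimal sketch, assuming the page number is 1-based and that the posts are fetched with a function that accepts the 'numberposts' and 'offset' arguments (as get_posts() does). The helper name paged_query_args is hypothetical, not part of WordPress:

    ```php
    <?php
    // Hypothetical helper (not a WordPress function): translate a page size and
    // a 1-based page number into query arguments.
    function paged_query_args( $per_page, $page_number ) {
        // Guard against zero or negative input from the remote caller.
        $per_page    = max( 1, (int) $per_page );
        $page_number = max( 1, (int) $page_number );

        return array(
            'numberposts' => $per_page,
            // Skip every post that belongs to an earlier page.
            'offset'      => ( $page_number - 1 ) * $per_page,
        );
    }

    // Page 3 at 50 posts per page skips the first 100 posts.
    $query = paged_query_args( 50, 3 );
    ```

    Inside the callback, the returned array could replace the $query built from $args[3], with the page number read from an additional argument.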
    

    Remember, this is code you’d place in an external plugin file or in your theme’s functions.php file to support the additional XML-RPC request. There aren’t any existing methods to handle this, so you’re stuck with writing your own. But if you do it once, do it well, and submit it back to Trac, it could become part of core, and then you wouldn’t have to do it again.
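
    On the receiving side, walking through the collection could look roughly like the sketch below. It assumes a custom paged method such as wp.getPagedPosts exists on the server. Both fetch_all_posts and the $fetch_page callable are hypothetical names, and the callable stands in for whatever XML-RPC client you use to send the request:

    ```php
    <?php
    // $fetch_page is a hypothetical callable taking ( $per_page, $page_number )
    // and returning the array of posts for that page, e.g. by sending a
    // wp.getPagedPosts request with an XML-RPC client.
    function fetch_all_posts( $fetch_page, $per_page = 50 ) {
        $all_posts = array();
        $page      = 1;

        do {
            $chunk     = call_user_func( $fetch_page, $per_page, $page );
            $all_posts = array_merge( $all_posts, $chunk );
            $page++;
            // A page shorter than $per_page means the collection is exhausted.
        } while ( count( $chunk ) === $per_page );

        return $all_posts;
    }

    // Stand-in for a real request: pages through 120 fake post IDs.
    $fake_blog  = range( 1, 120 );
    $fake_fetch = function ( $per_page, $page ) use ( $fake_blog ) {
        return array_slice( $fake_blog, ( $page - 1 ) * $per_page, $per_page );
    };

    $posts = fetch_all_posts( $fake_fetch, 50 );
    echo count( $posts ); // 120
    ```

    Stopping on the first short page costs one extra, empty request when the total is an exact multiple of the page size, but it avoids needing a separate call to ask for the post count.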

  2. It’s easy. Just use metaWeblog.getRecentPosts or mt.getRecentPostTitles and set the limit to PHP_INT_MAX. If you set it to 0, it returns at most the number of posts you have configured to display on the homepage (usually 10). My Wp Remote Control Library does this with great ease. See the Basic Snippets.

    // Getting all posts as full or light items
    $all_posts = $wpapi->getRecentPosts(PHP_INT_MAX);
    $all_post_titles = $wpapi->getRecentPostsList(PHP_INT_MAX);
    

    Regards.

  3. A simple way of grabbing all posts is to first call the getRecentPosts method and retrieve only 1 post (the most recent one is what gets returned), then use that post’s ID to loop backwards and grab each post in turn with the getPost method. If you’re having issues with the amount of data being retrieved in one call, then this should solve your problem. You could even alter this to grab chunks by changing your $count to 5 or 10 or what-have-you.

    Here’s a severely stripped down example (assuming you are familiar with the two methods in PHP, hopefully you get the idea..):

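    <?php
    include('your_xmlrpc_functions_for_getRecentPosts_and_getPosts.php');
    $count = 1; // fetch only the most recent post
    $dataArray = getRecentPosts($appkey, $blogid, $user, $pass, $count);
    $startID = $dataArray[0]['postid']; // quote the key; a bare postid is read as a constant
    $postInfoArray = array();
    // Walk backwards through the IDs; some IDs will not belong to a post and will return a fault
    for ($i = $startID; $i > 0; $i--) {
        $postInfoArray[] = getPost($appkey, $blogid, $user, $pass, $i);
        //add in whatever other functionality for each post here (maybe a time delay or something)
    }
    ?>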
    

    Hope this helps someone 🙂