How to extend WP_Query to include custom table in query?

I have been days over this issue now. Initially it was, how to store a user’s follower data in database, for which I got couple of nice recommendations here at WordPress Answers. After, following the recommendations I have added a new table like this:

id  leader_id   follower_id
1   2           4
2   3           10
3   2           10

In the table above, the first row has a user with an ID of 2 who is being followed by a user with an ID of 4. In the second row, a user with an ID of 3 is being followed by an user with an ID of 10. The same logic applies for the third row.

Read More

Now, essentially I want to extend WP_Query so that I can limit posts fetched to that of, only by the leader(s) of a user. So, taking the above table into consideration, if I were to pass the user ID 10 to WP_Query the results should only contain posts by user ID 2 and user ID 3.

I have searched a lot trying to find an answer. Nor, I have seen any tutorial to help me understand how to extend the WP_Query class. I have seen Mike Schinkel’s answers (extending WP_Query) to similar questions, but I have really not understood how to apply it to my needs. It would be great if someone could help me out with this.

Links to Mike’s answer’s as requested:
Link 1, Link 2

Related posts

Leave a Reply

5 comments

  1. Important disclaimer: the proper way to do this is NOT to modify your
    table structure, but to use wp_usermeta. Then you will not need to
    create any custom SQL to query your posts (though you’ll still need
    some custom SQL to get a list of everyone that reports to a particular
    supervisor – in the Admin section, for instance). However, since the
    OP asked about writing custom SQL, here is the current best practice
    for injecting custom SQL into an existing WordPress Query.

    If you’re doing complex joins, you can’t just use the posts_where filter, because you’ll need to modify the join, the select, and possibly the group by or order by sections of the query as well.

    Your best bet is to use the ‘posts_clauses’ filter. This is a highly useful filter (that shouldn’t be abused!) that allows you to append / modify the various portions of the SQL that is generated automatically by the many many lines of code within WordPress core. The filter callback signature is:
    function posts_clauses_filter_cb( $clauses, $query_object ){ } and it expects you to return $clauses.

    The Clauses

    $clauses is an array that contains the following keys; each key is an SQL string that will be directly used in the final SQL statement sent to the database:

    • where
    • groupby
    • join
    • orderby
    • distinct
    • fields
    • limits

    If you’re adding a table to the database (only do this if you absolutely can’t leverage post_meta, user_meta or taxonomies) you’ll probably need to touch more than one of these clauses, for example, the fields (the “SELECT” portion of the SQL statement), the join (all your tables, other than the one in your “FROM” clause), and maybe the orderby.

    Modifying the Clauses

    The best way to do this is to subreference the relevant key from the $clauses array you got from the filter:

    $join = &$clauses['join'];
    

    Now, if you modify $join, you’ll actually be directly modifying $clauses['join'] so the changes will be in $clauses when you return it.

    Preserving the Original Clauses

    Chances are (no, seriously, listen up) you will want to preserve the existing SQL that WordPress generated for you. If not, you should probably look at the posts_request filter instead – that is the complete mySQL query just before it’s sent off to the database, so you can totally clobber it with your own. Why would you want to do this? You probably don’t.

    So, in order to preserve the existing SQL in the clauses, remember to append to the clauses, not assign to them (ie: use $join .= ' {NEW SQL STUFF}'; not $join = '{CLOBBER SQL STUFF}';. Note that because each element of the $clauses array is a string, if you want to append to it, you’ll probably want to insert a space before any other character tokens, otherwise you’ll probably create some SQL syntax error.

    You can just assume there will always be something in each of the clauses, and so remember to start each new string with a space, as in: $join .= ' my_table, or, you can always add a little line that only adds a space if you need to:

    $join = &$clauses['join'];
    if (! empty( $join ) ) $join .= ' ';
    $join .= "JOIN my_table... "; // <-- note the space at the end
    $join .= "JOIN my_other_table... ";
    
    
    return $clauses;
    

    That’s a stylistic thing more than anything else. The important bit to remember is: always leave a space BEFORE your string if you’re appending to a clause that already has some SQL in it!

    Putting it together

    The first rule of WordPress development is to try to use as much core functionality as you can. This is the best way to future proof your work. Suppose the core team decides that WordPress will now be using SQLite or Oracle or some other database language. Any hand-written mySQL may become invalid and break your plugin or theme! Better to let WP generate as much SQL as possible on its own, and just add the bits you need.

    So first order of business is leveraging WP_Query to generate as much of your base query as possible. The exact method we use to do this depends largely on where this list of posts is supposed to appear. If it’s a sub-section of the page (not your main query) you would use get_posts(); if it’s the main query, I suppose you could use query_posts() and be done with it, but the proper way to do it is to intercept the main query before it hits the database (and consumes server cycles) so use the request filter.

    Okay, so you’ve generated your query and the SQL is about to be created. Well, in fact, it has been created, just not sent to the database. By using the posts_clauses filter, you’re going to add your employee relationships table into the mix. Let’s call this table {$wpdb->prefix} . ‘user_relationship’, and it’s an intersection table. (By the way, I recommend that you genericize this table structure and turn it into a proper intersection table with the following fields: ‘relationship_id’, ‘user_id’, ‘related_user_id’, ‘relationship_type’; this is much more flexible and powerful… but I digress).

    If I understand what you want to do, you want to pass a Leader’s ID and then see only the posts by that Leader’s Followers. I hope I got that right. If it’s not right, you’ll have to take what I say and adapt it to your needs. I’ll stick with your table structure: we have a leader_id and a follower_id. So the JOIN will be on {$wpdb->posts}.post_author as a foreign key to the ‘follower_id’ on your ‘user_relationship’ table.

    add_filter( 'posts_clauses', 'filter_by_leader_id', 10, 2 ); // we need the 2 because we want to get all the arguments
    
    function filter_by_leader_id( $clauses, $query_object ){
      // I don't know how you intend to pass the leader_id, so let's just assume it's a global
      global $leader_id;
    
      // In this example I only want to affect a query on the home page.
      // This is where the $query_object is used, to help us avoid affecting
      // ALL queries (since ALL queries pass through this filter)
      if ( $query_object->is_home() ){
        // Now, let's add your table into the SQL
        $join = &$clauses['join'];
        if (! empty( $join ) ) $join .= ' '; // add a space only if we have to (for bonus marks!)
        $join .= "JOIN {$wpdb->prefix}employee_relationship EMP_R ON EMP_R.follower_id = {$wpdb->posts}.author_id";
    
        // And make sure we add it to our selection criteria
        $where = &$clauses['where'];
        // Regardless, you always start with AND, because there's always a '1=1' statement as the first statement of the WHERE clause that's added in by WP/
        // Just don't forget the leading space!
        $where .= " AND EMP_R.leader_id={$leader_id}"; // assuming $leader_id is always (int)
    
        // And I assume you'll want the posts "grouped" by user id, so let's modify the groupby clause
        $groupby = &$clauses['groupby'];
        // We need to prepend, so...
        if (! empty( $groupby ) ) $groupby = ' ' . $groupby; // For the show-offs
        $groupby = "{$wpdb->posts}.post_author" . $groupby;
      }
    
      // Regardless, we need to return our clauses...
      return $clauses;
    }
    
  2. I am answering this question extremely late and my apologies for the same. I had been way too busy with deadlines to attend to this.

    A big thanks to @m0r7if3r and @kaiser in providing the base solutions that I could extend and implement in my application. This answer provides details on my adaptation of the solutions offered by @m0r7if3r and @kaiser.

    First, let me explain why this question was asked in the first place. From the question and the comments thereof one could gather that I am trying to get WP_Query to pull posts by all users (leaders) that a given user (follower) follows. The relationship between the follower and the leader is stored in a custom table follow. The most common solution to this problem is to pull the user ID’s of the all leaders of a follower from the follow table and place it in an array. See below:

    global $wpdb;
    $results = $wpdb->get_results($wpdb->prepare('SELECT leader_id FROM cs_follow WHERE follower_id = %s', $user_id));
    
    foreach($results as $result)
        $leaders[] = $result->leader_id;
    

    Once you have the array of leaders you could pass it as an argument to WP_Query. See below:

    if (isset($leaders)) $authors = implode(',', $leaders); // Necessary as authors argument of WP_Query only accepts string containing post author ID's seperated by commas
    
    $args = array(
        'post_type'         => 'post',
        'posts_per_page'    => 10,
        'author'            => $authors
    );
    
    $wp_query = new WP_Query( $args );
    
    // Normal WordPress loop continues
    

    The above solution is the simplest way of achieving my desired results. However, it is non-scalable. The moment you have a follower following tens and thousands of leaders the resultant array of leader ID’s would get extremely large and force your WordPress site to use 100MB – 250MB of memory on each page load and eventually crash the site. The solution to the problem is to run SQL query directly on the database and fetch relevant posts. That’s when @m0r7if3r’s solution came to rescue. Following @kaiser’s recommendation I set out to test both implementations. I imported around 47K users from a CSV file to register them on a fresh test install of WordPress. The install was running Twenty Eleven theme. Following this I ran a for loop to make about 50 users follow every other user. The difference in query time for both @kaiser and @m0r7if3r’s solution was staggering. @kaiser’s solution normally took about 2 to 5 secs for each query. The variation I presume happens as WordPress caches queries for later use. On the other hand @m0r7if3r’s solution demonstrated query time of 0.02 ms on average. For testing both solutions I had indexing ON for the leader_id column. Without indexing there was a dramatic increase in the query time.

    The memory usage when using array based solution stood around 100-150 MB and dropped to 20 MB when running a direct SQL.

    I hit a bump with @m0r7if3r’s solution when I needed to pass the follower ID to the posts_where filter function. Atleast, as per my knowledge WordPress allows no means of passing a variable to filer functions. You can use Global variables though, but I wanted to avoid globals. I ended up extending WP_Query to finally address the issue. So here is the final solution I implemented (based on @m0r7if3r ‘s solution).

    class WP_Query_Posts_by_Leader extends WP_Query {
        var $follower_id;
    
        function __construct($args=array()) {
            if(!empty($args['follower_id'])) {
                $this->follower_id = $args['follower_id'];
                add_filter('posts_where', array($this, 'posts_where'));
            }
    
            parent::query($args);
        }
    
        function posts_where($where) {
            global $wpdb;
            $table_name = $wpdb->prefix . 'follow';
            $where .= $wpdb->prepare(" AND post_author IN (SELECT leader_id FROM " . $table_name . " WHERE follower_id = %d )", $this->follower_id);
            return $where;
        }
    }
    
    
    $args = array(
        'post_type'         => 'post',
        'posts_per_page'    => 10,
        'follower_id'       => $follower_id
    );
    
    $wp_query = new WP_Query_Posts_by_Leader( $args );
    

    Note: I eventually tried the above solution with 1.2 million entries in the follow table. The average query time stood around 0.060 ms.

  3. You can do this with an entirely SQL solution using the posts_where filter. Here’s an example of that:

    if( some condition ) 
        add_filter( 'posts_where', 'wpse50305_leader_where' );
        // lol, question id is the same forward and backward
    
    function wpse50305_leader_where( $where ) {
        $where .= $GLOBALS['wpdb']->prepare( ' AND post_author '.
            'IN ( '.
                'SELECT leader_id '.
                'FROM custom_table_name '.
                'WHERE follower_id = %s'.
            ' ) ', $follower_id );
        return $where;
    }
    

    I think there may be a way to do this with JOIN as well, but I cannot come up with it. I will keep playing with it and update the answer if I get it.

    Alternately, as @kaiser suggested, you can split it to two parts: getting the leaders and doing the query. I have a feeling this might be less efficient, but it’s certainly the more understandable way to go. You would have to test the efficiency for yourself to determine which method is better, as nested SQL queries can get quite slow.

    FROM THE COMMENTS:

    You should put the function in your functions.php and do the add_filter() right before the query() method of WP_Query is called. Immediately following that, you should remove_filter() so that it does not affect the other queries.

  4. Template Tag

    Just place both functions in your functions.php file. Then adjust the 1st function and add your custom table name. Then you need some try/error to get rid of the current user ID inside the resulting array (see comment).

    /**
     * Get "Leaders" of the current user
     * @param int $user_id The current users ID
     * @return array $query The leaders
     */
    function wpse50305_get_leaders( $user_id )
    {
        global $wpdb;
    
        return $wpdb->query( $wpdb->prepare(
            "
                SELECT `leader_id`, `follower_id`
                FROM %s
                    WHERE `follower_id` = %s
                ORDERBY `leader_id` ASC
            ",
            // Edit the table name
            "{$wpdb->prefix}custom_table_name"
            $user_id
        ) );
    }
    
    /**
     * Get posts array that contain posts by 
     * "Leaders" the current user is following
     * @return array $posts Posts that are by the current "Leader
     */
    function wpse50305_list_posts_by_leader()
    {
        get_currentuserinfo();
        global $current_user;
    
        $user_id = $current_user->ID;
    
        $leaders = wpse5035_get_leaders( $user_id );
        // could be that you need to loop over the $leaders
        // and get rid of the follower ids
    
        return get_posts( array(
            'author' => implode( ",", $leaders )
        ) );
    }
    

    Inside the template

    Here you can do whatever you want with your results.

    foreach ( wpse50305_list_posts_by_leader() as $post )
    {
        // do something with $post
    }
    

    NOTE We don´t have testdata, etc. so the above is a little bit of a guessing game. Make sure that you edit this answer with what worked for you, so we have a satisfying result for later readers. I´ll approve the edit in case you got too low rep. You then can also delete this note. Thanks.

  5. Note: This answer here is to avoid extended discussion in the comments

    1. Here´s the OPs Code from the comments, to add the first set of test users. I has to be modified to a real world example.

      for ( $j = 2; $j <= 52; $j++ ) 
      {
          for ( $i = ($j + 1); $i <= 47000; $i++ )
          {
              $rows_affected = $wpdb->insert( $table_name, array( 'leader_id' => $i, 'follower_id' => $j ) );
          }
      }
      

      OP About Test For this I added about 47K users from a csv file. Later, ran a for loop to make the first 45 users follow every other user.

      • This resulted in 3,704,951 records saved to my custom table.
      • Initially, @m0r7if3r ‘s solution gave me query time of 95 secs, which went down to 0.020 ms after turning ON indexing on leader_id column. The total PHP memory consumed was around 20MB.
      • On the other hand, your solution took about 2 to 5 secs for query with indexing ON. The total PHP memory consumed was around 117MB.
    2. My answer to this ↑ test:

      a more “real life” test: Let every user follow a $leader_amount = rand( 0, 5 ); and then add the number of $leader_amount x $random_ids = rand( 0, 47000 ); to each user. So far the thing we know is: My solution would be extremly bad if a user is following each other user. Further: You´ll have show how you did the test and where exactly you added the timers.

      I also have to state that the ↑ above time tracking can´t be really measured, as it would also take the time to compute the loop together. Better would be to loop through the resulting set of IDs in a second loop.

    further process here