Improving WP_Query performance for multiple taxonomies

I have a site with several custom taxonomies, and have found that one of the slowest parts of the site is attempting to query with an OR across several of these at once. I’m using a WP_Query like so:

array(
  'tax_query' => array(
    'relation' => 'OR',
    array('taxonomy' => 'tax1', 'field' => 'slug', 'terms' => 'term1'),
    array('taxonomy' => 'tax2', 'field' => 'slug', 'terms' => 'term2'),
    array('taxonomy' => 'tax3', 'field' => 'slug', 'terms' => 'term3'),
    array('taxonomy' => 'tax4', 'field' => 'slug', 'terms' => 'term4'),
  )
)

The SQL it generates takes an unacceptable 6 seconds to run:

Read More
SELECT SQL_CALC_FOUND_ROWS wp_posts.* FROM wp_posts  
INNER JOIN wp_term_relationships ON (wp_posts.ID = wp_term_relationships.object_id) 
INNER JOIN wp_term_relationships AS tt1 ON (wp_posts.ID = tt1.object_id) 
INNER JOIN wp_term_relationships AS tt2 ON (wp_posts.ID = tt2.object_id) 
INNER JOIN wp_term_relationships AS tt3 ON (wp_posts.ID = tt3.object_id) 
WHERE 1=1 AND wp_posts.ID NOT IN (70) 
AND (wp_term_relationships.term_taxonomy_id IN (23) 
  OR tt1.term_taxonomy_id IN (5)
  OR tt2.term_taxonomy_id IN (11)
  OR tt3.term_taxonomy_id IN (10) ) 
AND (wp_posts.post_status = 'publish') 
GROUP BY wp_posts.ID ORDER BY wp_posts.post_date DESC LIMIT 0, 500

But this equivalent query takes a much nicer 0.29 seconds:

SELECT SQL_CALC_FOUND_ROWS  wp_posts.* FROM wp_posts
INNER JOIN wp_term_relationships ON (wp_posts.ID = wp_term_relationships.object_id)
WHERE 1=1 AND wp_posts.ID NOT IN (70)
AND (wp_term_relationships.term_taxonomy_id IN (23, 5, 11, 10)) 
AND (wp_posts.post_status = 'publish')
GROUP BY wp_posts.ID ORDER BY wp_posts.post_date DESC LIMIT 0, 500

Clearly the multiple joins are making it slower than it needs to be. The SQL doesn’t care that the terms come from different taxonomies, but WP_Query does because they’re looked up by slug. Is there any way to persuade WP_Query to generate something closer to the second one?

(Note the above has been anonymised to protect my client)

Related posts

Leave a Reply

2 comments

  1. I have a solution, but it’s a really ugly one. I’d love to hear a better one, but I’m not sure it’s possible.

    WP_Query::get_posts() calls parse_tax_query() twice: first near the start, then again just before getting the SQL from it. There’s no single hook that lets me intercept and adjust the value of $tax_query in time to adjust the SQL, so instead I had to do it in two parts.

    • An action on pre_get_posts, near the start of get_posts() recognises taxonomy queries with 'relation' => 'OR' and simplifies them so that they’ll only generate one join on wp_term_relationships. At the same time it stores the resolved list of IDs from all the taxonomies inside the ‘WP_Query’ object to use later.

    • A filter on posts_where_paged, much later in query_posts(), checks for a saved list of IDs and replaces the condition on the join.

    Here’s the code:

    add_action('pre_get_posts', 'wp_query__pre');
    function wp_query__pre ($wp_query) {
      if (!isset($wp_query->query['tax_query'])) return;
      if ($wp_query->query['tax_query']['relation'] != 'OR') return;
    
      $allterms = array();
      foreach ($wp_query->tax_query->queries as $query) {
        $tax = $query['taxonomy'];
        $terms = $query['terms'];
        $wp_query->tax_query->_transform_terms($terms, $query['taxonomy'], $query['field'], 'term_taxonomy_id');
        $allterms = array_merge($allterms, $terms);
      }
    
      $tax_query = array(array(
          'taxonomy' => $tax,
          'terms' => $terms,
          'operator' => 'IN',
          'include_children' => 0,
          'field' => 'term_taxonomy_id',
        ));
    
      $wp_query->query['tax_query'] = $tax_query;
      $wp_query->query_vars['tax_query'] = $tax_query;
      $wp_query->tax_query = new WP_Tax_Query($tax_query);
      $wp_query->saved_tax_terms = $allterms;
    }
    
    add_filter('posts_where_paged', 'wp_query__where', 10, 2);
    function wp_query__where ($where, $wp_query) {
      if (!empty($wp_query->saved_tax_terms)) {
        $terms = implode(", ", $wp_query->dft_tax_terms);
        $where = preg_replace("!term_taxonomy_id IN ([^)]*)!", "term_taxonomy_id IN ($terms)", $where);
      }
      return $where;
    }
    

    Note that the code hasn’t been tested thoroughly yet, and no doubt contains all manner of bugs. I probably won’t cope with any more complicated queries.

    The upside of this approach is that it doesn’t require the rest of your code to know anything about it. Just code a taxonomy query with OR and it’ll get picked up and optimised. It didn’t provide quite as big a speed boost as I was hoping, but it was a definite improvement.

    I’m wondering if the WordPress team shouldn’t include something like this in the core code.


    Update: As befits such an ugly hack, this is broken on WordPress 3.2. I’m looking into a fix.

  2. You can have WordPress use your more efficent query by using the posts_join and posts_where filter, something along the lines of:

    add_filter( 'posts_join', 'tax_posts_join', 10, 2 );
    add_filter( 'posts_where', 'tax_posts_where', 10, 2 );
    add_filter( 'posts_request', 'tax_posts_request' );
    
    function tax_posts_join( $sql, $wp_query ){
        if( $tax_ids = $wp_query->get('term_taxonomy_ids_in') )
            $sql .= " INNER JOIN wp_term_relationships ON ( wp_posts.ID = wp_term_relationships.object_id )";
    
        return $sql;
    }
    
    function tax_posts_where( $sql, $wp_query ){
        if( $tax_ids = $wp_query->get('term_taxonomy_ids_in') ){
            $tax_ids = implode( ', ', $tax_ids );
            $sql .= " AND ( wp_term_relationships.term_taxonomy_id IN (".$tax_ids.") ) ";
        }   
    
        return $sql;
    }
    
    function tax_posts_request( $sql ){
        //var_dump( $sql );
        return $sql;
    }
    
    
    $args = array(
        'term_taxonomy_ids_in' => array(23, 5, 11, 10)
    );
    
    $tax_posts = new WP_Query( $args );
    

    That can be cleaned up a bit, probably sanitize the taxonomy ids before you make the sql, you might need the posts_groupby filter as well, but should get you going in the right direction. http://codex.wordpress.org/Custom_Queries