I have a WordPress-based site which catalogs scholarly articles for a regional science journal.
Simply said, the system has several thousand “Posts”, and each has a “pub_type” Taxonomy in which only 1 Term is selected: “Manuscript”, or something else.
Each Post also has a variety of other Taxonomies/Terms it is related to.
Objective: Get a list of terms for a particular Taxonomy. For each term, Count the number of Posts related to it, AND determine how many of those Posts have “Manuscript” set within the “pub_type” Taxonomy.
Current Query:
SELECT term_id, term_id as term_id_b, name, slug,
( SELECT COUNT(id) FROM wp_posts WHERE id IN
( SELECT object_id FROM wp_term_relationships WHERE term_taxonomy_id IN
( SELECT term_taxonomy_id FROM wp_term_taxonomy WHERE term_id = term_id_b )
) AND post_status = "publish" ) as count,
( SELECT COUNT(id) FROM wp_posts WHERE id IN
( SELECT object_id FROM wp_term_relationships WHERE term_taxonomy_id IN
( SELECT term_taxonomy_id FROM wp_term_taxonomy WHERE term_id IN
( SELECT term_id FROM wp_terms WHERE term_id = term_id_b )
)
)
AND id IN
( SELECT object_id FROM wp_term_relationships WHERE term_taxonomy_id IN
( SELECT term_taxonomy_id FROM wp_term_taxonomy WHERE term_id =
( SELECT term_id FROM wp_terms WHERE name = "Manuscript" )
AND taxonomy = "pub_type" )
)
AND post_status = "publish"
) as manuscript_count
FROM wp_terms
WHERE term_id IN
( SELECT term_id FROM wp_term_taxonomy WHERE taxonomy = "'.$taxonomy.'" )
ORDER BY name ASC
While this query does work, it runs extremely slow… as in between 3-5 minutes depending on server load. It’s so bad that, in order to maintain site performance, I’ve had to cache the query results to a JSON text file and only let the query run every 2 hours.
I know that the main problem here is that I’ve used sub-queries for everything. While I am trying to learn more about working with joins, I don’t yet know enough to write this query any other way.
Can anyone offer some insights or suggestions as to how I can tame this beast?
edit: Here is a screenshot of the EXPLAIN output from the query:
See this great post about the use of
IN
and how it can negatively affect query speed. This has your problem written all over it.Essentially, with a traditional
IN(values)
query, you are simply searching for each value. In a subqueryIN
as you have (IN(SELECT)
).In it, the author quotes from MySQL’s manual:
Using this logic, you have
wp_posts
at as an outer query, and within it nestedwp_terms_relationship
at 10 rows, and within that nested,wp_terms
again. That one is949*(10*1))
. The second time around you have only two levels of nesting, but because the final level only has one row, it is still the same impact: You are ending up parsing through 9490 rows.Subqueries are not your friend. While sometimes they may be necessary, they can almost always be solved using
JOIN
. This does not seem like it is much different based on your objectives. By rewriting this query and stretching to attempt to useJOIN...ON
, you will find that after a brief learning curve it will be much easier to read your own code, follow its logical progression, and, most likely, see it speed up too. At the very least, utilizejoin
s where you can. You will thank yourself later.