This is a generalization of a question I’ve been struggling with for a while. In my case, I have a WordPress site with multiple “post types” (e.g. Articles, Blog Posts, Products, etc.). As is common practice these days, I want to display search results from each post type in separate categories.
The problem I have is in structuring the search. Should I run a separate database query for each post type, or should I run one big query and separate everything out in PHP? I tend to lean towards the latter, but the problem I’m running into is pagination: I would probably have to omit the LIMIT clause entirely, because if one post type had a large number of matches, a limited query could return no results at all from the other post types.
So, from a performance and general best-practices standpoint, is it better to run one big query without a LIMIT clause, or several smaller queries, one per post type?
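For illustration, here’s roughly what the two options look like (using the standard wp_posts table; the post type names, search term, and LIMIT values are just placeholders):

```sql
-- Option 1: one big query, no LIMIT; results get split by post type in PHP
SELECT ID, post_type, post_title
FROM wp_posts
WHERE post_status = 'publish'
  AND post_type IN ('article', 'post', 'product')
  AND post_title LIKE '%term%';

-- Option 2: one query per post type, each with its own LIMIT
SELECT ID, post_title
FROM wp_posts
WHERE post_status = 'publish'
  AND post_type = 'article'
  AND post_title LIKE '%term%'
LIMIT 10;
-- ...and the same again for 'post', 'product', etc.
```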
Note: This is similar to a question I asked on the WordPress Stack Exchange site a while back. I accepted the multiple-query solution there, but I’m still pretty unsure about this.
In my experience, it is usually better to ask the database to do as little work as possible and have PHP do the heavy lifting; that tends to be faster.
So, I would try doing two very simple queries (one for each table) and then merging/sorting them with PHP code.
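A minimal sketch of what I mean, using WordPress’s $wpdb; the post types, the per-type limit, and sorting by date are all placeholders for whatever your site actually needs:

```php
<?php
// Sketch only: one simple query per post type, then merge/sort in PHP.
// $search_term stands in for the user's query string.
global $wpdb;

$like    = '%' . $wpdb->esc_like( $search_term ) . '%';
$types   = array( 'article', 'product' ); // placeholder post types
$results = array();

foreach ( $types as $type ) {
    $results[ $type ] = $wpdb->get_results( $wpdb->prepare(
        "SELECT ID, post_type, post_title, post_date
         FROM {$wpdb->posts}
         WHERE post_status = 'publish'
           AND post_type   = %s
           AND post_title LIKE %s",
        $type,
        $like
    ) );
}

// Do the heavy lifting in PHP: sort each bucket (newest first here)
// and cap it, so one post type can't crowd out the others.
foreach ( $results as $type => $rows ) {
    usort( $rows, function ( $a, $b ) {
        return strcmp( $b->post_date, $a->post_date );
    } );
    $results[ $type ] = array_slice( $rows, 0, 10 );
}
```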
If your data set is very big, or if your web host is crap, your PHP script may run out of memory… then, and only then, is it a good idea to start hunting around for the right way to do it in MySQL (I suspect temporary tables might be the right place to look).
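If you do go down the MySQL route, one thing worth knowing (short of full temporary tables) is that MySQL accepts a LIMIT on each parenthesized SELECT in a UNION, so you can cap every post type in a single round trip; roughly:

```sql
-- Each parenthesized SELECT keeps its own LIMIT, so one post type
-- can no longer starve the others of result slots.
(SELECT ID, post_type, post_title
   FROM wp_posts
  WHERE post_status = 'publish'
    AND post_type = 'article'
    AND post_title LIKE '%term%'
  LIMIT 10)
UNION ALL
(SELECT ID, post_type, post_title
   FROM wp_posts
  WHERE post_status = 'publish'
    AND post_type = 'product'
    AND post_title LIKE '%term%'
  LIMIT 10);
```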
But if you run into PHP’s performance limits, then I suspect anything you do in MySQL is actually going to be even slower, and you’ll have to change your database structure to get good performance. One way to do this is to keep your existing table structure but add a third table containing duplicate data from all the others, used just for searching, plus some code to keep everything in sync.
For example, we have a table that contains every PDF document uploaded by the website’s users, another table that contains every word that appears in any document, and a many-to-many linking table between the two.
Whenever a new PDF is uploaded, we extract every word in it and insert records into the linking table. This way we never actually have to search inside the PDF documents; we only search the index tables, which have been structured to allow for fast searching.
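Schematically the structure looks something like this (table and column names invented for the example; our real schema differs):

```sql
CREATE TABLE documents (
    id       INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    filename VARCHAR(255) NOT NULL
);

CREATE TABLE words (
    id   INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    word VARCHAR(100) NOT NULL,
    UNIQUE KEY uq_word (word)   -- one row per distinct word
);

-- The many-to-many linking table: which words appear in which documents.
CREATE TABLE document_words (
    document_id INT UNSIGNED NOT NULL,
    word_id     INT UNSIGNED NOT NULL,
    PRIMARY KEY (document_id, word_id),
    KEY idx_word (word_id)      -- makes word -> documents lookups fast
);

-- A search is then just indexed lookups plus a join, no matter how
-- large the PDFs themselves are:
SELECT d.filename
FROM words w
JOIN document_words dw ON dw.word_id = w.id
JOIN documents d       ON d.id = dw.document_id
WHERE w.word = 'invoice';
```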
For searches of this type I’d keep an open mind about dedicated indexing solutions like Solr. You can do a lot with such tools, and they are a very good match for users who search in a generic way.
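For example, assuming your posts were indexed into a hypothetical Solr core called posts with a post_type field, Solr’s result grouping can return a capped set of hits per post type in a single request, which maps directly onto the “results in separate categories” layout:

```php
<?php
// Hypothetical setup: a Solr core named 'posts' with a 'post_type' field.
$url = 'http://localhost:8983/solr/posts/select?' . http_build_query( array(
    'q'           => 'title:shoes',  // placeholder query
    'group'       => 'true',         // enable Solr result grouping
    'group.field' => 'post_type',    // one group of hits per post type
    'group.limit' => 5,              // cap the hits returned per group
    'wt'          => 'json',
) );

$response = json_decode( file_get_contents( $url ), true );
// $response['grouped']['post_type']['groups'] now holds one bucket
// per post type, already limited -- no unbounded query needed.
```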