I’m in the process of setting up a new WordPress 3.0 multisite instance and would like to use Sphinx on the database server to power search for the primary website. Ideally, this primary site would offer the ability to search against its content (posts, pages, comments, member profiles, activity updates, etc.) as well as all of the other sites that are a part of the network. Because we’ll be adding new sites to the network on a regular basis, I’d like to be able to dynamically add those newly generated tables to the Sphinx .conf file (instead of editing the file and reindexing every time we add a new site).
Unfortunately, MySQL doesn’t seem to support wildcards when specifying the table(s) in a query string. The best solution I’ve come across for grabbing a dynamic set of tables is grepping but I’m pretty certain I don’t know how to do this within the .conf file (unless it’s possible through magical sorcery).
Is it possible to dynamically specify tables to add to the Sphinx index? Or is this going to cause such performance issues that I’m using the wrong tool?
You could try to dynamically modify the .conf file instead.
You could query from a MySQL view that aggregates the many tables. You’d have to recreate the view with each change to the list of blogs, but I believe that all the hooks exist to support that and it should be easy enough to construct the view query.
The bigger problem may be in trying to find a suitable unique record ID for the posts in Sphinx. It has to be a straight INT, but the post IDs from the different blogs will collide with each other.
I think you can create triggers (INSERT/UPDATE/DELETE) in MySQL on the interested tables (e.g. posts, comments etc) and migrate the data to centralized global tables that are indexed by Sphinx in real time.
The point is how you can create those triggers automatically? Either you can run a cron job to scan for new tables in MySQL, or I believe you can write a simple WordPress plugin that hook when a blog is activated.