I’m designing a WordPress application that has ambitions of having huge volumes of users and as a result while I accept refactoring as a reasonable future activity I also need to be at least aware and behave appropriately based on performance and scalability considerations now.
My concern — to which I’m hoping some you folks will have an opinion — is regarding the performance characteristics of Custom Post Types (CPT) and in particular Custom Fields (CF) attached to CPTs. In my particular case I was considering using a lot of CPT’s for both content and transactional data. For the transactional data — which is highly structured — I’m using a lot of CF’s. I’m not starting to take the view — rightly or wrongly — that:
-
For content related entities and “reference data” ⦠blog entries, topic definitions, articles, companies information, user profile, products, etc. I think that the WP CPT/CF data model is adequate and the SQL performance should be ok even if some queries are a bit long in the tooth.
-
For transactional data â which in my case might represent 5-10 “transactions” per day per user, which in turn would translate into maybe 50-100 INSERTS into the DB (each CF is an insert) â the data growth will quickly make the query performance unattractive/inappropriate (more concerned with the SELECTs than the INSERTs here). Keep in mind I’m aiming for a user base in the 100’s of thousands although I suspect even a user base in the 1,000’s would start to feel the pain.
To help illustrate this, let me use a “cholesterol test” as an example. I’m simplifying the data requirement a bit but let’s assume you wanted to capture Total Cholesterol, HDL, LDL, and Triglycerides. Then you wanted to present back to users their history of tests. Your query would look something like:
SELECT wp_posts.id, wp_posts.post_date
, MAX(CASE WHEN meta_key = 'wpcf-which-day' THEN meta_value END) AS which_day
, MAX(CASE WHEN meta_key = 'wpcf-biochem-lipids-total-cholestertol' THEN meta_value END) AS total_cholesterol
, MAX(CASE WHEN meta_key = 'wpcf-biochem-lipids-ldl' THEN meta_value END) AS LDL
, MAX(CASE WHEN meta_key = 'wpcf-biochem-lipids-hdl' THEN meta_value END) AS HDL
, MAX(CASE WHEN meta_key = 'wpcf-biochem-lipids-triglycerides' THEN meta_value END) AS Tri
FROM wp_posts LEFT JOIN wp_postmeta ON ( wp_posts.ID = wp_postmeta.post_id )
WHERE (wp_posts.post_status = 'publish' OR wp_posts.post_status = 'private')
AND post_type = "measurements"
GROUP BY wp_posts.ID
HAVING MAX(CASE WHEN meta_key = 'wpcf-measurement-type' THEN meta_value END) = '4'
ORDER BY MAX(CASE WHEN meta_key = 'wpcf-which-day' THEN meta_value END) DESC
Pretty messy for something that is a simple requirement, right? Anyway, let me qualify the fact that I’m not a WP, mySQL, or even DB expert so I don’t want to be unfair but rather just solicit some feedback. Am I being fair? Would you draw the line differently? Any help would be greatly appreciated.
I didn’t get an answer from the community so let me post my quick answer (which I’m still happy to hear other opinions on).
In the overall battle between performance and flexibility I think CF’s have chosen an appropriate balance but one that favours flexibility which means the areas of the application that have huge volumes of data may outgrow it’s utility at some point. I still don’t have any rules of thumb for where that line should be drawn.