Handling large N data in WordPress

I realize that this also pertains to PHP and MySQL in general, but the context and operation is within WordPress.

Introduction

I am currently developing a plugin that creates and saves data about exercises: It essentially lets me create workouts (select exercises and details about them), which are stored either as meta-data for regular posts or as drafts within a custom post type, and also create data about how the workout turned out (completion/details). The data is enumerated from the form into something like this (example set of data from a single workout):

Read More
Array
(
    [compound_exercises-incline_and_standing_shrugs-sets] => 8
    [compound_exercises-incline_and_standing_shrugs-reps] => 2
    [compound_exercises-stiff-legged_deadlift-sets] => 8
    [compound_exercises-stiff-legged_deadlift-reps] => 1
)
Array
(
    [compound_exercises-incline_and_standing_shrugs-sets] => on
    [compound_exercises-incline_and_standing_shrugs-sets-sets] => 8
    [compound_exercises-incline_and_standing_shrugs-sets-reps] => 8
    [compound_exercises-incline_and_standing_shrugs-sets-weight] => 16
    [compound_exercises-incline_and_standing_shrugs-sets-completed] => 1
    [compound_exercises-incline_and_standing_shrugs-sets-result] => 1
    [compound_exercises-incline_and_standing_shrugs-sets-info] => Lorem ipsum dolor sit amet...
    [compound_exercises-incline_and_standing_shrugs-reps] => on
    [compound_exercises-incline_and_standing_shrugs-reps-sets] => 5
    [compound_exercises-incline_and_standing_shrugs-reps-reps] => 10
    [compound_exercises-incline_and_standing_shrugs-reps-weight] => 69
    [compound_exercises-incline_and_standing_shrugs-reps-completed] => 1
    [compound_exercises-incline_and_standing_shrugs-reps-result] => 4
    [compound_exercises-incline_and_standing_shrugs-reps-info] => Lorem ipsum dolor sit amet...
    [compound_exercises-stiff-legged_deadlift-sets] => on
    [compound_exercises-stiff-legged_deadlift-sets-sets] => 3
    [compound_exercises-stiff-legged_deadlift-sets-reps] => 8
    [compound_exercises-stiff-legged_deadlift-sets-weight] => 17
    [compound_exercises-stiff-legged_deadlift-sets-completed] => 1
    [compound_exercises-stiff-legged_deadlift-sets-result] => 1
    [compound_exercises-stiff-legged_deadlift-sets-info] => Lorem ipsum dolor sit amet...
    [compound_exercises-stiff-legged_deadlift-reps] => on
    [compound_exercises-stiff-legged_deadlift-reps-sets] => 3
    [compound_exercises-stiff-legged_deadlift-reps-reps] => 16
    [compound_exercises-stiff-legged_deadlift-reps-weight] => 100
    [compound_exercises-stiff-legged_deadlift-reps-completed] => 1
    [compound_exercises-stiff-legged_deadlift-reps-result] => 3
    [compound_exercises-stiff-legged_deadlift-reps-info] => Lorem ipsum dolor sit amet...
    [general-data-weight] => 60
    [general-data-chest] => 10
    [general-data-waist] => 10
    [general-data-thighs] => 10
    [general-data-calves] => 10
    [general-data-arms] => 10
    [general-data-forearms] => 10
)

The first array contains the workout, and planned amount of sets and repetitions; and the second the second details about how the workout went. The first thing to notice here is that just one workout could be several KBs in size. It would not be uncommon, for example for a few years amount of workouts, to reach several MBs in size when aggregated.

Also, I am aware of software that solves these tasks easily independently of WordPress, but I am trying to integrate this solution into WordPress because it famously can handle large amounts of data and easily integrate with blogging and other solutions.

Problems

Though caching partial-data from several MBs of MySQL results is not necessarily hard, the problem is in retrieving it and handling it incrementally and asynchronously: I need WordPress to update a set of transients by appending data without disturbing user-interaction. This would entail handling some preset amount of data to create a ready-to-display (for example as a table or a graph) set of this data.

My main concern is the load to the WordPress database and potentially to the server its hosted on, and so I want to make sure the data retrieval and handling is as optimal as possible both when creating, saving and displaying the data.

Questions

  1. What would impact the database load least: Storing data as posts with post meta in the custom post type, as ambiguous post meta, as hierarchical taxonomies with metadata (by plugin, ref: q49333), as data in a custom database table, or some alternative option?
  2. Are cronjobs, direct-action or ajax better for appending newly added data to a transient containing previous data (for example a months worth)?
  3. Assuming that the aforementioned workout and results array are necessary, which would save more space, serializing (as WordPress does by default), json encoding, or another alternative?
  4. Upon retrieving a large amount (for example a months worth) of data, what method would be less system intensive for aggregating it? Assuming it has to be systematized into comparable units, can this be done client-side to lessen the load, or would a pre-generated transient (with ready-to-display data) be better?
  5. In terms of exporting/importing data, what pre-existing system would best handle this amount of data? IE., does WordPress have a built-in CSV export/import? I am assuming that the regular Import/Export tool won’t handle a large data load.

Related posts

Leave a Reply

1 comment

    1. Custom table(s). WordPress posts are not structurally a fit for what you are doing. Trying to use them for this will result in a lot of unnecessary data processing and storage. But you need to think through your own database design and you need to be good with SQL or you will end up inefficient anyway.

    2. I am not convinced transients will help you here. It sounds like you are going to altering data a lot. That may negate the benefits.

    3. Serialize if you must store an array. JSON might be smaller but serialize has more data– a kind of simple checksum value for one. Size shouldn’t matter. The option_value column of wp_options is longtext. That is about 4GB. The limiting factor is going to be system memory, not the column.

    4. Client side processing (Javascript) is many, many times slower than server side (PHP) not to mention the data transfer time. If you have prepared data in a transient that might help, but as before it sounds like you are going to be altering data a lot.

    5. No idea what this means.