How to make dynamically-generated content searchable in WordPress?

I’m writing a plugin for WordPress that dynamically generates content from a Google Documents Spreadsheet. Specifically, the plugin provides a shortcode that generates a good-looking staff list. It works, but there are two problems:

  1. Pages that use the shortcode load a little slowly, because they have to make multiple requests to another server.
  2. Since the content is generated when the page is loaded, the contents of the staff list do not show up in search results on the site.

I can fix the first problem with some level of caching, but what can I do about the second problem? I mention the first problem because I think the ideal solution is one in which WordPress indexes the cached copy of the page.

Another solution I can think of would be to have a plugin compose the content of the page periodically. That way the page could be searched and wouldn’t be dynamically generated every time. However, it seems like a strange paradigm for a plugin to completely control the content of a page. Are there other plugins that do this? Also, this approach exposes the user to complexity they shouldn’t have to deal with. (They’d have to edit the contents of the page from a plugin page instead of the normal place.)

Here’s a sample of what part of the markup for the page looks like:

<h2>General Management</h2>

[staff-directory department="General Management"]

The spreadsheet to hit is configured separately. The “department” specifies the worksheet. (I’d show you what the results look like, but I don’t have enough rep to post an image.)

Your suggestions are greatly appreciated.

2 comments

  1. The wp_posts table has a post_content_filtered column that plugins can use to cache expensive post content filters. The idea is that when you display the page, you don’t read post_content but you read post_content_filtered. This is nice, but it won’t solve your search problem because WordPress by default only looks at post_content and post_title.

    You can, however, do it the other way around: store the editor content in post_content_filtered and the rendered page in post_content (update it periodically with a cron job). There are filters that are called before the post is edited; you can use them to pass post_content_filtered instead of post_content to the editor. The user will see no difference, but the performance and the search experience will be improved.
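A minimal sketch of that column swap, under several assumptions: the `sd_*` names are hypothetical, the pages use the classic editor (the `content_edit_pre` filter is not expected to cover the block editor's REST requests), and rendering through `do_shortcode` at save time is acceptable. The hooks are only registered when WordPress is loaded, so the helper can be tried on its own.

```php
<?php
// Hypothetical helper names (sd_*); only the WordPress functions are real.

/** Pair the raw editor text with its rendered form, one per column. */
function sd_split_columns(string $source, callable $render): array
{
    return [
        'post_content_filtered' => $source,          // what the editor shows
        'post_content'          => $render($source), // what search reads
    ];
}

// Register hooks only when running inside WordPress.
if (function_exists('add_filter')) {
    // On save: keep the shortcode source aside, store the rendered HTML.
    add_filter('wp_insert_post_data', function (array $data): array {
        $source = wp_unslash($data['post_content']); // data arrives slashed
        // has_shortcode() only matches shortcodes that are registered.
        if ($data['post_type'] !== 'page' || !has_shortcode($source, 'staff-directory')) {
            return $data;
        }
        $columns = sd_split_columns($source, 'do_shortcode');
        return array_merge($data, wp_slash($columns));
    });

    // On edit (classic editor): hand back the source, not the HTML.
    add_filter('content_edit_pre', function ($content, $post_id) {
        $post = get_post($post_id);
        return ($post && $post->post_content_filtered !== '')
            ? $post->post_content_filtered
            : $content;
    }, 10, 2);
}

/** Cron callback sketch: re-save the source so the save filter re-renders it. */
function sd_refresh_page(int $post_id): void
{
    $post = get_post($post_id);
    if ($post && $post->post_content_filtered !== '') {
        wp_update_post([
            'ID'           => $post_id,
            'post_content' => wp_slash($post->post_content_filtered),
        ]);
    }
}
```

Scheduling `sd_refresh_page` (for example with `wp_schedule_event`) is left out here. Re-saving may also create revisions whenever the rendered output changes, which is worth checking before relying on this.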

  2. I see two ways:

    If speed is important, then caching would solve both of your problems. If you can somehow attach a last-modification timestamp to the cache and to the data retrieval URL (GET request), then with a simple hash algorithm you can be sure that the cached version is up to date.

    If speed is not so relevant, you can use the API to request the spreadsheet. For example, you may not need PHP; instead you can use JavaScript and JSON for a very quick search, like the data retrieval in this example.

    In your place, I’d stick with caching the data, keeping in mind that Google Docs may not be available at the very moment of the search query.
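The caching idea above can be sketched as follows. This is an illustration under assumptions, not the plugin's actual code: the `sd_*` names are hypothetical, the caller is assumed to obtain the spreadsheet's last-modified value somehow (passing `null` when Google cannot be reached), and `$fetch` stands in for whatever function downloads the rows. The cached copy is reused while the hash of URL plus timestamp is unchanged, and the stale copy is served if the remote side fails.

```php
<?php
// Hypothetical helper names (sd_*); get_option/update_option are real WordPress calls.

/** Hash the URL together with the last-modified stamp. */
function sd_fingerprint(string $url, string $last_modified): string
{
    return md5($url . '|' . $last_modified);
}

/**
 * Decide which cache entry to use.
 * $cached is ['fingerprint' => string, 'rows' => array] or null.
 * $last_modified is null when the remote service could not be reached.
 * $fetch takes the URL and returns an array of rows, or null on failure.
 */
function sd_resolve(?array $cached, string $url, ?string $last_modified, callable $fetch): ?array
{
    if ($last_modified === null) {
        return $cached; // remote unavailable: serve the stale copy
    }
    $fingerprint = sd_fingerprint($url, $last_modified);
    if ($cached !== null && $cached['fingerprint'] === $fingerprint) {
        return $cached; // unchanged since last fetch: no download needed
    }
    $rows = $fetch($url);
    if ($rows === null) {
        return $cached; // download failed: keep what we have
    }
    return ['fingerprint' => $fingerprint, 'rows' => $rows];
}

// WordPress wrapper, defined only when WordPress is loaded.
if (function_exists('get_option')) {
    function sd_cached_rows(string $url, ?string $last_modified, callable $fetch): array
    {
        $key    = 'sd_cache_' . md5($url);
        $cached = get_option($key, null);
        $cached = is_array($cached) ? $cached : null;
        $entry  = sd_resolve($cached, $url, $last_modified, $fetch);
        if ($entry !== null && $entry !== $cached) {
            update_option($key, $entry, false); // false: do not autoload
        }
        return $entry['rows'] ?? [];
    }
}
```

An option is used here rather than a transient because a transient can expire on its own, which would defeat the fallback when Google Docs is down; that is a design choice, not a requirement.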