If my WordPress site generates thousands (perhaps millions) of posts a day, what is the best way to keep the site from taking a performance hit from posts that only need to be seen when someone searches old posts or needs them for legal purposes?
My first thought was to run a cron job during a lull and move the out-of-date posts to an archive database. If anyone wanted to view an older post, the code would automatically look in the archives.
Is there a better way?
Also, any links to tutorials on handling massive amounts of site data would be helpful!
You are not alone in this; see the Coding Horror post on the same topic: http://www.codinghorror.com/blog/2008/04/behold-wordpress-destroyer-of-cpus.html
The easiest thing to do is to install the WP-Cache plugin:
WP-Cache is an extremely efficient WordPress page caching system to make your site much faster and responsive. It works by caching WordPress pages and storing them in a static file for serving future requests directly from the file rather than loading and compiling the whole PHP code and then building the page from the database.
http://mnm.uib.es/gallir/wp-cache-2/
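To make the idea concrete, here is a minimal sketch of file-based page caching in general. It is not WP-Cache's actual code (the plugin is written in PHP); the `PageCache` class, the `render` callback, and the one-hour expiry are assumptions made up for illustration:

```python
import hashlib
import os
import time


class PageCache:
    """Hypothetical file-backed cache: one static file per URL."""

    def __init__(self, cache_dir, ttl_seconds=3600):
        self.cache_dir = cache_dir
        self.ttl_seconds = ttl_seconds  # assumed expiry, not a plugin default
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, url):
        # Hash the URL so any path or query string maps to a safe file name.
        name = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, name + ".html")

    def get(self, url, render):
        """Return cached HTML if still fresh; otherwise render and store it."""
        path = self._path(url)
        try:
            fresh = time.time() - os.path.getmtime(path) < self.ttl_seconds
        except OSError:
            fresh = False  # no cached file yet
        if fresh:
            with open(path, encoding="utf-8") as f:
                return f.read()
        # Cache miss: do the expensive work (PHP plus database, in WordPress).
        html = render(url)
        with open(path, "w", encoding="utf-8") as f:
            f.write(html)
        return html
```

Here `render` stands for whatever builds the page from the database. It runs only on the first request for a URL or after the cached copy expires, so repeat visitors are served straight from the file.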
Perhaps use these recommended DB settings:
http://www.codinghorror.com/blog/files/matt-mullenweg-wordpress-mysql-recommendations.txt
Then perhaps make a few changes to your query cache settings in MySQL, so that repeated identical SELECT queries can be answered from memory.
To see how to do this: http://dev.mysql.com/doc/refman/5.0/en/query-cache-configuration.html