Database Design suggestion for a Blog farm

I’m building something similar to a Blogfarm and I came across a small hurdle. I used the WordPress Multi-User database as a reference and noticed that a unique table is created for each blog that is being created.

So if the farm is going to have, say, ‘x’ million users (just a weird thought), then the database would ideally have ‘x’ million tables, assuming a user for each blog.

Read More
  1. Is the one used by WordPress MU, a good db design? If yes, how much impact would it have on the database performance with so many ‘x’ million tables?

  2. Since I’m just starting to code, I have the liberty to choose any database I like. Currently I’m using PostgreSQL combined with Ruby on Rails. Do you think a NoSQL database (like MongodB) would be of any use in this situation? If not, why/why not? I haven’t seen any blog platform run on a NoSQL db yet.

  3. How do the big guys like Blogger, Tumblr or Squarespace do it?

Any help is much appreciated, Thank You.

References:

http://www.aeonscope.net/2006/12/29/wordpress-migration/

Related posts

Leave a Reply

1 comment

    1. It’s quite likely good enough. My guess is that it has mostly caching issues. If you have thousands of blogs on the same server and each is equally likely to be hit, then caching is going to be a nightmare, and probably most queries will need to hit the hard drive (cold cache hits). However, if most queries hit the same blogs, and hence the same tables, caching will be good enough.

    2. My honest advice is the following. Do the simplest thing and forget about scalability problems for now. 99.999% of web sites do not do enough traffic to warrant any specific problems, and the 0.001% of sites that do, will have the resources to actually rewrite any code base so it scales. In this context, use the following rule of thumb:

      • RBDMS are generally good enough for most uses. Furthermore they are extremely stable and well supported. RBDMS have been around for 40 years. 🙂
      • NoSQL databases have a few, very specific uses where they shine (e.g. document storage, billion row tables, huge scalability constraints), on the other hand they have gotchas (see BASE vs ACID) and they do represent an increased risk as they are newer, less understood concepts.
    3. I guess they do it through some form of sharding. In other words, yes you would have millions of tables split across thousand of servers.

    The main point here is that this architectural choice is a trade off. If you have sites with large traffic, you will need more db servers and this allows you to scale horizontally. On the other hand, if you have a lot of site with little traffic, you probably don’t need to scale that much.