Going for the one?

December 7th, 2007

The question is, would one database be superior to many? This is, as usual, more complicated than it at first seems.

From a practical point of view, the amount of data involved is no great shakes: one might compare it with, say, the transaction history of a large retail site. It would take this company three years or so to accumulate as much data as the retail site produces in a day. At this level, almost any type of database is viable, even those with supposed scalability issues such as MySQL.

So if size is not important, what is?

In the first instance, resilience may be a problem, especially in the case of a media organisation tied to publication deadlines. Imagine the whole thing crashing, with perhaps an estimated restoration time of 18 hours.

  • For online, such a delay may mean at least 18 hours of downtime, with associated loss of revenue from ad banners and click-throughs as well as any online sales.
  • For print, workarounds could probably mean that copy was saved locally and added to the database at a later time; however, the exact timing of the crash would be important — the closer to deadline, the more damaging.

With a large single database, every catastrophic outage would hit all teams. In this case some fallback position would be a necessity, for example real-time mirroring of the database. Yet the small amount of data means that mirroring would have little noticeable effect on performance.

An alternative would be a single large database with local repositories. In this case in the event of a catastrophic failure, teams could carry on using data stored in the repositories, with the main database being updated when it comes back online.
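As a rough illustration of the local repository idea, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `LocalRepository` class, the `stories` table, and the `main_online` flag (which in practice would be set by whatever monitors the main database). SQLite stands in for the main database purely so the sketch runs on its own.

```python
import sqlite3


class LocalRepository:
    """Hypothetical team-level store that buffers writes while the main database is down."""

    def __init__(self, main_conn):
        self.main_conn = main_conn  # connection to the (stand-in) main database
        self.main_online = True     # assumed to be toggled by some monitoring process
        self.pending = []           # copy saved locally during an outage

    def save(self, slug, body):
        """Write straight through when the main database is up, otherwise keep it locally."""
        if self.main_online:
            self._write_to_main(slug, body)
        else:
            self.pending.append((slug, body))

    def _write_to_main(self, slug, body):
        # The connection context manager commits on success and rolls back on error.
        with self.main_conn:
            self.main_conn.execute(
                "INSERT OR REPLACE INTO stories (slug, body) VALUES (?, ?)",
                (slug, body),
            )

    def resync(self):
        """Replay buffered writes, oldest first, once the main database is back."""
        self.main_online = True
        replayed = 0
        while self.pending:
            slug, body = self.pending.pop(0)
            self._write_to_main(slug, body)
            replayed += 1
        return replayed


def demo():
    main = sqlite3.connect(":memory:")
    main.execute("CREATE TABLE stories (slug TEXT PRIMARY KEY, body TEXT)")
    repo = LocalRepository(main)
    repo.save("budget", "first draft")
    repo.main_online = False              # simulate the crash
    repo.save("budget", "second draft")   # held locally
    repo.save("election", "results copy") # held locally
    replayed = repo.resync()              # main database comes back
    rows = main.execute("SELECT slug, body FROM stories ORDER BY slug").fetchall()
    return replayed, rows
```

Replaying in the order the copy was saved means the latest version of a story wins. A real system would also have to decide what happens when two teams edit the same story during the outage, which this sketch ignores.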

Another alternative is to use many smaller linked databases with front end software carrying out necessary housekeeping to ensure co-ordination. With a multiple database option it may be harder to maintain integrity: with many smaller databases comes the opportunity for users to add their own tables (possibly complicating the situation). A multiple database solution would therefore require more complex policing.
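To give a flavour of the housekeeping involved, the sketch below checks that records in one database still point at something that exists in another. The two databases, the `stories` and `pictures` tables, and the `find_orphans` function are all invented for the example; a single database could enforce this with a foreign key, but separate databases leave the check to the front end.

```python
import sqlite3


def find_orphans(stories_db, pictures_db):
    """Return (id, story_slug) for pictures whose story is missing from the stories database."""
    known = {row[0] for row in stories_db.execute("SELECT slug FROM stories")}
    orphans = []
    for pic_id, slug in pictures_db.execute(
        "SELECT id, story_slug FROM pictures ORDER BY id"
    ):
        if slug not in known:
            orphans.append((pic_id, slug))
    return orphans


def build_example():
    # Two separate in-memory databases stand in for two of the smaller linked databases.
    stories_db = sqlite3.connect(":memory:")
    stories_db.execute("CREATE TABLE stories (slug TEXT PRIMARY KEY)")
    stories_db.executemany(
        "INSERT INTO stories VALUES (?)", [("budget",), ("election",)]
    )
    pictures_db = sqlite3.connect(":memory:")
    pictures_db.execute(
        "CREATE TABLE pictures (id INTEGER PRIMARY KEY, story_slug TEXT)"
    )
    pictures_db.executemany(
        "INSERT INTO pictures VALUES (?, ?)",
        [(1, "budget"), (2, "weather"), (3, "election")],
    )
    return stories_db, pictures_db
```

Calling `find_orphans(*build_example())` flags picture 2, which refers to a "weather" story that the stories database has never heard of. Every extra database, and every table a user adds on their own, means another check of this kind.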

The main overhead in all these scenarios is the amount of data involved, which is not great. This should be an encouragement to mirror any database, no matter how big it might be, to ensure continuity of supply in the event of calamity: this might also mean mirroring the databases at a remote venue to ensure complete security.


