Archive for the 'Databases' Category

Going for the one?

Friday, December 7th, 2007

The question is, would one database be superior to many? This is, as usual, more complicated than it at first seems.

From a practical point of view, the amount of data involved is no great shakes. For example, compare it with the transaction history of a large retail site: it would take this company three years or so to accumulate as much data as the retail site produces in a day. At this level almost any type of database would cope, even those with supposed scalability issues such as MySQL.

So if size is not important, what is?

In the first instance, resilience may be a problem, especially in the case of a media organisation tied to publication deadlines. Imagine the whole thing crashing, with perhaps an estimated restoration time of 18 hours.

  • For online, such a delay may mean at least 18 hours of downtime, with associated loss of revenue from ad banners and click-throughs as well as any online sales.
  • For print, a workaround would probably be to save copy locally and add it to the database later; however, the exact timing of the crash would matter: the closer to deadline, the more damaging.

With a single large database, every catastrophic outage would hit all teams. In this case some fallback position would be a necessity, for example real-time mirroring of the database. Yet the modest amount of data means that mirroring would have little noticeable effect on performance.
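To make the mirroring idea concrete, here is a toy sketch, assuming two in-memory SQLite stores and a hypothetical `stories` table (the names `MirroredStore`, `primary_up` and `cover-story` are all illustrative, not anything the company uses). In practice real-time mirroring is normally a feature of the database server itself, such as MySQL's replication, rather than something written by hand; the sketch only shows the principle that every write lands in a second copy, so reads can fail over when the primary goes down.

```python
import sqlite3


def open_store():
    """Create an in-memory SQLite store with a hypothetical `stories` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stories (slug TEXT PRIMARY KEY, body TEXT)")
    return conn


class MirroredStore:
    """Toy mirror: writes go to every live copy, reads fail over to the mirror."""

    def __init__(self):
        self.primary = open_store()
        self.mirror = open_store()
        self.primary_up = True  # flip to False to simulate a crash

    def save(self, slug, body):
        # Write to the mirror always, and to the primary only while it is up.
        # (Bringing a recovered primary back in step is not shown here.)
        targets = [self.mirror] + ([self.primary] if self.primary_up else [])
        for conn in targets:
            conn.execute(
                "INSERT OR REPLACE INTO stories (slug, body) VALUES (?, ?)",
                (slug, body),
            )
            conn.commit()

    def load(self, slug):
        # Read from the primary when healthy, otherwise from the mirror.
        conn = self.primary if self.primary_up else self.mirror
        row = conn.execute(
            "SELECT body FROM stories WHERE slug = ?", (slug,)
        ).fetchone()
        return row[0] if row else None


store = MirroredStore()
store.save("cover-story", "Final copy")
store.primary_up = False          # the primary crashes near deadline
print(store.load("cover-story"))  # Final copy
```

The data volumes described above are small enough that writing everything twice costs very little, which is the point of the paragraph: mirroring is cheap insurance here.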

An alternative would be a single large database with local repositories. In this case, in the event of a catastrophic failure, teams could carry on using data stored in their repositories, with the main database being updated when it comes back online.
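One way to picture the local-repository option is a store-and-forward queue. The sketch below is an assumption-laden toy, not a description of any real system: `MainDatabase`, `LocalRepository` and the `online` flag are hypothetical names, and the "outage" is simulated by a boolean. Each team keeps its own copy plus a list of edits the main database has not yet accepted, and replays them in order once it is back.

```python
import sqlite3


class MainDatabase:
    """Hypothetical central content store; `online` simulates an outage."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE stories (slug TEXT PRIMARY KEY, body TEXT)")
        self.online = True

    def save(self, slug, body):
        if not self.online:
            raise ConnectionError("main database unavailable")
        # Upsert, so a replayed edit simply overwrites the older copy.
        self.conn.execute(
            "INSERT OR REPLACE INTO stories (slug, body) VALUES (?, ?)", (slug, body)
        )
        self.conn.commit()

    def count(self):
        return self.conn.execute("SELECT COUNT(*) FROM stories").fetchone()[0]


class LocalRepository:
    """Hypothetical team-level repository that keeps working during an outage."""

    def __init__(self, main):
        self.main = main
        self.copy = {}     # the team's local copy of its own content
        self.pending = []  # edits not yet accepted by the main database

    def save(self, slug, body):
        self.copy[slug] = body
        self.pending.append((slug, body))
        self.sync()

    def sync(self):
        # Replay queued edits in order; stop at the first failure so that
        # nothing is lost and ordering is kept for the next attempt.
        while self.pending:
            slug, body = self.pending[0]
            try:
                self.main.save(slug, body)
            except ConnectionError:
                return False
            self.pending.pop(0)
        return True


main = MainDatabase()
team = LocalRepository(main)
team.save("news-1", "Draft")
main.online = False                      # catastrophic failure
team.save("news-2", "Filed during outage")
print(len(team.pending), main.count())   # 1 1
main.online = True                       # main database back online
team.sync()
print(len(team.pending), main.count())   # 0 2
```

Stopping at the first failed replay is a deliberate choice: it keeps later edits from overtaking earlier ones. A real setup would also have to decide what happens when two teams edit the same item while the main database is down, which this toy ignores.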

Another alternative is to use many smaller linked databases, with front-end software carrying out the necessary housekeeping to ensure co-ordination. With a multiple-database option it may be harder to maintain integrity: many smaller databases give users the opportunity to add their own tables, possibly complicating the situation. A multiple-database solution would ideally require more complex policing.

In all these scenarios the main overhead is the amount of data involved, and that is not great. This should be an encouragement to mirror any database, no matter how big it might be, to ensure continuity of supply in the event of calamity; this might also mean mirroring the databases at a remote site to ensure complete security.

All your databases are belong to us

Thursday, December 6th, 2007

My latest project is to analyse the content database strategy of a major multimedia publishing company.

Like many media companies around today, its business model has changed dramatically to take on board different methods of broadcasting its output. And like many media companies around today, that business model has grown organically in an almost haphazard way, finding short-term fixes to meet the challenge of the moment.

This is not a question of cutting corners; much expensive work has been undertaken. But the bottom line is that media companies seldom have the luxury of stepping back from the everyday grind to properly assess where they are right now, let alone how they should progress from here.

That’s where I come in. As someone involved in content supply and manipulation for the best part of 20 years, I am a fresh pair of eyes. Nevertheless, the headaches have started to kick in around 11.30am each day, as I try to unpick the problems.

Simply put, they have added to their portfolio of databases as the years have gone by: from the weekly publication of a magazine, to the daily output of a web site and now the regular production of books, and all with the aim of running a joined-up operation, both online and off.

They now have three separate content databases, each with its own shelf life and sell-by dates, each with its own peculiar naming conventions, and each with its own needs and opportunities.

Is it possible to get all three databases talking the same language? It should be. After all, databases are simply structured collections of information, manipulated by mathematical rules and logical expressions. In this case, though, it turns out that what most of the protagonists really mean when they talk about a database is a Content Management System. It's a forgivable slip; after all, a CMS is simply the front end of a database. The complication is having THREE Content Management Systems feeding into three vectors of transmission, across the web AND print.

Right now, my first task is simply to describe this on paper: call it a springboard to a place where I can begin to formulate possibilities. What follows over the next six weeks is anyone’s guess. At least I’ve got a good supply of headache pills.