Friday, February 26, 2010

Edgar Frank

Why are relational databases so ubiquitous? We are taught about them in Computer Science lessons at university; we see job advertisements requiring SQL skills all the time; almost every project I've touched in my professional career has used a DBMS of some sort, to various degrees of efficacy.

I will admit that to some extent my exposure is biased and self-fulfilling: I took the course, someone heard I was interested in developing the skills, I started looking to reuse and improve my skills, now every project wants to leverage existing skills and a subset of those would hire people that knew no other language but SQL.

But why? Who is designing all these applications? Are they really considering the actual requirements, or are they following the approach prescribed by DBMS vendors: use a relational database. Heck, use our DBMS.

Most database systems worth their salt offer two primary features: a reliable transactional system and a way to protect data integrity; ACID and EI/RI/DI.

Even if we don't use explicit transactions, or even consider them in our designs, the ACID properties (atomic, consistent, isolated, durable) stop simultaneous updates from rendering our entities in an inconsistent state.

Entity, referential and data integrity are enforced by the DBMS, but we need to design the structure of the database to take advantage of them. We declare entity integrity by defining primary keys (no two rows are duplicates), referential integrity by defining foreign keys (no unresolved dependencies) and data integrity by defining column types such as int and datetime.

SQL Server also offers us the ability to store massive amounts of data, indexes to seek right into the bits of data that you need, and a powerful query optimiser that will join related bits of data in the most efficient way possible. It also allows the data to be restored to arbitrary points in time in the past. With failover clustering it even attempts resilience against hardware failures. But despite its prevalence, I know there necessarily exists something (in concept at least) more efficient for every given task.

My perceived problem in DBMS systems lies in their applicability to general problems - in short their ubiquity. People don't question whether or not they need a DBMS; instead they question which vendor to purchase them from. Understanding everything a database does isn't something everybody wants to do, but more and more people are available in the market to build your company's new idea into a tangible product. A recruitment agent has a much easier job selling a labelled skill than selling some undefined/unquantifiable genius. A man (or woman) can make a living by writing SQL code for another man (or woman). It is everywhere. But why re-invent the wheel?

Well, when you don't know how large your application will have to scale up to at design time, you can take the bet that it won't scale, you can save some time and money by hiring a SQL developer off the street, and you will probably deliver something that's functionally correct if you do it right. But down the line, usually when you become a victim of your own success, it will become the bottleneck, and seemingly impossible to extract from the code that's been written around it...


Jono said...

The key, the whole key, and nothing but the key, so help me Codd.

Jono said...

Actually, the competitive advantage of being able to deliver something quickly is SQL's biggest selling point, although many will laugh at the idea of using SQL as a prototype before moving onto a bigger more specialized idea.