A recent entry by Tony Bain in his excellent 'Innovations in Data Management' blog caught me a little by surprise. In it he talks about the NoSQL movement - the group of people and organisations who say we can do without the RDBMSs from the likes of Oracle, Microsoft and IBM. This is new to me.
- For data warehouses, a column store beats a row store by approximately a factor of 50 on typical business intelligence queries. The reason is that column stores read only the columns the query needs rather than all of them. In addition, compression is more effective in a column store. Since the legacy systems are all row stores, they are vulnerable to competition from the newer column stores.
- For online transaction processing (OLTP), a lightweight main memory DBMS beats a row store by a factor of 50. Keeping the data in main memory, together with the fact that no DBMS application will send a message to a human user in the middle of a transaction, allows an OLTP DBMS to run transactions to completion with no resource contention or locking overhead.
- For XML, where the current major vendors have spent a great deal of energy extending their engines, it is claimed that specialized engines such as MarkLogic or Tamino run circles around the major vendors.
- Nobody ever got sacked for buying Oracle or DB2.
- The IT specialists have built their career and expertise on a specific vendor's product line. What's in it for them to support a change that they see as undermining that?
- Who's going to hire someone with Voldemort or MongoDB experience??
- For most applications, the RDBMS can do the job - so what if the company spends $500,000 more on hardware to do it? Don't forget, the hardware guys are also comfortable running the big RDBMS on 'their' boxes.
- A surprising number of data warehouse developers lack the skills to really understand the differences pointed out by the NoSQL people. Besides, they're not the ones paying for the infrastructure they use.
- When something goes wrong with the software - who do you call? Will they be around next year/month?
- "Our RDBMS solution works today - or it will when we upgrade/implement the new module." So why change?
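The first claim in the list above, that a column store reads only the columns a query needs and compresses better, can be illustrated with a toy Python sketch. Everything here is hypothetical: the four-column table, its contents, and the "values read" counter, which stands in for disk I/O. It is not how any real engine works and it does not reproduce the factor-of-50 figure; it only shows where the savings come from.

```python
from itertools import groupby

NUM_ROWS = 10_000
REGIONS = ["APAC", "EMEA", "NA"]

# Hypothetical fact table with columns (order_id, region, amount, customer).
# Rows are loaded sorted by region, as a warehouse load often would be.
rows = [
    (i, REGIONS[i * len(REGIONS) // NUM_ROWS], i % 97, f"customer-{i % 500}")
    for i in range(NUM_ROWS)
]

def row_store_sum(table, col):
    """Row store: each record is stored together, so a scan reads every field."""
    values_read = 0
    total = 0
    for record in table:
        values_read += len(record)  # the whole record comes off storage
        total += record[col]
    return total, values_read

# Column store: each column is stored on its own.
columns = [list(col) for col in zip(*rows)]

def column_store_sum(cols, col):
    """Column store: a scan reads only the one column the query touches."""
    column = cols[col]
    return sum(column), len(column)

def run_length_encode(values):
    """Collapse runs of identical adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

REGION, AMOUNT = 1, 2
row_total, row_reads = row_store_sum(rows, AMOUNT)
col_total, col_reads = column_store_sum(columns, AMOUNT)
rle_region = run_length_encode(columns[REGION])

print(row_total == col_total)  # True: same answer either way
print(row_reads, col_reads)    # 40000 10000
print(len(rle_region))         # 3: the sorted region column shrinks to 3 runs
```

With only four columns the scan saving is 4x; real warehouse tables are often far wider, which widens the gap. Run-length encoding works well here because a column holds values of one type that are often sorted or repetitive, something a row-at-a-time layout cannot exploit as easily.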
"Nobody ever got sacked for buying Oracle or DB2"
Perhaps they should. Megavendors are expensive and often supply a feature set well above requirements. MySQL serves a large number of businesses very well (I have run a moderately complex web game on it for about 5 years with no trouble from the DB engine). SQL Server is vastly cheaper than Oracle and is not that far behind it in feature set. For a small to mid-size data warehouse both of those systems will do just fine; Oracle and DB2 are overkill.
But, back to the point of the article: it doesn't surprise me that an alternative to the RDBMS is starting to emerge. Whether this is it is another question, but the RDBMS is getting pretty old now, and it was well adapted to an environment where (relatively) small amounts of well-structured data existed. Now that data volumes are getting larger and less structured, a new approach is required, and I'll be interested to see what comes next. Thanks for keeping us updated on developments.
Posted by: James B | Monday, July 06, 2009 at 12:15 PM
Good points, Tony. For me, with my more modest data needs, the RDBMS does me fine. The main factor in my decision for database selection is usually the licensing cost - at least when it is my own online venture. MySQL has served me well so far - as it has millions of others.
With my other hat on - corporate manager in the big-end of town - the decision is different. In the recent past the primary data warehouses I have had built used Oracle and DB2. This was for many of the reasons I listed in my blog but ultimately I didn't care that much so long as the DB of choice supported the business activities I required of it. All the major DBs did.
My decisions were made easier by the discounting arrangements of the vendors. For their global clients the vendors will negotiate massive (80% or more) discounts when they sign global enterprise agreements. So I 'go with the flow' regarding the DB technology choice. I focus my efforts on delivering analytics that answer business questions.
As a footnote - I have found that the presentation layer is where the biggest performance issues are found in my 'analytic' world. If I need to get interactive data mining or discovery capabilities to end users, then it is the performance of the online report or dashboard that kills my infrastructure, and it is the fragility of the presentation layer's technology, not the underlying database, that is the root cause.
I suspect that my next BI platform will use one of the newer (and much cheaper) presentation layer products rather than sticking with the mega-vendor product again.
Posted by: OzAnalytics | Sunday, July 05, 2009 at 09:28 AM
There is a lot of debate going on. Personally I don’t believe in the death of the RDBMS, in fact my position is far from this.
But I do believe we went too far down the “RDBMS for everything” mentality this decade (which largely translated to “Oracle, SQL Server or MySQL for everything”). Instead I think we need a richer set of alternatives for our data management layer. Relational databases clearly are not the best fit for a number of requirements, yet we have been shoehorning them into anything & everything. The “NoSQL” distributed key/value stores have clear benefits for some needs, but RDBMSs are still the most suitable choice for most applications.
However, the RDBMS isn’t without fault. There are some significant problems that are compounding as both data volumes & processing response times grow. Now there are a number of interesting start-ups focused on keeping the essence of the RDBMS while innovating to fix some of these problems, which will help carry it through the next decade.
Posted by: Tony Bain | Sunday, July 05, 2009 at 06:55 AM