A recent entry by Tony Bain in his excellent 'Innovations in Data Management' blog caught me a little by surprise. In it he talks about the NoSQL movement - the group of people and organisations that say we can do without the RDBMSs from the likes of Oracle, Microsoft and IBM. This is new to me.
In a nutshell, the argument runs as follows: Massively scalable databases exist that are not relational and they power some of the biggest sites on the internet - Amazon, Google and Facebook to name three.
For those who don't know about NoSQL, take a look at ACM blogger Michael Stonebraker or a nice summary from Computerworld. You can't find anything about NoSQL on Wikipedia yet - that's how new it is. Stonebraker constructs a convincing argument in favour of 'the death of the RDBMS':
- For data warehouses, a column store beats a row store by approximately a factor of 50 on typical business intelligence queries. The reason is because column stores read only the columns of interest to the query and not all of them. In addition, compression is more effective in a column store. Since the legacy systems are all row stores, they are vulnerable to competition from the newer column stores.
- For online transaction processing (OLTP), a lightweight main memory DBMS beats a row store by a factor of 50. Leveraging main memory and the fact that no DBMS application will send a message to a human user in the middle of a transaction, allows an OLTP DBMS to run transactions to completion with no resource contention or locking overhead.
- In XML, where the current major vendors have spent a great deal of energy extending their engines, it is claimed that specialized engines, such as Mark Logic or Tamino, run circles around the major vendors.
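The column-store point in the first bullet can be made concrete with a toy sketch. This is only an illustration: the `rows` data, the column names and the helper functions are all made up here, in-memory Python lists stand in for on-disk storage, and it says nothing about the factor of 50 quoted above. It only shows why a query that needs one column touches far fewer values when the data is laid out by column.

```python
# Toy illustration only: in-memory lists stand in for on-disk storage.
# The table, its column names and these helpers are hypothetical.
rows = [
    {"order_id": 1, "customer": "a", "region": "EU", "amount": 10.0},
    {"order_id": 2, "customer": "b", "region": "US", "amount": 25.0},
    {"order_id": 3, "customer": "c", "region": "EU", "amount": 7.5},
]


def sum_row_store(rows, column):
    """Sum one column from a row layout; whole rows are read to get at it."""
    values_read = 0
    total = 0.0
    for row in rows:
        values_read += len(row)  # every field of the row is fetched
        total += row[column]
    return total, values_read


def to_column_store(rows):
    """Pivot the rows into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_column_store(columns, column):
    """Sum one column from a column layout; only that column is read."""
    values = columns[column]
    return sum(values), len(values)


columns = to_column_store(rows)
row_total, row_reads = sum_row_store(rows, "amount")
col_total, col_reads = sum_column_store(columns, "amount")
print(row_total, row_reads)
print(col_total, col_reads)
```

Both layouts give the same total, but the row layout touches every field of every row while the column layout touches only the `amount` values. On a wide warehouse table with many columns, that gap grows with the number of columns the query does not need.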
So what are the reasons for spending big dollars on an RDBMS? Here are the reasons I can think of for a large company:
- Nobody ever got sacked for buying Oracle or DB2.
- The IT specialists have built their career and expertise on a specific vendor's product line. What's in it for them to support a change that they see as undermining that?
- Who's going to hire someone with Voldemort or MongoDB experience?
- For most applications, the RDBMS can do the job - so what if the company spends $500,000 more on hardware to do it. Don't forget, the hardware guys are also comfortable with running the big RDBMS on 'their' boxes.
- A surprising number of data warehouse developers lack the skills to really understand the differences pointed out by the NoSQL people. Besides - they're not the ones paying for the infrastructure they use.
- When something goes wrong with the software - who do you call? Will they be around next year/month?
- "Our RDBMS solution works today - or it will when we upgrade/implement the new module." So why change?
I am happy to concede that most of these reasons are not technical. Politics is very real in larger enterprises and you don't stay long if you ignore this fact. In the past I have been lucky to hire some very good developers because of their frustration with life in a large corporate data shop.
Anyway, the NoSQL revolutionaries got together recently and you can read/view/listen to the presentations on Johan Oskarsson's site (he is a developer for Last.fm in London). The presentation on the Cassandra database by Avinash Lakshman of Facebook, for example, has some interesting stats comparing Cassandra and MySQL.
It is curious that the NoSQL people don't call their solutions databases. Instead they are a "highly available key-value store" (Amazon) and a "distributed storage system for managing structured data" (Google). At least MongoDB does describe itself as "a high-performance, open source, schema-free document-oriented database." Not exactly as snappy a label as RDBMS and I wonder why CDBMS (Columnar RDBMS) isn't good enough.
Check out NoSQL - it could be useful to you.
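That's right. Google, home of Bigtable, still uses an RDBMS such as MySQL where it needs one, and Amazon, home of Dynamo, uses Dynamo only where it is appropriate (for example, where it is acceptable for the data to be eventually consistent). I suspect the data models offered by NoSQL solutions are simply too hard to model with. Modeling in a relational DB is not easy either, of course, but at least that world has normal forms as a way of checking the health of a data model. Most decisively, the load of a typical domestic service can in most cases be handled well enough by RDBMS solutions, so the operations team's natural question, "Why do we have to use this?", often ends up being the more persuasive one.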
Posted by: Juliana | Thursday, October 04, 2012 at 09:14 PM
Controversy is often rooted in a lack of foresight regarding anything meant for public disclosure and consumption.
Specialization + laziness + ego = argument, where the argument boils down to "just because" style statements. "Just because" statements act as placeholders in the abstract where facts should arise; this is where truth declines into opinion.
Generalization + optimization = foundation is a three step principle that will enable any system with proven framework and workflow guidelines.
Small, highly specialized databases are intended for a specialized and optimized application in context and really can't be compared to a general workflow.
Something like Twitter or Facebook switching to other databases/DBMSs is obviously going to be persuaded by deep support. DB support will have to entertain images, text, linking and numerical operations for date-time-group management, etc.
The debate is in fact strongly founded on need at a given point in time, and this same issue has been evident in industrial manufacturing forever. The "bigger, better deal" and the "yours sucks worse than mine" have been around for a long time.
Specialized databases are always good until you outgrow them, and it will happen.
There are two concerns that are always overlooked that will save you an incredible amount of time and eliminate migraines. The first is how extensible your database is and how much it already covers for you; the second is what is available for migrating your data from one DBMS to another, and whether it can import a dump. These two major points will in fact save your bacon if you opt to lend them any kind of serious thought.
Decent article; it stirred my mind a bit. Didn't realize I was getting rusty.
Posted by: Donibelle | Monday, May 07, 2012 at 11:42 AM
When I started reading this book, I was an experienced developer with no idea about why NoSQL was causing so much excitement. The idea of a database without keys and tables was completely foreign to me. Needless to say, I expected this book to praise all the benefits of the NoSQL movement and never even mention any shortcomings or drawbacks that it might have, and to be told to leave behind my RDBMS systems for good. I was wrong. The authors went to great lengths to express not only the benefits of a NoSQL database and the benefits that MongoDB has over its competitors, but also the drawbacks of using this type of database. What a relief! A product that can stand on its own, with authors and developers who have a goal and are not trying to make a system that does everything and does it for you automatically (including cooking dinner), is so rare in software these days.
This book is broken into 12 chapters, but in truth it is organized into three sections: Basics, Developing and Advanced. In the Basics section they cover the topics that are needed to bring you up to speed on MongoDB and NoSQL. If you only read one part of the book, I would recommend this section. It does such a good job of explaining things that you will no longer be the guy in the conversation looking lost and confused at the next conference.
The Developing section is great. It covers basic implementation in a PHP and a Python project as well as helping you create a blogging application. No Hello World for these guys. There are quite a few examples that allow you to actually understand, line by line, what is going on in the code. They have done such a good job explaining the examples that even if you do not know PHP or Python, you will be able to understand the code.
The Advanced section starts hitting the high points of administration. I started getting a little lost here, but my background is heavily weighted towards development so that is not really surprising.
In my opinion this book is not for the beginning developer and it is not for the experienced MongoDB developer/admin, but it is a great starting point for the seasoned developer/administrator who would like to actually understand the NoSQL movement and get a basic understanding of MongoDB. The book is very short (less than 300 pages, which is short for a technical book) and provides many examples to help ease the learning curve to a manageable level. I would highly recommend this book to any developer who is either thinking about implementing a NoSQL solution, or is just tired of trying to figure out what everyone is talking about when they say NoSQL.
Posted by: Asna | Saturday, May 05, 2012 at 02:38 AM
"Nobody ever got sacked for buying Oracle or DB2"
Perhaps they should. Megavendors are expensive and often supply a feature set well above requirements. MySQL serves a large number of businesses very well (I have run a moderately complex web game using it for about 5 years with no trouble from the DB engine). SQL Server is vastly cheaper than Oracle and is not that far off it in feature set. For a small to mid-size data warehouse both those systems will do just fine; Oracle / DB2 are overkill.
But, back to the point of the article - it doesn't surprise me that an alternative to RDBMS is starting to emerge. Whether this is it is another question, but the RDBMS is getting pretty old now, and was well adapted to an environment where (relatively) small amounts of well structured data existed. Now that data volumes are getting larger and less structured, a new approach is required and I'll be interested to see what comes next. Thanks for keeping us updated on developments.
Posted by: James B | Monday, July 06, 2009 at 12:15 PM
Good points Tony. For me, and my more modest data needs, the RDBMS does me fine. The main factor in my decision for database selection is usually the licensing cost - at least when it is my own online venture. MySQL has served me well so far - as it has millions of others.
With my other hat on - corporate manager in the big-end of town - the decision is different. In the recent past the primary data warehouses I have had built used Oracle and DB2. This was for many of the reasons I listed in my blog but ultimately I didn't care that much so long as the DB of choice supported the business activities I required of it. All the major DBs did.
My decisions were made easier by the discounting arrangements of the vendors. For their global clients the vendors will negotiate massive (80% or more) discounts when they sign global enterprise agreements. So I 'go with the flow' regarding the DB technology choice. I focus my efforts on delivering analytics that answer business questions.
As a footnote - I have found that the presentation layer is where the biggest performance issues are found in my 'analytic' world. If I need to get interactive data mining or discovery capabilities to end users, then it is the performance of the online report or dashboard that kills my infrastructure - and it is the fragility of the presentation layer's technology not the underlying database that is the root cause.
I suspect that my next BI platform will use one of the newer (and much cheaper) presentation layer products rather than sticking with the mega-vendor product again.
Posted by: OzAnalytics | Sunday, July 05, 2009 at 09:28 AM
There is a lot of debate going on. Personally I don’t believe in the death of the RDBMS, in fact my position is far from this.
But I do believe we went too far down the “RDBMS for everything” mentality this decade (which largely translated to “Oracle, SQL Server or MySQL for everything”). Instead I think we need a richer set of alternatives for our data management layer. Relational databases clearly are not the best fit for a number of requirements, yet we have been shoehorning them into anything & everything. The “NoSQL” distributed key/value stores have clear benefits for some needs, but RDBMSs are still the most suitable choice for most applications.
However the RDBMS isn’t without fault. There are some significant problems that are compounding as both data volumes & processing response times grow. Now there are a number of interesting start-ups focused on keeping the essence of the RDBMS but being innovative and fixing some of these problems, which will help to take it through the next decade.
Posted by: Tony Bain | Sunday, July 05, 2009 at 06:55 AM