Sybase iAnywhere SQL Anywhere Mobile and Embedded Database

I'd rather play golf

Thoughts on data management, autonomic computing, and self-managing database systems.

Hadoop vs. relational databases

November 21st, 2008 · 2 Comments

Apache’s Hadoop is gaining traction among academics and commercial data warehouse vendors, and it may come to pass, relatively quickly, that aspects of Hadoop’s parallel-processing capabilities will be integrated with traditional relational database systems. Yahoo!’s Pig, a higher-level query language that runs on top of Hadoop, may help bring that integration about.
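To make the overlap between the two worlds a little more concrete, here is a minimal sketch of how a relational aggregation such as SELECT dept, SUM(salary) FROM employees GROUP BY dept breaks down into map, shuffle, and reduce steps. It is plain Python, not Hadoop's or Pig's actual API; the table, the column names, and the helper functions (map_phase, shuffle, reduce_phase, run_query) are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical rows of an "employees" table: (dept, salary).
ROWS = [("sales", 100), ("eng", 200), ("sales", 50), ("eng", 25)]


def map_phase(row):
    """Emit one (key, value) pair per row: the GROUP BY column and the column to aggregate."""
    dept, salary = row
    yield dept, salary


def shuffle(pairs):
    """Group values by key. In a real MapReduce system the framework does this step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Aggregate one group, playing the role of SUM(salary)."""
    return key, sum(values)


def run_query(rows):
    """Run the map, shuffle, and reduce steps in sequence on a single machine."""
    pairs = (pair for row in rows for pair in map_phase(row))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


result = run_query(ROWS)
```

In a real cluster the map and reduce calls would run in parallel on different machines, each over its own slice of the data. The sketch only shows the shape of the computation, which is essentially what a relational engine's parallel GROUP BY operator already does.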

In a recent blog post on the O’Reilly blog site, Joe Hellerstein of UC Berkeley presents a brief overview of the differences between Hadoop and traditional relational database processing:

So where is all this headed? In the short term, the churn in the marketplace should drive a much faster pace of innovation than traditional database vendors provided over the last decade. The technical advantages of Hadoop are not intrinsically hard to replicate in a relational database engine; the main challenge will be to manage the expectations of database users when playing tricks like trading off data integrity for availability on certain subsets of the database. Greenplum and Aster will undoubtedly push to stay one step ahead of the bigger database companies, and it would not surprise me to see product announcements on this topic from the more established database vendors within the year.

Thanks to my colleague Bruce Hay for sending this my way.

Tags: Alternative query languages

2 responses so far ↓

  • 1 Anonymous // Dec 29, 2008 at 2:42 pm

    See CloudBase-

    It is a data warehouse system built on top of Hadoop’s MapReduce architecture that allows one to query terabytes and petabytes of data using ANSI SQL. It comes with a JDBC driver, so third-party BI tools and reporting frameworks can connect directly to CloudBase.

    CloudBase creates a database system directly on flat files and converts input ANSI SQL expressions into map-reduce programs that process those files. It has an optimized algorithm for handling joins, and plans to support table indexing in the next release.

  • 2 frady mui // Jan 27, 2010 at 3:35 pm

    So how is Hadoop better than MySQL Cluster?