Monday, July 05, 2010

A Guided Tour of the Hadoop Zoo: Welcome

It's a fine sunny Sunday, a perfect day to do something outdoors, like visit the zoo.  However, if you are lazy like I am, how about visiting the Hadoop Zoo instead?  From elephants to elephant bees to elephant birds, the Hadoop Zoo has enough 'animals' to rival a real zoo.

Let's start with the basic question.  Why would you want to visit the Hadoop Zoo?  The most common answer is that you have a lot of data (by a lot, I mean gigabytes, terabytes, or petabytes) that you want to store and do something useful with.  The various animals, and actually lots of non-animals, help you do exactly this.

Our first stop is the cute elephant, Hadoop, the central attraction of the zoo.  The Hadoop Distributed File System (HDFS) and the Hadoop MapReduce framework are the core components of Hadoop.  HDFS stores gigantic amounts of data in a distributed, scalable and reliable fashion.  The Hadoop MapReduce framework helps you write programs, in Java and some other languages, that efficiently process your data into valuable information.
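If you have never seen the MapReduce model before, a toy word count may help.  The sketch below is plain Python running in a single process, not the Hadoop API, and the function names (map_phase, shuffle, reduce_phase, word_count) are my own, made up for illustration.  It only mimics the three steps the framework carries out for you across many machines: map each record to key/value pairs, group the pairs by key, and reduce each group to a result.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, mimicking the step between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Combine all the counts collected for one word into a single total."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the elephant", "the zoo"]))
```

In a real Hadoop job you would write only the map and reduce logic; splitting the input, grouping by key, and moving data between machines is the framework's job.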

That's it - HDFS and the MapReduce framework are all you need to store and process a huge amount of data.  And that's all you had a few years ago.  But now there are more animals in the zoo that make your visit more fun and your life easier.

This series of blog posts is an attempt to record and expand my understanding of the various components of the Hadoop ecosystem.  I am by no means a Hadoop expert.  So, if you find something wrong or missing, please do send me an email or add a comment.
