Avalon is successfully helping a number of our clients derive business benefit from Hadoop. In the process, we see a very common problem: many of the great developers and architects we encounter just don’t know where to start in building a base level of technical knowledge in Hadoop, and they’re too busy doing their real jobs to figure out that path on their own. Sound familiar?
So we thought we’d put together this “starter kit” to frame a productive self-study path for busy developers who are eager to get started with Hadoop. Enjoy!
Books get out of date pretty quickly; however, many of our engineers have read and recommend “Hadoop: The Definitive Guide” as a good starting resource.
Introductory Online References:
This reference from Yahoo covers all the concepts well (although it does not follow the new Hadoop API).
This reference is straight from the Apache Hadoop distribution and is a good introduction to the initial concepts.
Hadoop Distributions (there are many to choose from):
Our main recommendation is to just download and try Hadoop. That is how our engineers learned most of what they know today.
We suggest starting with Hortonworks, for three reasons:
- all of their software is open source (not open core plus proprietary, like most);
- they have the largest number of contributors to the Apache Hadoop project;
- all of their software runs at Yahoo, which includes individual clusters of 4,000 nodes and a total installation in the tens of thousands.
Again, download and try Hadoop. The Hortonworks Sandbox is a great way for you to get started.
And last but not least, resources to help you take it to the next level:
A good primer on selecting servers for your Hadoop cluster.
This article goes into more detail about the different node types and what should be selected.
And finally, another useful reference on clusters.
Start with these resources, and you’ll be well on your way to MapReducing with the best of them!