Getting Started with Giraph

Apache Hadoop’s core analytical tools (e.g., MapReduce, Hive, Pig) are great for performing batch analytics over large, unstructured data sets. However, many data sets have a more graph-like structure. Examples include a map with cities connected by roads, a social network with people connected by relationships, airports connected […]

Hadoop Ecosystem Cheat Sheet

HDP 2.2 Components

For someone evaluating Hadoop, the long list of components in the Hadoop ecosystem can be overwhelming. Below you’ll find a reference table of keywords you may have heard in discussions about Hadoop, each with a brief description. Image courtesy of Hortonworks.

Name | Description
HDFS | Hadoop’s underlying distributed file system
YARN | Provides resource management […]

Harnessing Hadoop to Visualize Big Data Log Files

One of the demonstration projects on the radar for the Big Data Practice here at Avalon Consulting, LLC, was processing a large set of log data and visualizing it in an interesting way. A few days before the pope was going to be selected, we got the idea that the traffic patterns at Patheos […]