Apache Hadoop is commonly used as the core of massive data pipelines. Thanks to its popularity and strong community of contributors, the ecosystem of related software has grown to include as many as 140 projects. While having such a wide range of tools can be convenient, the sheer volume of options can also be overwhelming.
 
To help attendees navigate the size of the Apache Hadoop software ecosystem, this session will walk through examples of many of the tools that Rich uses when solving common data pipeline needs. Rich will discuss the use cases that typify each tool and mention alternative tools that could accomplish the same task. Examples will include Java MapReduce, Hive, Pig, Spark, HBase, Sqoop, and Flume.
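
To give a flavor of the first tool on that list, below is a minimal sketch of the canonical word-count job in Java MapReduce, following the pattern of the standard Hadoop tutorial. The class and job names are illustrative, not taken from the session materials:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emit (word, 1) for every token in each input line.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sum the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each mapper
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The same aggregation can be expressed as a single GROUP BY query in Hive or a few lines of Spark, which is exactly the kind of trade-off in verbosity versus control that the session compares across tools.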