StampedeCon 2012 Presentation Archive

StampedeCon 2012 featured presentations that:

  • Provided specific case studies on collecting, storing and analyzing Big Data assets, along with various industries’ views on what they consider “Big Data.”
  • Discussed some of the challenges companies face when handling Big Data.
  • Focused on the “how” of Big Data collection and analysis with operational and development-focused talks.

StampedeCon 2012 was a well-rounded event designed to inform and equip organizations thinking about or already working with Big Data. See the Agenda for the conference schedule and the Speakers page for details on our speakers.

Presentations at StampedeCon 2012

How Big Data Can Help Your Business: Case Studies from ReadWriteWeb
David Strom
Pulling from ReadWriteWeb’s coverage of Big Data technologies in the enterprise, we’ll see examples of how FedEx, the Associated Press and others are using Big Data to drive their decisions.


HBase Backups
Pritam Damania (Facebook)
Reliable backup and recovery is one of the main requirements for any enterprise-grade application. HBase has been widely embraced by enterprises needing random, real-time read/write access to huge volumes of data and ease of scalability. As such, they are looking for backup solutions that are reliable, easy to use, and can co-exist with existing infrastructure. HBase comes with several backup options, but there is a clear need to improve the native export mechanisms. This talk will cover the various options that are available out of the box, their drawbacks, and what various companies are doing to make backup and recovery efficient. In particular, it will cover what Facebook has done to improve the performance of the backup and recovery process with minimal impact to the production cluster.

Making Your Analytics Investment Pay Off
Bill Eldredge (Nokia)
At Nokia, we expect to save millions in avoided license fees this year on a single “Big Data” project by creating a symbiotic relationship between our traditional RDBMS storage and our newer Hadoop cluster. Our hybrid approach enables us to manage the convergence of structured and unstructured data while saving money. In our case, we use Hadoop to process and import data into traditional systems. We have found that this use of Hadoop as a preprocessing engine has enabled maximum value to be derived from our systems, our data and our people.


Big Data with Semantics
Alex Miller (Revelytix)
Many big data use cases involve moving many data sources into Hadoop where the data can be merged, summarized, and transformed. However, due to the volume and variety of data being poured into Hadoop, we need better tools for describing and connecting the data outside Hadoop, the data inside Hadoop, and the transformations between a variety of domains.

Semantic web standards like RDF (Resource Description Framework) and the SPARQL query language provide flexible tools for describing and querying virtually any kind of data or metadata. Traditionally these tools are used with RDF “triple stores”, however we can also apply these technologies to describing the data inside and outside Hadoop. These technologies can be used to load data into Hadoop, transform it while it’s there, query it, and export it, all in terms defined by the business and the data owners.

This talk will demonstrate how RDF can be used to describe a variety of data and metadata, how data stored in Hadoop can be transformed or virtualized as an RDF graph, and how queries and transformations can be defined by SPARQL and R2RML (the RDB to RDF Mapping Language).
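
The triple model the abstract describes can be illustrated in miniature. Below is a minimal Python sketch (not from the talk; the `ex:` names and data are hypothetical, and a plain set of tuples stands in for a real triple store) showing how RDF triples plus a wildcard pattern matcher express the kind of query SPARQL writes declaratively:

```python
# A tiny "triple store": a set of (subject, predicate, object) tuples.
# Hypothetical data with a made-up "ex:" prefix.
triples = {
    ("ex:order1", "rdf:type",  "ex:Order"),
    ("ex:order1", "ex:amount", "42.0"),
    ("ex:order1", "ex:buyer",  "ex:alice"),
    ("ex:alice",  "rdf:type",  "ex:Customer"),
}

def match(s=None, p=None, o=None):
    """Return all triples matching the pattern; None is a wildcard,
    playing the role of a SPARQL variable."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Roughly the same question as SPARQL's:
#   SELECT ?s WHERE { ?s rdf:type ex:Order }
orders = [s for s, _, _ in match(p="rdf:type", o="ex:Order")]
print(orders)  # ['ex:order1']
```

A real engine adds indexing, joins across multiple patterns, and standard serializations, but the data model is exactly this uniform triple shape, which is what makes it flexible enough to describe both data and metadata.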


Listening for Insights: The Power of Social Media Listening
Frank Cotignola (Kraft Foods)
Social media “listening research” has emerged as a powerful alternative to more traditional “asking research.” Through a number of examples, you’ll find out how to research important brand topics and gain in-depth insights into new product development, segment analysis and broader topics that you might not previously have had the funds to research. Using a mixture of “paid” and “unpaid” tools, you’ll learn how to apply this unique method to your important research questions.


Big Data and the Analysis Conundrum – Challenges and Opportunities (slides not available)
Rob Peglar (EMC/Isilon)
This talk will cover several current topics in big data and specific analytic use cases in financial services and healthcare. The use of Hadoop and associated toolsets, along with optimal HDFS architecture for analysis problems at scale, will be discussed and best practices outlined.


MapReduce Best Practices and Lessons Learned Applied to Enterprise Datasets
Erich Hochmuth (Monsanto)
Hadoop is quickly becoming the preferred platform for performing analysis over large datasets. We will explore opportunities for using MapReduce to process genomic data in an enterprise system.

We will discuss how MapReduce is being used to scale existing data processing workflows, along with lessons learned migrating existing algorithms and workflows to MapReduce. We will also touch on advanced MapReduce capabilities such as composite keys, secondary sorting, and data serialization.
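
The composite-key and secondary-sorting idea mentioned above can be sketched outside Hadoop. A minimal Python illustration (hypothetical data; a plain in-memory sort stands in for the shuffle phase):

```python
from itertools import groupby

# Hypothetical records: (customer_id, timestamp, amount).
records = [
    ("c2", 20120401, 10.0),
    ("c1", 20120315, 25.0),
    ("c1", 20120101, 40.0),
    ("c2", 20120202, 5.0),
]

# Composite key (customer_id, timestamp): in Hadoop you would partition and
# group on customer_id alone but sort on the full composite key, so each
# reduce call receives one customer's records already in time order.
sorted_records = sorted(records, key=lambda r: (r[0], r[1]))

# groupby on the grouping half of the key mimics the reduce grouping.
per_customer = {
    customer: [amount for _, _, amount in group]
    for customer, group in groupby(sorted_records, key=lambda r: r[0])
}
print(per_customer)  # {'c1': [40.0, 25.0], 'c2': [5.0, 10.0]}
```

The payoff in a real job is that the reducer never has to buffer and re-sort a customer’s records in memory; the framework’s sort delivers them in order.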


Welcome to the Jungle: Distributed Systems for Fun and Profit
Scott Fines (NISC)
Recent years have seen a sudden and rapid introduction of new technologies for distributing applications to essentially arbitrary scale. The variety and depth of these systems has grown to match, and it can be a challenge just to keep up. In this talk, I’ll discuss some of the more common systems, such as Hadoop, HBase, and Cassandra, and some of the scenarios and pitfalls of using them. I’ll cover when MapReduce is powerful and helpful, and when it’s better to use a different approach. Putting it all together, I’ll mention ZooKeeper, Flume, and some of the surrounding smaller projects that can help make a usable system.


A Survey of Probabilistic Data Structures
Jim Duey (Lonocloud)
Big data requires big resources, which cost big money. But if you only need answers that are good enough, rather than precisely right, probabilistic data structures can be a way to get those answers with a fraction of the resources and cost. In this talk, I’ll survey several such data structures, give some of the theory behind them, and point out some use cases.
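
As one concrete instance of the trade-off this abstract describes, a Bloom filter answers set-membership queries in a fixed, small amount of memory: it never misses an item that was added, but may occasionally report a false positive at a rate you tune with the bit-array size and hash count. A minimal Python sketch (not from the talk; sizes and hashing scheme are illustrative choices):

```python
import hashlib

class BloomFilter:
    """Approximate set membership: no false negatives,
    tunable false-positive rate, fixed memory."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # True means "probably present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

bf = BloomFilter()
for word in ["hadoop", "hbase", "cassandra"]:
    bf.add(word)

print(bf.might_contain("hbase"))  # True: added items are always found
```

Storing the three strings exactly would take more space than the 1024-bit filter, and the gap widens enormously at the scale the talk targets; the price is that a membership “yes” is only probably correct.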