Agenda

StampedeCon 2013 will be held July 30-31, 2013 in St. Louis, MO.  The conference is scheduled from 8:00 AM to 4:30 PM each day.  In addition, we have pre-conference and post-conference training available.  Stay tuned for the detailed schedule and a few additional speakers!  Registration is now open.

July 30: Developing Your Data and Analytics Strategies
Connect business goals and realities with big data architecture decisions.

Five Trends in Analytics . . . How to Take Advantage Today

John Lucker, Partner and Principal at Deloitte Consulting
Lucker will discuss the latest advancements in the world of analytics and offer strategies for tapping into their potential. The topic areas include visualization and design, mobile analytics and strategy analytics.

From Smart Buildings to Smart Cities: An Industry in the Midst of Big Data

Paul Doherty, President and CEO of the digit group, inc.
As our world evolves into large urban environments, built environment and IT professionals are challenged with integrating a building’s Digital DNA into the urban fabric of Smart City initiatives. This creates opportunities for Cloud-based and mobile analysis and management that can lead to better design, performance, service and sustainability. The knowledge behind the urban intelligence of Big Data latently resides with today’s built environment and IT professionals. Join us for a discussion that will define Smart Cities, identify Smart Buildings and provide you with best practices, lessons learned and a framework strategy for your organization to profit from the Smart Cities movement.

Big Data, Big Law

Anthony Martin, Chief Privacy and Information Security Counsel – Walmart
This is the story of one global, multi-channel company’s walk through the increasingly complicated legal, compliance and security maze while trying to recognize the implicit value of Big Data programs.

Big Data Analytics: Inside and Out

Michael Cavaretta, Ford Motor Company – Lead of Predictive Analytics group in Research and Advanced Engineering
Cavaretta will present three areas of opportunity for Big Data Analytics: improving internal processing, understanding customers through external data (including social) and vehicle sensor networks. The presentation will include tips for starting your own Big Data projects.

PANEL – Human vs. Machine: Balancing Human-based Analysis with Automated Analytics and Machine Learning

Moderator: Eric Kavanagh, Host of DM Radio for Information Management
Panelists:

  • Bill Shannon, Professor of Biostatistics in Medicine, Washington University, and Founder and President, BioRankings, LLC
  • Radhika Subramanian, CEO of Emcien
  • Bruno Kurtic, Founding VP of Product Management and Strategy at Sumo Logic
  • Kilian Weinberger, Assistant Professor of Machine Learning (Department of Computer Science), Washington University

You’ve seen the projections: a severe shortage of data scientists is at hand, threatening our ability to leverage Big Data. At the same time, however, research continually pushes the state of the art in machine learning and pattern/similarity search algorithms. More and more technology companies are claiming to leverage advanced algorithms to eliminate the need for dedicated data scientists. Coming full circle, we hear arguments that you not only need data scientists but you also need to balance the skills of an entire Big Data team composed of technologists, statisticians and domain experts. Our expert panelists will discuss their viewpoints on balancing human-based analysis with automated analytics and machine learning.

July 31: Developing Your Technology Strategy
Discover technologies to implement your data and analytics strategies.

Big Data @ Riot Games – Using Hadoop to Understand Player Experience

Jerome Boulon, Technical Director of Data Services for Riot Games’ Big Data team
Riot Games aims to be the most player-focused game company in the world. To fulfill that mission, it’s vital we develop a deep, detailed understanding of players’ experiences. This is particularly challenging since our debut title, League of Legends, is one of the most played video games in the world, with more than 32 million active monthly players across the globe. In this presentation, we’ll discuss several use cases where we sought to understand and improve the player experience, the challenges we faced to solve those use cases, and the big data infrastructure that supports our capability to provide continued insight.

Thinking in MapReduce

Ryan Brush, Distinguished Engineer with Cerner Corporation
MapReduce reflects the essence of scalable processing: split a big problem into lots of parts, process them in parallel, and then merge the results. Yet this model is at odds with how we’ve thought about computing for most of history, where we center our applications on long-lived stores of mutable data and incrementally apply change. This difference means a new mindset is needed to best leverage Hadoop and its ecosystem. This talk lays out the basics of MapReduce and shows how to design logic and data models that make the best use of the Hadoop platform. It also goes through a number of design patterns and how Cerner is applying them to health care.
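The split, process and merge pattern described in this abstract can be illustrated with a minimal sketch. This is plain Python rather than Hadoop code, and it is not taken from the talk; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the word-count task are assumptions chosen for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(line):
    """Map: turn one input record into (key, value) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(mapped_chunks):
    """Shuffle: group every emitted value under its key."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: merge all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each line is mapped independently, so the map calls can run in parallel.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_phase, lines))
    groups = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in groups.items())
```

Because no map call depends on another and each key is reduced on its own, neither phase needs shared mutable state. That independence is what lets a framework such as Hadoop spread the same logic across many machines.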

Real Time Event Processing and In-Memory Analysis of Big Data

Vinod Vydier, Middleware Specialist at Oracle
There are multiple projects (for example Cloudera’s Impala) that do real-time or near-real-time analysis of Big Data. However, if there are events that need to be looked at and responded to in real time (for example credit card fraud, or vehicle metrics that should alert a driver), this can have a significant impact on data collection and analysis using the traditional Big Data techniques. In this session, I will introduce strategies on how you can use an Event Processing Engine to respond to events in real time, and then filter and categorize data in memory before storing data in HDFS. This will make it easier to run Hadoop jobs on the data collected, and also have end clients respond to the critical events in real time.
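As a rough illustration of the flow this abstract describes (respond to critical events immediately, then filter and categorize in memory before storing), here is a minimal Python sketch. It does not use any real event processing engine or HDFS client; the `Event` and `EventProcessor` classes, the threshold rule and the `stored_batches` list standing in for storage are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    kind: str       # category, e.g. "card_swipe"
    amount: float   # hypothetical numeric reading attached to the event


@dataclass
class EventProcessor:
    alert_threshold: float          # assumed rule for what counts as critical
    batch_size: int = 100
    alerts: list = field(default_factory=list)
    buffer: dict = field(default_factory=dict)
    stored_batches: list = field(default_factory=list)  # stands in for HDFS writes

    def handle(self, event):
        # Filter: drop readings that carry no information.
        if event.amount <= 0:
            return
        # Respond to critical events right away, before any batch storage.
        if event.amount >= self.alert_threshold:
            self.alerts.append(event)
        # Categorize in memory by event kind.
        self.buffer.setdefault(event.kind, []).append(event)
        if sum(len(events) for events in self.buffer.values()) >= self.batch_size:
            self.flush()

    def flush(self):
        # Hand the categorized batch to storage and start a fresh buffer.
        if self.buffer:
            self.stored_batches.append(self.buffer)
            self.buffer = {}
```

The point of the design is that alerting happens on the hot path, while storage receives data that is already filtered and grouped, which is the property the abstract says makes later Hadoop jobs easier to run.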

CouchDB at its Core: Global Data Storage and Rich Incremental Indexing at Cloudant

Adam Kocoloski, Co-Founder & CTO of Cloudant, CouchDB Expert
Cloudant operates database clusters comprising 100+ nodes based on BigCouch, the company’s fork of CouchDB. Key elements of CouchDB’s design have proven instrumental to success at this scale, including version histories, append-only storage, and multi-master replication. In this talk, Cloudant Co-Founder and Apache CouchDB Committer Adam Kocoloski will discuss lessons learned from running production CouchDB clusters bigger than many well-publicized Hadoop deployments, and how Cloudant’s experience at scale is informing development work on the next release of Apache CouchDB.

A New Data Architecture for the App Economy

Anant Jhingran, VP of Products at Apigee
It has been clear for quite some time that traditional warehouses do not cut it for unstructured and semi-structured data, and therefore new systems such as NoSQL and Hadoop have emerged. But these systems throw the baby out with the bathwater. Traditional warehouses were built on the premise that applications can be simpler because the databases did a lot. Of course, the penalty for this was that the application’s world view had to fit the relational, database world view. In the new Big Data systems, the primitives have been lowered so much (simple key-value pairs, or completely unstructured tuple structures) that the applications now have to do a lot more. We argue that there is a happy medium. We have studied the kinds of data that sit in the app economy, and the data structures that need to be built on top of NoSQL and Hadoop to considerably speed up insights in the app economy without requiring every problem to be coded from scratch.

Legacy Analysis: How Hadoop Streaming Enables Software Reuse – A Genomics Case Study

Jeff Melching, Big Data Engineer and Architect at Monsanto
The bioinformatics domain, and in particular computational genomics, has always had the problem of computing analytics against very large data sets. Traditionally, these analytics have leveraged grid and compute farm technologies. Additionally, the analytics software and algorithms have been built up over the past 30 years by contributions from both the public and private domain and written in a number of programming languages. When these software packages are brought in house and combined with the skills and preferences of internal bioinformatics researchers, what you get is a myriad of different technologies linked together in an analytics pipeline. The rise of technologies like MapReduce in Hadoop has made the execution of such pipelines much more efficient, but what about all those analytic pipelines I have built up over the years that aren’t written in MapReduce? Do I have to rewrite them? Do I have to know Java? This talk will explain how Hadoop Streaming can help you reuse instead of rewriting. It will also touch on techniques for packaging and deploying Hadoop applications without having to centrally manage software versions on the cluster.
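To make the reuse idea concrete: Hadoop Streaming runs any executable as a mapper or reducer, feeding it input records on standard input and reading tab-separated key/value lines from standard output. The sketch below shows what a Python mapper wrapping existing analysis code might look like. It is not from the talk; `gc_fraction` is a hypothetical stand-in for a legacy routine, and the input layout (sequence id, tab, sequence) is an assumption.

```python
import sys


def gc_fraction(sequence):
    """Hypothetical stand-in for an existing, non-Java analysis routine."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def run_mapper(lines):
    """Yield one "key<TAB>value" line per non-empty input record."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Assumed input layout: sequence id, a tab, then the sequence itself.
        seq_id, _, sequence = line.partition("\t")
        yield f"{seq_id}\t{gc_fraction(sequence):.3f}"


def main():
    # A streaming mapper script would call main(): read stdin, write stdout.
    for output_line in run_mapper(sys.stdin):
        print(output_line)
```

Keeping the record handling in `run_mapper` and the I/O in `main` means the wrapped logic can be tested locally on a list of strings before the script is ever submitted to a cluster.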

Enterprise Workflow Management Using Oozie @ Riot Games

Matthew Goeke, Big Data engineer at Riot Games
The massive push for big data across multiple industries can leave companies new to the Hadoop ecosystem looking for ways to integrate their existing enterprise technology into this evolving space. Automating workflows, especially traditional ETL pipelines, can be a daunting task when you consider that many enterprise workflow frameworks don’t natively integrate with Hadoop. That’s where the Oozie engine comes in. In this presentation we’ll address the utility, management features and lessons we learned from integrating Oozie into the Riot Games Big Data pipeline. We’ll also cover our implementation of Oozie in both the physical datacenter and in the Amazon cloud.

July 29: Pre-Conference Training

Understanding the NoSQL Landscape

This is a fast-paced, technical overview of the NoSQL landscape. The objectives for this training include the following:

  • Introduce students to the core concepts of Big Data
  • Provide a general overview of the most common NoSQL stores
  • Explain how to choose the correct NoSQL database for specific use cases
  • Deep dive into the architecture of Hadoop (HDFS/MapReduce), Cassandra and HBase
  • General overview of the architecture of MongoDB and Neo4j
  • Familiarize students with the emerging architectures in the world of NoSQL: Impala, Drill, Stinger initiative

Audience: This survey course is targeted towards both technical and non-technical professionals who want to understand the emerging world of Big Data.  No prior knowledge of databases or programming is assumed. Engineers, Programmers, Networking specialists, Managers and Executives should plan on attending.

Read the full details.

August 1: Post-Conference Training

Introduction to Hadoop

The first training workshop is to be held the morning of August 1, 2013 and is an “Introduction to Hadoop” workshop that assumes no prior experience with Hadoop and would be appropriate for anyone attending the conference (e.g. managers, analysts, developers, and system administrators). This tutorial provides a solid foundation for those seeking to understand large-scale data processing with MapReduce and Hadoop, plus its associated ecosystem. This session is intended for those who are new to Hadoop and are seeking to understand where Hadoop is appropriate and how it fits with existing systems. It will cover:

  • The rationale for Hadoop
  • Understanding the Hadoop Distributed File System (HDFS) and MapReduce
  • Common Hadoop use cases
  • Overview of the other components in a typical Hadoop “stack” such as these Apache projects: Hive, Pig, HBase, Sqoop, Flume and Oozie

Finding Insight in Big Data

The second training workshop is “Finding Insight in Big Data,” which builds on the foundation gained from attending the “Introduction to Hadoop” workshop. This workshop is more in-depth and demonstrates how to use high-level tools like Hive and Pig (instead of low-level MapReduce code) to find valuable patterns in the types of data that companies commonly produce. This session is appropriate for both analysts and developers since the focus on high-level tools eliminates the need for students to have Java programming experience. This is to be held the afternoon of August 1, 2013.

 

Register to join us at StampedeCon 2013.