We’ve got an incredible set of presentations lined up for StampedeCon 2013! Please join us and register today!
StampedeCon 2013 will be held July 30-31, 2013 in St. Louis, MO. The conference is scheduled from 8:00 AM to 4:30 PM each day. Stay tuned for the detailed schedule and a few additional speakers!
July 30: Developing Your Data and Analytics Strategies
- Five Trends in Analytics… How to Take Advantage Today
- From Smart Buildings to Smart Cities: An Industry in the Midst of Big Data
- Big Data, Big Law
- Big Data Analytics: Inside and Out
- Cloud-Friendly Hadoop and Hive
- PANEL – Human vs. Machine: Balancing Human-based Analysis with Automated Analytics and Machine Learning
- More to be added soon.

July 31: Developing Your Technology Strategy
- Big Data @ Riot Games – Using Hadoop to Understand Player Experience
- Thinking in MapReduce
- Real-Time Event Processing and In-Memory Analysis of Big Data
- CouchDB at its Core: Global Data Storage and Rich Incremental Indexing at Cloudant
- A New Data Architecture for the App Economy
- Legacy Analysis: How Hadoop Streaming Enables Software Reuse – A Genomics Case Study
- Enterprise Workflow Management Using Oozie @ Riot Games
- More to be added soon.
July 30: Developing Your Data and Analytics Strategies
Connect business goals and realities with big data architecture decisions.
Five Trends in Analytics… How to Take Advantage Today
John Lucker, Partner and Principal at Deloitte Consulting
Lucker will discuss the latest advancements in the world of analytics and offer strategies for tapping into their potential. The topic areas include visualization and design, mobile analytics, and strategy analytics.
From Smart Buildings to Smart Cities: An Industry in the Midst of Big Data
Paul Doherty, President and CEO of the digit group, inc.
As the world urbanizes into large, dense environments, built environment and IT professionals are challenged with integrating a building’s Digital DNA into the urban fabric of Smart City initiatives. This creates opportunities for cloud-based and mobile analysis and management that can lead to better design, performance, service, and sustainability. The knowledge behind the urban intelligence of Big Data resides, largely untapped, with today’s built environment and IT professionals. Join us for a discussion that will define Smart Cities, identify Smart Buildings, and provide you with best practices, lessons learned, and a framework strategy for your organization to profit from the Smart Cities movement.
Big Data, Big Law
Anthony Martin, Chief Privacy and Information Security Counsel – Walmart
This is the story of one global, multichannel company’s walk through the increasingly complicated Legal, Compliance, and Security maze while trying to realize the implicit value of Big Data programs.
Big Data Analytics: Inside and Out
Michael Cavaretta, Lead of the Predictive Analytics Group in Research and Advanced Engineering at Ford Motor Company
Cavaretta will present three areas of opportunity for Big Data Analytics: improving internal processing, understanding customers through external data (including social), and vehicle sensor networks. The presentation will include tips for starting your own Big Data projects.
Cloud-Friendly Hadoop and Hive
Shrikanth Shankar, Head of Engineering – Qubole
The cloud reduces the barrier to entry into analytics for many small and medium-sized enterprises. Hadoop and related frameworks like Hive, Oozie, and Sqoop are becoming the tools of choice for deriving insights from data. However, these frameworks were designed for in-house datacenters, which have different tradeoffs from a cloud environment, and making them run well in the cloud presents real challenges. In this talk, Shrikanth Shankar, Head of Engineering at Qubole, describes how Qubole has extended Hadoop and Hive to exploit these new tradeoffs. Use cases will show how lessons learned solving large-scale challenges at Facebook now make it extremely easy for significantly smaller end users to leverage these technologies in the cloud.
PANEL – Human vs. Machine: Balancing Human-based Analysis with Automated Analytics and Machine Learning
Moderator: Eric Kavanagh, Host of DM Radio for Information Management
- Bill Shannon, Professor of Biostatistics in Medicine, Washington University, and Founder and President, BioRankings, LLC
- Radhika Subramanian, CEO of Emcien
- Bruno Kurtic, Founding VP of Product Management and Strategy at Sumo Logic
- Kilian Weinberger, Assistant Professor of Machine Learning (Department of Computer Science), Washington University
You’ve seen the projections: a severe shortage of data scientists is at hand, threatening our ability to leverage Big Data. At the same time, however, research continually pushes the state of the art in machine learning and pattern/similarity search algorithms. More and more technology companies claim to leverage advanced algorithms to eliminate the need for dedicated data scientists. Coming full circle, we hear arguments that you not only need data scientists but also need to balance the skills of an entire Big Data team composed of technologists, statisticians, and domain experts. Our expert panelists will discuss their viewpoints on balancing human-based analysis with automated analytics and machine learning.
July 31: Developing Your Technology Strategy
Discover technologies to implement your data and analytics strategies.
Big Data @ Riot Games – Using Hadoop to Understand Player Experience
Jerome Boulon, Technical Director of Data Services for Riot Games’ Big Data team
Riot Games aims to be the most player-focused game company in the world. To fulfill that mission, it’s vital we develop a deep, detailed understanding of players’ experiences. This is particularly challenging since our debut title, League of Legends, is one of the most played video games in the world, with more than 32 million active monthly players across the globe. In this presentation, we’ll discuss several use cases where we sought to understand and improve the player experience, the challenges we faced to solve those use cases, and the big data infrastructure that supports our capability to provide continued insight.
Thinking in MapReduce
Ryan Brush, Distinguished Engineer with Cerner Corporation
MapReduce reflects the essence of scalable processing: split a big problem into lots of parts, process them in parallel, and then merge the results. Yet this model is at odds with how we’ve thought about computing for most of its history, where applications center on long-lived stores of mutable data and incrementally apply change. This difference means a new mindset is needed to best leverage Hadoop and its ecosystem. This talk covers the basics of MapReduce and how to design logic and data models to make the best use of the Hadoop platform. It also walks through a number of design patterns and how Cerner is applying them to health care.
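The split/process/merge model described above can be sketched in a few lines of Python. This is a toy, single-machine word count illustrating the map, shuffle, and reduce phases — not Hadoop itself, and all names here are illustrative:

```python
from collections import defaultdict

# Map: turn each input record into (key, value) pairs.
def map_fn(line):
    for word in line.split():
        yield (word.lower(), 1)

# Shuffle: group all values by key (Hadoop does this between the phases).
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce: merge each key's values into a final result.
def reduce_fn(key, values):
    return (key, sum(values))

def mapreduce(records):
    pairs = (pair for record in records for pair in map_fn(record))
    return dict(reduce_fn(k, vs) for k, vs in shuffle(pairs).items())

counts = mapreduce(["big data", "big ideas"])
# counts == {"big": 2, "data": 1, "ideas": 1}
```

Because each map call and each reduce call is independent, Hadoop can scatter them across a cluster — which is exactly the mindset shift the talk addresses.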
Real-Time Event Processing and In-Memory Analysis of Big Data
Vinod Vydier, Middleware Specialist at Oracle
Multiple projects (for example, Cloudera’s Impala) offer real-time or near-real-time analysis of Big Data. However, when events must be examined and responded to in real time (credit card fraud, for example, or vehicle metrics that should alert a driver), traditional Big Data techniques for data collection and analysis fall short. In this session, I will introduce strategies for using an event processing engine to respond to events in real time, and then filter and categorize data in memory before storing it in HDFS. This makes it easier to run Hadoop jobs on the collected data while still letting end clients respond to critical events in real time.
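As a rough illustration of the filter-and-categorize step this abstract describes, here is a minimal in-memory sketch in Python. The event fields and threshold are invented for the example; a real deployment would use a CEP engine and write the buckets to HDFS rather than keep them in a dict:

```python
from collections import defaultdict

ALERT_THRESHOLD = 120  # hypothetical speed limit triggering a driver alert

def process_events(events):
    """Route urgent events immediately; bucket the rest in memory for batch storage."""
    alerts = []
    buckets = defaultdict(list)  # stands in for a pre-HDFS in-memory store
    for event in events:
        if event["speed"] > ALERT_THRESHOLD:
            alerts.append(event)                     # respond in real time
        else:
            buckets[event["vehicle"]].append(event)  # categorized for later Hadoop jobs
    return alerts, buckets

alerts, buckets = process_events([
    {"vehicle": "A", "speed": 65},
    {"vehicle": "B", "speed": 140},
    {"vehicle": "A", "speed": 70},
])
# one real-time alert (vehicle B); vehicle A's two normal readings are batched
```

The key design point is the split itself: the latency-sensitive path never waits on the batch path.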
CouchDB at its Core: Global Data Storage and Rich Incremental Indexing at Cloudant
Adam Kocoloski, Co-Founder & CTO of Cloudant, CouchDB Expert
Cloudant operates database clusters comprising 100+ nodes based on BigCouch, the company’s fork of CouchDB. Key elements of CouchDB’s design have proven instrumental to success at this scale, including version histories, append-only storage, and multi-master replication. In this talk, Cloudant Co-Founder and Apache CouchDB Committer Adam Kocoloski will discuss lessons learned from running production CouchDB clusters bigger than many well-publicized Hadoop deployments, and how Cloudant’s experience at scale is informing development work on the next release of Apache CouchDB.
A New Data Architecture for the App Economy
Anant Jhingran, VP of Products at Apigee
It has been clear for quite some time that traditional warehouses do not cut it for unstructured and semi-structured data, and new systems such as NoSQL and Hadoop have therefore emerged. But these systems throw the baby out with the bathwater. Traditional warehouses were built on the premise that applications can be simpler because the databases did a lot; the penalty, of course, was that the application’s world view had to fit the relational database world view. In the new Big Data systems, the primitives have been lowered so far (a simple key-value pair, or a completely unstructured tuple) that applications now have to do a lot more. We argue that there is a happy medium. We have studied the kinds of data that sit in the app economy, and the data structures that need to be built on top of NoSQL and Hadoop to considerably speed up insights in the app economy without requiring every problem to be coded from scratch.
Legacy Analysis: How Hadoop Streaming Enables Software Reuse – A Genomics Case Study
Jeff Melching, Big Data Engineer and Architect at Monsanto
The bioinformatics domain, and computational genomics in particular, has always faced the problem of computing analytics against very large data sets. Traditionally, these analytics have leveraged grid and compute-farm technologies. The analytics software and algorithms have been built up over the past 30 years by contributions from both the public and private domains, written in a number of programming languages. When these software packages are brought in-house and combined with the skills and preferences of internal bioinformatics researchers, what you get is a myriad of different technologies linked together in an analytics pipeline. The rise of technologies like MapReduce in Hadoop has made the execution of such pipelines much more efficient, but what about all those analytic pipelines I have built up over the years that aren’t written in MapReduce? Do I have to rewrite them? Do I have to know Java? This talk will explain how Hadoop Streaming can help you reuse instead of rewrite. It will also touch on techniques for packaging and deploying Hadoop applications without having to centrally manage software versions on the cluster.
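The reason Hadoop Streaming enables reuse is that its contract is simply text on stdin and tab-separated key/value lines on stdout, so an existing script in any language can slot in as a mapper or reducer. A minimal sketch, with a toy genomics metric (the script names and jar path in the comment are illustrative, not from the talk):

```python
#!/usr/bin/env python
# A Hadoop Streaming mapper is just a program that reads records on stdin
# and writes tab-separated key/value lines on stdout -- any language works.
# It would be wired in roughly like (paths illustrative):
#   hadoop jar hadoop-streaming.jar \
#     -input reads/ -output gc/ \
#     -mapper gc_mapper.py -reducer aggregate.py
import sys

def gc_fraction(seq):
    """Toy genomics metric: fraction of G/C bases in a DNA sequence."""
    seq = seq.strip().upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)

def run(stdin=sys.stdin, stdout=sys.stdout):
    for line in stdin:
        if line.strip():
            stdout.write(f"gc\t{gc_fraction(line):.2f}\n")

if __name__ == "__main__":
    run()
```

A 30-year-old C binary or Perl script that already speaks stdin/stdout needs no changes at all — it is named on the command line instead of this script.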
Enterprise Workflow Management Using Oozie @ Riot Games
Matthew Goeke, Big Data Engineer at Riot Games
The massive push for big data across multiple industries can leave companies new to the Hadoop ecosystem looking for ways to integrate their existing enterprise technology into this evolving space. Building automated workflows, especially traditional ETL pipelines, can be daunting when you consider that many enterprise workflow frameworks don’t natively integrate with Hadoop. That’s where the Oozie engine comes in. In this presentation we’ll address the utility, management features, and lessons we learned from integrating Oozie into Riot Games’ Big Data pipeline. We’ll also cover our implementation of Oozie in both the physical datacenter and the Amazon cloud.
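For readers new to Oozie: a workflow is declared as an XML document of actions wired together by ok/error transitions, which Oozie then schedules on the cluster. A skeletal example of the shape (names, paths, and the single map-reduce action here are illustrative, not Riot’s pipeline):

```xml
<!-- Skeletal Oozie workflow: one map-reduce step with explicit failure handling -->
<workflow-app name="etl-example" xmlns="uri:oozie:workflow:0.4">
  <start to="transform"/>
  <action name="transform">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.input.dir</name>
          <value>${inputDir}</value>
        </property>
        <property>
          <name>mapred.output.dir</name>
          <value>${outputDir}</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Transform failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

Real ETL pipelines chain many such actions (Hive, Sqoop, shell, and so on), which is what makes Oozie a natural fit for the enterprise workflows the talk covers.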