StampedeCon 2015 Big Data Conference
We’re bringing together the Big Data industry’s leading experts and hundreds of professionals from the region’s top companies to help deliver the content and connections you need to get the most out of your data.
July 14: Pre-conference Technical Workshop
Deep Dive into Apache Cassandra & Apache Spark
Get hands-on experience building a scalable, real-time Big Data analytics platform.
Led by one of the industry’s most sought-after technical experts, this pre-conference session will be a hands-on deep dive into Apache Cassandra & Apache Spark. We’re going to roll up our sleeves and get our hands dirty. Read more…
July 15-16: Networking and Expert Speakers
Our focus for StampedeCon 2015, our fourth year, is on sessions that:
- Help businesses discover new ways to find value in the data available to them.
- Help participants understand how to integrate Big Data technologies and methodologies into their existing organization.
Location: Sheraton Westport Chalet in St. Louis, MO
July 15: 8am-7pm – View the July 15 detailed schedule
July 16: 8am-5pm – View the July 16 detailed schedule
This session will explore the new features in Cassandra 3.0, starting with JSON support. Cassandra can now accept JSON directly when writing rows and return rows as JSON when reading them, making it trivial to deploy Cassandra as a component in modern service-oriented architectures. Cassandra 3.0 also delivers other enhancements to developer productivity: user-defined functions let developers deploy custom […]
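As a rough sketch of what the JSON support looks like from a client’s side (the `users` table and the helper below are invented for illustration, not from the session):

```python
import json

def insert_json_cql(table, row):
    """Build a Cassandra 3.0 'INSERT ... JSON' statement for a row.

    Single quotes in the JSON payload are doubled per CQL string escaping;
    the table name is assumed to be a trusted identifier.
    """
    payload = json.dumps(row).replace("'", "''")
    return f"INSERT INTO {table} JSON '{payload}';"

# A service can now hand Cassandra its JSON payload as-is:
stmt = insert_json_cql("users", {"id": 1, "name": "Ada"})
# -> INSERT INTO users JSON '{"id": 1, "name": "Ada"}';
# Reading goes the other way:  SELECT JSON * FROM users WHERE id = 1;
```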
Electronic Health Records systems are required to keep an audit trail of everyone accessing any patient information. The result is similar to the click-stream of a website. This audit data is required for regulatory compliance, but can also be useful in understanding behavior patterns and how processes actually get done. Mercy has more than 20TB […]
Express Scripts currently manages pharmacy benefits for one in every three Americans, totaling approximately 100 million lives in its entire book of business. Processing claims and managing the benefit for such a large patient base has produced massive, detailed healthcare datasets that cannot be found in aggregate within any other organization or firm in the US. Express Scripts […]
Graphs are eating the world – but in what form? Starting off with a primer on Graph Databases, this talk will focus on practical examples of graph applications. We’ll look at multiple use cases like job boards, dating sites, recommendation engines of all kinds, network management, scheduling engines, etc. We’ll also see some examples of […]
There is an adage: “If you fail to plan, you plan to fail.” When developing systems, the adage can be taken a step further: “If you fail to plan FOR FAILURE, you plan to fail.” At The Huffington Post, data moves between a number of systems to provide statistics for our technical, business, and editorial teams. Due […]
The global Monsanto R&D pipeline produces millions of new plant populations every year, each of which contributes to a dataset of genetic ancestry spanning several decades. Historically, the constraints of modeling and processing this data within an RDBMS have made drawing inferences from this dataset complex and computationally infeasible at large scale. Fortunately, the genetic history […]
Riot Games’ mission statement is to become the most player-focused company in the world. With over 67 million players battling on the fields of justice every month, League of Legends generates more than 45 terabytes of data on a daily basis. From game events to store transactions, data comes in from thousands of sources […]
As a frequent recipient of the J.D. Power award for excellence in customer service, T-Mobile takes great pride in the quality of care that we provide our customers. As smartphone technologies advance (and fragment), the challenge of providing quality technical support can be daunting. To address this challenge, T-Mobile is reinventing many of its traditional […]
Mercy has built a system using batch and streaming technology to allow batch and near real-time updates to flow from its Epic EHR (Electronic Health Records) system into its Hadoop cluster. Mercy is using this system to provide reporting and analytics capabilities to its researchers, business owners, and physicians. The system uses Sqoop, Flume, Pig, […]
Picking your distribution and platform is just the first decision of many you need to make in order to create a successful data ecosystem. In addition to things like replication factor and node configuration, the choice of file format can have a profound impact on cluster performance. Each of the data formats has different strengths […]
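The row-versus-column trade-off behind those strengths can be shown in miniature (a pure-Python toy, not any real file format):

```python
# Row layout: good for whole-record reads and writes (e.g. Avro).
rows = [
    {"id": 1, "region": "MO", "sales": 100},
    {"id": 2, "region": "IL", "sales": 250},
    {"id": 3, "region": "MO", "sales": 175},
]

# Column layout: good for scanning one field across many records
# (e.g. Parquet, ORC) -- an analytic query touches only the columns it needs.
columns = {key: [r[key] for r in rows] for key in rows[0]}

total_sales = sum(columns["sales"])  # reads 1 column, not every field of every row
record_2 = {k: v[1] for k, v in columns.items()}  # rebuilding a full row is the costly path
```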
Companies have very detailed data available, from the planning process through the manufacturing and packaging process. However, once products are shipped to the distributors or retail stores, companies have little data regarding product, place, position, placement, packaging, promotion and price of their products. Companies across all market segments have very little insight from their distributors […]
Today, if a byte of data were a gallon of water, it would take only 10 seconds to produce enough data to fill an average home; by 2020 it will take only 2 seconds. The Internet of Things is driving a tremendous amount of this growth, providing more data at a higher rate than we’ve ever […]
This session will begin with an overview of current non-volatile memory (NVM, aka persistent memory) architectures and their place within the several levels of the memory and storage hierarchy, both near- and far-processor. A discussion of NVM’s significant impact on analytic computing workloads now and in the near future will follow, including use cases and the concept […]
Famine, Poverty, Disease, Climate Change. The Apocalypse is here. Who will save us from sure destruction? Big Data, that’s who. In this presentation we will discuss how Big Data is being introduced into solving these issues before it’s too late. We will also discuss how Big Data is fundamentally changing the way we fight these issues.
This talk will examine the benefits of using multiple persistence strategies to build an end-to-end predictive engine. Utilizing Spark Streaming backed by a Cassandra persistence layer allows rapid lookups and inserts to be made in order to perform real-time model scoring. Spark backed by Parquet files, stored in HDFS, allows for high-throughput model training and […]
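The dual-store split the abstract describes can be sketched with in-memory stand-ins (toy code; the running-average “score” and the event shape are invented for illustration — the dictionaries merely mimic the two stores):

```python
from collections import defaultdict

serving = {}    # stands in for Cassandra: fast point lookups for real-time scoring
batch_log = []  # stands in for Parquet files on HDFS: the full append-only history

def ingest(event):
    """Streaming path: update the low-latency serving view and log the raw event."""
    user, value = event
    # Exponentially weighted running score, cheap to update per event.
    serving[user] = serving.get(user, 0.0) * 0.9 + value * 0.1
    batch_log.append(event)

def retrain():
    """Batch path: scan the full log at high throughput to rebuild a model."""
    totals, counts = defaultdict(float), defaultdict(int)
    for user, value in batch_log:
        totals[user] += value
        counts[user] += 1
    return {u: totals[u] / counts[u] for u in totals}  # per-user mean

for e in [("u1", 1.0), ("u1", 3.0), ("u2", 2.0)]:
    ingest(e)
model = retrain()  # {'u1': 2.0, 'u2': 2.0}
```

The point of the split is that each store serves the access pattern it is good at: point reads and writes on the streaming side, full scans on the training side.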
YARN enables Hadoop to move beyond just pure batch processing. With that, multiple workloads and tenants must now be able to share a single infrastructure for data processing. Features of the Capacity Scheduler enable fair resource sharing among multiple tenants, with elastic queues to maximize utilization. This talk will focus on the […]
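A toy model of the elastic-queue idea (not the actual YARN scheduler; queue names, capacities, and demands below are invented): each queue is guaranteed a share of the cluster, and idle capacity is lent to busier queues up to their configured maximum.

```python
def allocate(queues, total=100):
    """Toy elastic allocation: each queue first gets min(demand, guaranteed),
    then idle capacity is handed to still-hungry queues up to their max."""
    alloc = {q: min(spec["demand"], spec["guaranteed"]) for q, spec in queues.items()}
    spare = total - sum(spec["guaranteed"] for spec in queues.values())
    spare += sum(spec["guaranteed"] - alloc[q] for q, spec in queues.items())
    for q, spec in queues.items():
        if spare <= 0:
            break
        extra = min(spec["demand"] - alloc[q], spec["max"] - alloc[q], spare)
        if extra > 0:
            alloc[q] += extra
            spare -= extra
    return alloc

queues = {
    "etl":   {"guaranteed": 60, "max": 100, "demand": 20},
    "adhoc": {"guaranteed": 40, "max": 80,  "demand": 70},
}
print(allocate(queues))  # adhoc borrows etl's idle capacity: {'etl': 20, 'adhoc': 70}
```

When etl’s demand later rises, a real scheduler would reclaim the borrowed capacity (preemption); this sketch only shows the lending half.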
As technology evolves, consumers are able to do more and more things in a remote setting—banking, shopping, communication, you name it. The more enabled we are, the more fraud is possible. As individuals use their identities to apply for goods and services – credit, loans, wireless phones, mortgages, etc. – certain patterns emerge. ID Analytics, […]
Visualizing large amounts of data interactively can stress the limits of computer resources and human patience. Shaping data and the way it is viewed can allow exploration of large data sets interactively. Here we will look at how to generate a large amount of data and to organize it so that it can be explored […]
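One common way to shape data for interactive viewing is to pre-aggregate into fixed-width bins before rendering, so the UI redraws a handful of aggregates instead of millions of points; a minimal sketch (the data and bin count are invented):

```python
def bin_points(xs, n_bins, lo, hi):
    """Aggregate raw x-values into fixed-width bins so a plot draws
    n_bins bars instead of len(xs) individual points."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in xs:
        i = min(int((x - lo) / width), n_bins - 1)  # clamp x == hi into the last bin
        counts[i] += 1
    return counts

# A million synthetic points reduce to 10 numbers the UI can redraw instantly.
xs = [(i * 7919) % 1000 for i in range(1_000_000)]
histogram = bin_points(xs, 10, 0, 1000)
```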
The event-driven architecture and non-blocking I/O API in Node.js provide an environment well suited to delivering real-time analytics in a responsive web application. This session will discuss the emergence of modern web frameworks such as Node.js, advances in the Hadoop ecosystem (Spark and Storm), and frameworks like Kafka and Samza that can bring […]
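The non-blocking pattern the talk builds on can be sketched in Python’s asyncio rather than Node.js (the metric names and delays are invented; `asyncio.sleep` stands in for a real I/O call such as a query to a stream store):

```python
import asyncio

async def fetch_metric(name, delay):
    """Stand-in for a non-blocking I/O call; awaiting yields the event
    loop instead of blocking a thread."""
    await asyncio.sleep(delay)
    return name, round(delay * 1000)

async def dashboard():
    # All three "queries" are in flight at once; total wall time is
    # roughly that of the slowest one, not the sum of all three.
    results = await asyncio.gather(
        fetch_metric("pageviews", 0.03),
        fetch_metric("clicks", 0.02),
        fetch_metric("errors", 0.01),
    )
    return dict(results)

metrics = asyncio.run(dashboard())  # {'pageviews': 30, 'clicks': 20, 'errors': 10}
```

Node.js achieves the same effect with its event loop and callbacks/promises; the design choice in both cases is to keep one thread busy with many concurrent I/O operations.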
Today’s world is awash in data, and organizations are rapidly discovering that putting this data to work is the single most important factor in their ability to remain relevant to hyper-connected consumers. In this session, HP will explore the new trends of this appified, thingified, context-rich world and how HP’s Haven platform can give you […]
This is a technical workshop taking place on July 14 and is purchased separately from the conference. It will be led by Jon Haddad, one of the industry’s most sought-after technical experts. He’s the maintainer of cqlengine, the Python object mapper for Cassandra, and he also works with DataStax as their Technical Evangelist. Get hands-on experience […]
The starting point for this project was a MapReduce application that processed log files produced by the support portal. This application was running on Hadoop with Ruby Wukong. When the project started, the application was underperforming and did not scale well. This made the case for redesigning it using Spark with Scala […]
Learn how to model beyond traditional direct access in Apache Cassandra. We’ll utilize the DataStax platform to harness the power of Spark and Solr to perform search, analytics, and complex operations in place on your Cassandra data!