StampedeCon 2015 Big Data Conference

We’re bringing together the Big Data industry’s leading experts and hundreds of professionals from the region’s top companies to help deliver the content and connections you need to get the most out of your data.

July 14: Pre-conference Technical Workshop

Deep Dive into Apache Cassandra & Apache Spark

Get hands-on experience building a scalable, real-time Big Data analytics platform.

Led by one of the industry’s most sought-after technical gurus, this pre-conference session will be a hands-on deep dive into Apache Cassandra & Apache Spark. We’re going to roll up our sleeves and get our hands dirty.

July 15-16: Networking and Expert Speakers

Our focus for StampedeCon 2015, our fourth year, is on sessions that:

  • Help businesses discover new ways to find value in the data available to them.
  • Help participants understand how to integrate Big Data technologies and methodologies into their existing organization.

Location: Sheraton Westport Chalet in St. Louis, MO

July 15: 8am-7pm – View the July 15 detailed schedule

July 16: 8am-5pm – View the July 16 detailed schedule

Cassandra 3.0: JSON at scale

This session will explore the new features in Cassandra 3.0, starting with JSON support. Cassandra can now map JSON documents directly to rows and back, making it trivial to deploy Cassandra as a component in modern service-oriented architectures. Cassandra 3.0 also delivers other enhancements to developer productivity: user-defined functions let developers deploy custom […]
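
To make the JSON feature concrete, here is a minimal sketch using the DataStax Python driver; the keyspace, table, and data are invented for illustration and not taken from the session.

```python
# A minimal sketch, assuming a local Cassandra 3.0 node and an existing
# "demo" keyspace; the table and data are invented for illustration.
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("demo")
session.execute("""
    CREATE TABLE IF NOT EXISTS users (
        id int PRIMARY KEY,
        name text,
        email text
    )
""")

# INSERT ... JSON maps a JSON document directly onto a row.
session.execute("""
    INSERT INTO users JSON '{"id": 1, "name": "Ada", "email": "ada@example.com"}'
""")

# SELECT JSON returns each row back as a JSON document.
for row in session.execute("SELECT JSON * FROM users"):
    print(row[0])
```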

Handling Access Logs with Hadoop

Electronic Health Record (EHR) systems are required to keep an audit trail of everyone who accesses any patient information. The result is similar to the click-stream of a website. This audit data is required for regulatory compliance, but it can also be useful for understanding behavior patterns and how processes actually get done. Mercy has more than 20TB […]
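
The abstract doesn’t say which tools are used for the analysis, but to illustrate the click-stream comparison, here is a hypothetical PySpark sketch that ranks users by how often they access records; the log’s field layout is invented.

```python
# Hypothetical sketch: aggregating an EHR access-audit log with PySpark.
# The tab-separated field layout (timestamp, user_id, patient_id, action)
# is invented for illustration.
from pyspark import SparkContext

sc = SparkContext(appName="audit-log-demo")

events = (sc.textFile("hdfs:///audit/access_log")
            .map(lambda line: line.split("\t")))

# Accesses per user, descending -- the same shape as click-stream analysis.
per_user = (events.map(lambda f: (f[1], 1))
                  .reduceByKey(lambda a, b: a + b)
                  .sortBy(lambda kv: kv[1], ascending=False))

for user, count in per_user.take(10):
    print(user, count)
```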

Advanced Analytics in Healthcare

Express Scripts currently manages pharmacy benefits for one in every three Americans, totaling approximately 100 million lives in its entire book of business. Processing claims and managing the benefit for such a large patient base has produced massive, detailed healthcare datasets that cannot be found in aggregate within any other organization or firm in the US. Express Scripts […]

Graph Database Use Cases

Graphs are eating the world – but in what form? Starting off with a primer on Graph Databases, this talk will focus on practical examples of graph applications. We’ll look at multiple use cases like job boards, dating sites, recommendation engines of all kinds, network management, scheduling engines, etc. We’ll also see some examples of […]
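
As a taste of the recommendation-engine use case, here is a hypothetical sketch of a co-purchase recommendation in Cypher, run through the Neo4j Python driver; the schema, data, and credentials are all invented.

```python
# Hypothetical sketch: a co-purchase recommendation query in Cypher,
# run through the Neo4j Python driver. The (:User)-[:BOUGHT]->(:Item)
# schema is invented for illustration.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))

query = """
MATCH (me:User {id: $user_id})-[:BOUGHT]->(item)<-[:BOUGHT]-(peer)-[:BOUGHT]->(rec)
WHERE NOT (me)-[:BOUGHT]->(rec)
RETURN rec.name AS recommendation, count(*) AS score
ORDER BY score DESC LIMIT 5
"""

with driver.session() as session:
    for record in session.run(query, user_id=42):
        print(record["recommendation"], record["score"])
```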

Resilience: The key requirement of a [big][data] architecture

There is an adage: “If you fail to plan, you plan to fail.” When developing systems, the adage can be taken a step further: “If you fail to plan FOR FAILURE, you plan to fail.” At The Huffington Post, data moves between a number of systems to provide statistics for our technical, business, and editorial teams. Due […]

Managing Genetic Ancestry at Scale with Neo4j and Kafka

The global Monsanto R&D pipeline produces millions of new plant populations every year, each of which contributes to a dataset of genetic ancestry spanning several decades. Historically, the constraints of modeling and processing this data within an RDBMS have made drawing inferences from this dataset complex and computationally infeasible at large scale. Fortunately, the genetic history […]
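
To show why a graph fits ancestry data, here is a hypothetical sketch of a pedigree walk as a Cypher traversal, reusing the `driver` from the previous sketch; the :Population model and identifiers are invented.

```python
# Hypothetical sketch: walking a plant pedigree as a graph traversal.
# Assumes a (:Population)-[:CHILD_OF]->(:Population) model and a `driver`
# built as in the previous sketch; both are invented for illustration.
query = """
MATCH path = (p:Population {id: $pop_id})-[:CHILD_OF*1..10]->(ancestor)
RETURN ancestor.id AS ancestor, length(path) AS generations_back
ORDER BY generations_back
"""

with driver.session() as session:
    for record in session.run(query, pop_id="P-2015-0001"):
        print(record["ancestor"], record["generations_back"])
```

The equivalent RDBMS query needs one recursive self-join per generation, which is exactly what makes it infeasible at large scale.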

Building A Player Focused Data Pipeline

Riot Games’ mission is to become the most player-focused company in the world. With over 67 million players battling on the Fields of Justice every month, League of Legends generates more than 45 terabytes of data on a daily basis. From game events to store transactions, data comes in from thousands of sources […]

From SQL to NoSQL

As a frequent recipient of the J.D. Power award for excellence in customer service, T-Mobile takes great pride in the quality of care it provides its customers. As smartphone technologies advance (and fragment), the challenge of providing quality technical support can be daunting. To address this challenge, T-Mobile is reinventing many of its traditional […]

Batch and Real-time EHR updates into Hadoop

Mercy has built a system using batch and streaming technology to allow batch and near real-time updates to flow from its Epic EHR (Electronic Health Records) system into its Hadoop cluster. Mercy is using this system to provide reporting and analytics capabilities to its researchers, business owners, and physicians. The system uses Sqoop, Flume, Pig, […]

Choosing an HDFS data storage format: Avro vs. Parquet and more

Picking your distribution and platform is just the first of many decisions you need to make in order to create a successful data ecosystem. In addition to things like replication factor and node configuration, the choice of file format can have a profound impact on cluster performance. Each of the data formats has different strengths […]
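
As a hedged illustration of what the choice looks like in practice, here is a PySpark sketch that writes the same DataFrame in both formats; the paths and schema are invented, and Avro support comes from the external spark-avro package (Parquet is built in).

```python
# A hedged sketch, not the session's code: write one DataFrame both ways.
# Paths and schema are invented; the Avro format name below is the
# 2015-era spark-avro package (newer Spark ships "avro" built in).
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="format-compare")
sqlContext = SQLContext(sc)

df = sqlContext.read.json("hdfs:///raw/events.json")

# Row-oriented Avro: reads whole records; good for ingest/landing zones.
df.write.format("com.databricks.spark.avro").save("hdfs:///out/events_avro")

# Column-oriented Parquet: scans only needed columns; good for analytics.
df.write.parquet("hdfs:///out/events_parquet")
```

A commonly cited rule of thumb in the ecosystem: row-oriented Avro for ingest and landing zones, column-oriented Parquet for scan-heavy analytic tables.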

Businesses need Big Data for 360 Marketing

Companies have very detailed data available, from the planning process through manufacturing and packaging. However, once products are shipped to distributors or retail stores, companies have little data regarding the product, place, position, placement, packaging, promotion, and price of their products. Companies across all market segments have very little insight from their distributors […]

Lifting the hood on Spark Streaming

If a byte of data were a gallon of water, today it would take only 10 seconds to generate enough data to fill an average home; by 2020 it will take only 2 seconds. The Internet of Things is driving a tremendous amount of this growth, providing more data at a higher rate than we’ve ever […]
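
For a concrete picture of the micro-batch model under the hood, here is a minimal Spark Streaming sketch in Python; the source host, port, event format, and batch interval are illustrative only.

```python
# A minimal sketch of Spark Streaming's micro-batch model; the source
# host/port, event format, and 10-second batch interval are illustrative.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="streaming-demo")
ssc = StreamingContext(sc, batchDuration=10)  # one micro-batch every 10s

# Each line is a comma-separated event whose first field is a device id.
lines = ssc.socketTextStream("sensor-gateway.example.com", 9999)
counts = (lines.map(lambda line: (line.split(",")[0], 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()  # print per-device counts for each batch

ssc.start()
ssc.awaitTermination()
```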

Analytics, Big Data and Nonvolatile Memory Architectures – Why you Should Care

This session will begin with an overview of current non-volatile memory (NVM, aka persistent memory) architectures and where they fit among the several levels of the memory and storage hierarchy, both near- and far-processor. A discussion of NVM’s significant impact on analytic computing workloads, now and in the near future, will follow, including use cases and the concept […]

How Big Data Will Save Planet Earth

Famine, Poverty, Disease, Climate Change. The Apocalypse is here. Who will save us from sure destruction? Big Data, that’s who. In this presentation we will discuss how Big Data is being brought to bear on these issues before it’s too late, and how Big Data is fundamentally changing the way we fight them.

Using Multiple Persistence Layers in Spark to Build a Scalable Prediction Engine

This talk will examine the benefits of using multiple persistence strategies to build an end-to-end prediction engine. Spark Streaming backed by a Cassandra persistence layer allows the rapid lookups and inserts needed for real-time model scoring. Spark backed by Parquet files stored in HDFS allows for high-throughput model training and […]
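
Here is a hedged sketch of that two-layer pattern using the spark-cassandra-connector’s DataFrame source; the keyspace, table, paths, and the stand-in “model output” are all invented.

```python
# A hedged sketch of the two-layer pattern; keyspace, table, paths, and
# the stand-in "model output" are invented. Assumes the job is launched
# with the spark-cassandra-connector package and
# spark.cassandra.connection.host configured.
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="prediction-engine")
sqlContext = SQLContext(sc)

# Batch layer: scan historical features from Parquet on HDFS to train.
training = sqlContext.read.parquet("hdfs:///features/history")

# Serving layer: write per-key state to Cassandra, where the streaming
# scorer can look it up by primary key with millisecond latency.
(training.groupBy("entity_id").count()   # stand-in for real model output
         .write.format("org.apache.spark.sql.cassandra")
         .options(keyspace="models", table="entity_state")
         .mode("append")
         .save())
```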

Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption

YARN enables Hadoop to move beyond pure batch processing, which means multiple workloads and tenants must now be able to share a single data-processing infrastructure. The Capacity Scheduler enables resource sharing among multiple tenants in a fair manner, with elastic queues to maximize utilization. This talk will focus on the […]
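
For readers new to the Capacity Scheduler, here is a hypothetical capacity-scheduler.xml fragment showing guaranteed shares plus elastic headroom; the queue names and percentages are invented. Preemption, which reclaims borrowed capacity when a guaranteed queue needs it back, is enabled separately via yarn.resourcemanager.scheduler.monitor.enable in yarn-site.xml.

```xml
<!-- Hypothetical capacity-scheduler.xml fragment: two tenant queues with
     guaranteed shares, plus elastic headroom up to a hard cap. -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>analytics,etl</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.analytics.capacity</name>
  <value>60</value>  <!-- guaranteed share, in percent -->
</property>
<property>
  <name>yarn.scheduler.capacity.root.etl.capacity</name>
  <value>40</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.analytics.maximum-capacity</name>
  <value>90</value>  <!-- may borrow idle capacity up to 90% -->
</property>
```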

Identity Fraud Protection Using Big Data Analytics

As technology evolves, consumers are able to do more and more things in a remote setting—banking, shopping, communication, you name it. The more enabled we are, the more fraud is possible.  As individuals use their identities to apply for goods and services – credit, loans, wireless phones, mortgages, etc. – certain patterns emerge.  ID Analytics, […]

Interactive Visualization in Human Time

Visualizing large amounts of data interactively can stress the limits of computer resources and human patience. Shaping the data, and the way it is viewed, can make it possible to explore large data sets interactively. Here we will look at how to generate a large amount of data and organize it so that it can be explored […]
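
One common way to do that shaping, sketched hypothetically in Python: pre-bin the points into a fixed grid so the viewer redraws a small aggregate instead of millions of raw points.

```python
# A hypothetical sketch of pre-binning for interactive viewing: however
# many raw points there are, the viewer only ever redraws a 512x512 grid.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=2_000_000)
y = rng.normal(size=2_000_000)

# Aggregate 2M points into a fixed-size density grid once, up front.
grid, xedges, yedges = np.histogram2d(x, y, bins=512)

# An interactive front end re-renders only `grid` (e.g., as a heat map)
# as the user pans and zooms, recomputing bins for the visible window.
```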

Stream Processing and real-time eventing using Node.js and React.js in the Hadoop ecosystem

Node.js’s event-driven architecture and non-blocking I/O API provide an environment for real-time analytics in a responsive web application. This session will discuss the emergence of modern web frameworks such as Node.js, the advances in the Hadoop ecosystem (Spark and Storm), and frameworks like Kafka and Samza that can bring […]
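
As a hypothetical sketch of the ingest half that would feed such a dashboard (the talk itself pairs this with Node.js and React on the push/rendering side), here is a kafka-python consumer; the topic, broker address, and JSON payload are all invented.

```python
# A hypothetical sketch of the ingest half of such a dashboard, using
# kafka-python; the topic, broker address, and JSON payload are invented.
import json
from kafka import KafkaConsumer

def push_to_clients(event):
    print(event)  # stand-in for a websocket broadcast to browsers

consumer = KafkaConsumer(
    "site-events",
    bootstrap_servers="broker.example.com:9092",
    value_deserializer=lambda v: json.loads(v),
)
for message in consumer:
    push_to_clients(message.value)  # deliver each event as it arrives
```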

Action from Insight: Joining the 2 Percent Who are Getting Big Data Right

Today’s world is awash in data, and organizations are rapidly discovering that putting this data to work is the single most important factor in their ability to remain relevant to hyper-connected consumers. In this session, HP will explore the new trends of this appified, thingified, context-rich world and how HP’s Haven platform can give you […]

Deep Dive into Apache Cassandra & Apache Spark

This is a technical workshop taking place on July 14; it is purchased separately from the conference. It will be led by Jon Haddad, one of the industry’s most sought-after technical gurus. He is the maintainer of cqlengine, the Python object mapper for Cassandra, and he also works with DataStax as their Technical Evangelist. Get hands-on experience […]

How Cisco Migrated from MapReduce Jobs to Spark Jobs

The starting point for this project was a MapReduce application that processed log files produced by the support portal. The application ran on Hadoop with Ruby Wukong; at the start of the project it was underperforming and did not scale well. This made the case for redesigning it using Spark with Scala […]
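
To illustrate the shape of such a migration (sketched in Python here, rather than the Scala used in the project), a log aggregation that needed a separate mapper and reducer under MapReduce collapses into a single Spark chain:

```python
# A hedged sketch of the shape of such a rewrite (Python here rather than
# the Scala used in the project): a log aggregation that needed a separate
# mapper and reducer under MapReduce becomes one in-memory Spark chain.
from pyspark import SparkContext

sc = SparkContext(appName="portal-log-stats")

status_counts = (sc.textFile("hdfs:///logs/support_portal")
                   .map(lambda line: line.split())    # "mapper": tokenize
                   .map(lambda f: (f[-1], 1))         # emit (status, 1)
                   .reduceByKey(lambda a, b: a + b))  # "reducer": sum

status_counts.saveAsTextFile("hdfs:///out/status_counts")
```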

Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax

Learn how to model beyond traditional direct access in Apache Cassandra, utilizing the DataStax platform to harness the power of Spark and Solr to perform search, analytics, and complex operations in place on your Cassandra data.

Big Data Technologies

WWT’s Advanced Technology Center (ATC) is a collaborative ecosystem in which to design, build, demonstrate, and deploy innovative technology products and integrated architectural solutions, and to educate World Wide Technology customers, partners, and employees about them. With it, we’ve helped our customers do amazing things. The ATC allows you to explore Hadoop distributions, analytical tools, and Big Data hardware. In […]

Deriving Value from Big Data

Organizations are looking to get more value from their data, and Big Data analytics can reveal business insights by mining data from existing sources. But how can your organization use Big Data tools? Which vendor is right for you? When it comes to Big Data, where do you start? Many of our customers have used […]
