StampedeCon Big Data Conference
July 25th, 2017 - St. Louis, MO
Experts and thought leaders discuss Big Data architecture, tools and industry use cases at the 6th annual StampedeCon Big Data Conference.
Platform Engineer at GIPHY
Dead Simple AB testing at Massive Scale with Scala and Spark
As a small engineering team, GIPHY needs creative, lightweight solutions to deliver features for a massive global user base. Leveraging the type-safety and expressiveness of Scala along with the performance of Spark allowed us to rapidly deliver a reliable AB testing solution for our search engine.

Demand for richer content in messaging and internet communication continues to grow rapidly. As a provider of such content, GIPHY has seen exceptional growth in traffic while maintaining a relatively small engineering team. Engineering problems must be approached with the following constraints in mind:
- solutions must be designed to scale
- features need to be delivered as quickly as possible
- unnecessary architectural complications and dependencies on other teams need to be minimized
- code should be as generic and extensible as possible
Search is core to GIPHY and we needed to better incorporate data into our feature decision-making process. AB + Multivariate testing was an obvious capability. This talk will cover how we designed such a solution within the above-mentioned constraints and why Scala was a natural choice. Highlights include:
- parsing and modeling your data for use with functional paradigms in mind
- abstracting user experiments and statistical attribution in a generic and immutable manner
- maximizing Spark code reliability with compile-time safety and unit-testing
- adopting Bayesian inference over more classical frequentist approaches to empower more intuitive interpretations of experiments
- how a monorepo approach enables better code reuse and unification of services code with offline analytics
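The Bayesian-inference highlight above can be illustrated with a minimal Beta-Binomial model. The sketch below is a plain-Python Monte Carlo version, not GIPHY's actual implementation (which the talk describes in Scala and Spark); the function name and the experiment counts are hypothetical:

```python
import random

def prob_b_beats_a(clicks_a, views_a, clicks_b, views_b,
                   samples=100_000, seed=42):
    """Monte Carlo estimate of P(rate_B > rate_A) under independent
    uniform Beta(1, 1) priors on each variant's conversion rate."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(samples):
        # Posterior for a Binomial rate under a uniform prior is
        # Beta(successes + 1, failures + 1).
        rate_a = rng.betavariate(clicks_a + 1, views_a - clicks_a + 1)
        rate_b = rng.betavariate(clicks_b + 1, views_b - clicks_b + 1)
        if rate_b > rate_a:
            wins += 1
    return wins / samples

# Hypothetical experiment counts, not real GIPHY data.
p = prob_b_beats_a(clicks_a=120, views_a=1000, clicks_b=150, views_b=1000)
print(p)
```

The result reads directly as "the probability that variant B outperforms variant A," which is the kind of intuitive interpretation the abstract contrasts with frequentist p-values.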
Solutions Engineer Manager at DataStax
Graph in Customer 360
Enterprises typically have many silos of partial customer data, and a common theme in big data projects is to use big data tools and pipelines to unify all siloed customer data into a single, queryable platform for improving all future customer interactions. This data often comes from billing, website traffic, logistics, and marketing, all in different formats with different properties. Graph provides a way to unify all of this data in a single place and to track the flow of a user through the various silos. Graph also enables visualizations and analytics that are difficult in other systems.
In this talk we will explore the ways in which Graph can be leveraged in a Customer 360 use case: what it adds to a more conventional system, and how to approach developing a graph-based Customer 360 system.
HDF/IoT Product Solutions Architect at Hortonworks
Apache Beam: The Case for Unifying Streaming APIs
Our needs for real-time data are growing at an unprecedented rate; it is only a matter of time before you will be faced with building a real-time streaming pipeline. Often, a key decision you must make quickly is which stream-processing framework to use. What if instead you could use a unified API that lets you express complex data processing workflows, including advanced windowing, event-time handling, and aggregate computations? Apache Beam aims to provide this unified model, along with a set of language-specific SDKs for defining and executing complex data processing, data ingestion, and integration workflows. This will truly change how we implement and think about large-scale batch and streaming data processing. Today these pipelines can run on Apache Flink, Apache Spark, and Google Cloud Dataflow. This is only the start: come to this session to learn where the future of streaming APIs is headed and get ready to leverage Apache Beam for your next streaming project.
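One core idea Beam's unified model generalizes across batch and streaming runners is event-time windowing. As a minimal illustration (a plain-Python sketch with hypothetical event data, not actual Beam SDK code), grouping timestamped events into fixed-width windows looks like:

```python
from collections import defaultdict

def fixed_windows(events, window_size):
    """Assign (event_time, count) pairs to fixed event-time windows
    and aggregate per window, keyed by each window's start time."""
    windows = defaultdict(int)
    for ts, value in events:
        window_start = ts - (ts % window_size)  # e.g. 10s-wide windows
        windows[window_start] += value
    return dict(windows)

# Hypothetical (event_time, count) records.
events = [(1, 5), (4, 3), (12, 7), (14, 1)]
print(fixed_windows(events, window_size=10))  # → {0: 8, 10: 8}
```

In Beam the same grouping is expressed once as a windowed aggregation, and the runner (Flink, Spark, or Dataflow) decides how to execute it.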
Independent Big Data Consultant
Big Data Antidotes
49% of large companies are implementing Big Data solutions today, but 65% of Big Data projects fail. How can you avoid getting stuck with the various syndromes? This talk presents a comprehensive set of Big Data antipatterns: common responses to recurring problems that are usually ineffective and risk being highly counterproductive. We introduce an overarching framework represented as a cube with 3 dimensions: Category, Area, and Type (CAT). Each edge of the cube is broken into 3 parts. The Category edge comprises Business, Application, and Technology (BAT). The Area edge is composed of Plan, Implement, and Govern (PIG). Likewise, the Type edge consists of Resource, Architecture, and Management (RAM). Each of the 27 cells in the 3x3x3 cube contains classified Big Data antipatterns drawn from lessons learned in real-life projects and initiatives. We will zoom in on the definition and characterization of the CAT dimensions, then dive deep into selected antipatterns such as Golden Hammer, Dependency Dilemma, Data Swamps, Unimodality, and Product Pollution. Real-world user stories and case studies will be discussed, along with best practices to avoid the pitfalls and traps.
Building Streaming Applications with Apache Kafka
Learn how Apache Kafka's Streams API allows you to develop next-generation applications and microservices built upon the proven reliability, scalability, and low latency of Apache Kafka. In this session, you will learn about the architecture of the Streams API, along with an overview of the use cases where it can be best applied.
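The Streams API itself is a Java/Scala library, but the stateful record-at-a-time processing it expresses can be sketched conceptually. The plain-Python generator below (hypothetical record data, not Kafka client code) maintains running per-word counts and emits an update after each record, roughly the shape of a KStream-to-KTable word-count aggregation:

```python
from collections import Counter

def streaming_word_count(records):
    """Incrementally maintain per-word counts over a stream of text
    records, yielding (word, updated_count) after each occurrence."""
    state = Counter()  # stands in for the Streams API's local state store
    for record in records:
        for word in record.lower().split():
            state[word] += 1
            yield word, state[word]

# Hypothetical stream of message values.
updates = list(streaming_word_count(["hello kafka", "hello streams"]))
print(updates)  # → [('hello', 1), ('kafka', 1), ('hello', 2), ('streams', 1)]
```

In a real Streams application the state store is fault-tolerant and backed by a Kafka changelog topic, which is where the reliability guarantees mentioned above come from.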
Founding Partner at Miner & Kasch
End-to-end Big Data Projects with Python
This talk will go over how to build an end-to-end data processing system in Python, from data ingest, to data analytics, to machine learning, to user presentation. Developments in both established and newer tools make this particularly practical today. In particular, the talk will cover Airflow for process workflows, PySpark for data processing, Python data science libraries for machine learning and advanced analytics, and building agile microservices in Python.
System architects, software engineers, data scientists, and business leaders can all benefit from attending the talk. They should learn how to build more agile data processing systems and take away some ideas on how their data systems could be simpler and more powerful.
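The ingest-analyze-present flow described above can be sketched as a chain of small Python functions. This is a stdlib-only toy (the data and function names are hypothetical); in the stack the talk covers, each stage would be an Airflow task, with the heavy lifting done by PySpark and the final stage served by a microservice:

```python
import csv, io, json, statistics

# Hypothetical raw payload an ingest task might receive.
RAW = "user,latency_ms\nalice,120\nbob,95\nalice,180\n"

def ingest(raw):
    """Parse raw CSV into records (stand-in for an Airflow ingest task)."""
    return list(csv.DictReader(io.StringIO(raw)))

def analyze(records):
    """Aggregate mean latency per user (stand-in for a PySpark job)."""
    by_user = {}
    for r in records:
        by_user.setdefault(r["user"], []).append(float(r["latency_ms"]))
    return {user: statistics.mean(vals) for user, vals in by_user.items()}

def present(summary):
    """Serialize results for a user-facing microservice endpoint."""
    return json.dumps(summary, sort_keys=True)

print(present(analyze(ingest(RAW))))  # → {"alice": 150.0, "bob": 95.0}
```

Keeping every stage in one language is what makes the system agile: the same data structures flow from workflow code to analytics to the serving layer.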
Manager, Information Architect at Daugherty Business Solutions
So You Don’t Have an Admin Team – Doing Big Data using Amazon’s analogs
Big Data doesn’t have to mean just Hadoop anymore. Big Data can be done in the cloud, using tools developed by the cloud providers. This session will cover using Amazon AWS services to implement a Big Data application. We will compare and contrast the different Amazon services with their Hadoop equivalents.
Senior Vice President and Chief Technology Officer of Symbolic IO
The New World of Analytics using Persistent Memory
For decades, we have used memory in its volatile form – DRAM. However, we now have persistent memory – memory which retains its data across power loss or shutdown – and the impact on the control and execution of analytics workflows is significant. This talk will explore persisting data in the memory channel, especially in server architecture, and explore the optimization possibilities of using persistent memory in Spark, graph solvers, and other useful analytics tools. The reality of 10-100X (not percent – X) reduction in execution runtimes and the economics of persistent memory will be discussed.
Location: Eric P Newman Education Center, Washington University Medical School
320 S Euclid
St. Louis, MO 63110
Metro Parking Garage
This is EPNEC’s primary parking garage.
Located at the corner of Taylor and Children’s Place Avenues.
Daily Rate is $15 – Accepts Cash Only
EPNEC is an IACC-certified conference center on the campus of Washington University Medical Center in St. Louis, Missouri.
911 Washington Ave, St. Louis, MO 63101, USA