StampedeCon 2013 Archive

StampedeCon 2013 was an affordable national Big Data conference held in centrally located St. Louis, MO, on July 30-31, 2013.

AGENDA

Expert speakers from Walmart, Sears, Riot Games, Cerner, Cloudant, Sumo Logic, Oracle, IBM, MapR and more shared their Big Data experiences at StampedeCon 2013 on July 30-31, 2013 in St. Louis, MO. It was an exciting conference with great speakers and great networking, and we thank all who participated! You can view our StampedeCon 2013 archive on Eventifier.

We had two days of expert speakers, networking and vendor exhibits:

  • July 30: Developing Your Data and Analytics Strategies
  • July 31: Developing Your Technology Strategy

As well as professional training before and after the conference:

  • July 29: Understanding the NoSQL Landscape (by Inferology)
  • August 1: Introduction to Hadoop + Finding Insight in Big Data (by Cloudera)

July 30: Developing Your Data and Analytics Strategies
Connect business goals and realities with big data architecture decisions.

8:00a Continental Breakfast, Exhibit Hall, Registration, Check-in
8:30a Five Trends in Analytics . . . How to Take Advantage Today
by John Lucker, Partner and Principal at Deloitte Consulting
9:10a From Smart Buildings to Smart Cities: An Industry in the Midst of Big Data
by Paul Doherty, President and CEO of the digit group, inc.
9:50a Optimizing Data for Performance, Price and Capacity for all Storage Tiers – Active or at Rest
by Janis Landry-Lane, Program Director, IBM World-wide Technical Computing
10:00a Break (Networking, Exhibit Hall, Snacks)
10:50a PANEL – Human vs. Machine: Balancing Human-based Analysis with Automated Analytics and Machine Learning

  • Eric Kavanagh, Host of DM Radio for Information Management
  • Bill Shannon, Professor of Biostatistics in Medicine, Washington University, and Founder and President, BioRankings, LLC
  • Bruno Kurtic, Founding VP of Product Management and Strategy at Sumo Logic
  • Kilian Weinberger, Assistant Professor of Machine Learning (Department of Computer Science), Washington University
12:10p Lunch, Exhibit Hall, Networking
12:50p Transforming Data Architecture Complexity at Sears
by Justin Sheppard, IT Director with Sears Holdings and Head of Business Operations for MetaScale
1:30p Cloud-Friendly Hadoop and Hive
by Shrikanth Shankar, Head of Engineering at Qubole
2:10p Using Hadoop to Offload Data Warehouse Processing to Save Capacity and Cut Costs (sponsored by MapR)
by Matt Ammentorp, Regional Sales Director – MapR Technologies
2:20p Break (Networking, Exhibit Hall, Snacks)
3:10p Big Data, Big Law
by Anthony Martin, Chief Privacy and Information Security Counsel – Walmart
3:50p Big Data Startup Lightning Round
Entrepreneurs from new Big Data Startups will have five minutes each to explain their new Big Data technologies and use cases.
4:20p Brief Closing Remarks
4:30p Reception for Attendees (Food, networking and exhibitors)

July 31: Developing Your Technology Strategy
Discover technologies to implement your data and analytics strategies.

8:00a Continental Breakfast, Exhibit Hall, Registration, Check-in
8:30a Big Data @ Riot Games – Using Hadoop to Understand Player Experience
by Barry Livingston, Director of Engineering at Riot Games (substituting for Jerome Boulon)
9:10a Thinking in MapReduce
by Ryan Brush, Distinguished Engineer with Cerner Corporation
9:50a Big Data Analytics (sponsored by Oracle)
by Vivek Yadav
10:00a Break (Networking, Exhibit Hall, Snacks)
10:50a CouchDB at its Core: Global Data Storage and Rich Incremental Indexing at Cloudant
by Adam Kocoloski, Co-Founder & CTO of Cloudant, CouchDB Expert
11:30a A New Data Architecture for the App Economy
by Anant Jhingran, VP of Products at Apigee
12:10p Lunch, Exhibit Hall, Networking
12:50p Analytics Using Apache Hive With The Power of Windowing & Table Functions: Use Cases
by Murtaza Doctor, Principal Architect at RichRelevance
1:30p Legacy Analysis: How Hadoop Streaming Enables Software Reuse – A Genomics Case Study
by Jeff Melching, Big Data Engineer and Architect at Monsanto
2:10p Break (Networking, Exhibit Hall, Snacks)
3:00p Enterprise Workflow Management Using Oozie @ Riot Games
by Matthew Goeke, Big Data engineer at Riot Games
3:40p Real Time Event Processing and In-Memory Analysis of Big Data
by Vinod Vydier, Middleware Specialist at Oracle
4:20p Conference Closing Remarks

July 29: Pre-Conference Training

Understanding the NoSQL Landscape

This is a fast-paced, technical overview of the NoSQL landscape. The objectives for this training include the following:

  • Introduce students to the core concepts of Big Data
  • Provide a general overview of the most common NoSQL stores
  • Explain how to choose the correct NoSQL database for specific use cases
  • Deep Dive into the architecture of Hadoop (HDFS/MapReduce), Cassandra and HBase
  • General overview of the architecture of MongoDB and Neo4J
  • Familiarize students with the emerging architectures in the world of NoSQL: Impala, Drill, Stinger initiative

Audience: This survey course is targeted towards both technical and non-technical professionals who want to understand the emerging world of Big Data.  No prior knowledge of databases or programming is assumed. Engineers, Programmers, Networking specialists, Managers and Executives should plan on attending.

Date/Time: July 29 from 8:30 AM to 5:00 PM (Continental Breakfast/Check-in at 8:00 AM)

August 1: Post-Conference Training

Introduction to Hadoop

The first training workshop is to be held the morning of August 1, 2013 and is an “Introduction to Hadoop” workshop that assumes no prior experience with Hadoop and would be appropriate for anyone attending the conference (e.g., managers, analysts, developers, and system administrators). This tutorial provides a solid foundation for those seeking to understand large-scale data processing with MapReduce and Hadoop, plus its associated ecosystem. This session is intended for those who are new to Hadoop and are seeking to understand where Hadoop is appropriate and how it fits with existing systems. It will cover:

  • The rationale for Hadoop
  • Understanding the Hadoop Distributed File System (HDFS) and MapReduce
  • Common Hadoop use cases
  • Overview of the other components in a typical Hadoop “stack” such as these Apache projects: Hive, Pig, HBase, Sqoop, Flume and Oozie

Date/Time: August 1, 2013 from 8:30 AM – 12:00 PM (Continental Breakfast/Check-in at 8:00 AM)

Finding Insight in Big Data

The second training workshop is “Finding Insight in Big Data,” which builds on the foundation gained from attending the “Introduction to Hadoop” workshop. This would be more in-depth and would demonstrate how to use high-level tools like Hive and Pig (instead of low-level MapReduce code) to find valuable patterns in the types of data that companies commonly produce. This session is appropriate for both analysts and developers since the focus on high-level tools eliminates the need for students to have Java programming experience. This is to be held the afternoon of August 1, 2013.

Date/Time: August 1, 2013 from 1:00 PM – 4:30 PM (box lunches provided noon to 1:00 PM)

 

PRESENTATIONS

We heard an incredible set of presentations during StampedeCon 2013!

July 30: Developing Your Data and Analytics Strategies
Connect business goals and realities with big data architecture decisions.

Five Trends in Analytics . . . How to Take Advantage Today

John Lucker, Partner and Principal at Deloitte Consulting
Lucker will discuss the latest advancements in the world of analytics and offer strategies for tapping into their potential. The topic areas include visualization and design, mobile analytics and strategy analytics.

From Smart Buildings to Smart Cities: An Industry in the Midst of Big Data

Paul Doherty, President and CEO of the digit group, inc.
As our world coalesces into large urban environments, built-environment and IT professionals are challenged with integrating a building’s Digital DNA into the urban fabric of Smart City initiatives. This creates opportunities for Cloud-based and mobile analysis and management that can lead to better design, performance, service and sustainability. The knowledge behind the urban intelligence of Big Data resides, largely untapped, with today’s built-environment and IT professionals. Join us for a discussion that will define Smart Cities, identify Smart Buildings, and provide you with best practices, lessons learned and a framework strategy for your organization to profit from the Smart Cities movement.

Big Data, Big Law

Anthony Martin, Chief Privacy and Information Security Counsel – Walmart
This is the story of one global, multi-channel company’s walk through the increasingly complicated Legal, Compliance and Security maze while trying to recognize the implicit value of Big Data programs.

PANEL – Human vs. Machine: Balancing Human-based Analysis with Automated Analytics and Machine Learning

Moderator: Eric Kavanagh, Host of DM Radio for Information Management
Panelists:

  • Bill Shannon, Professor of Biostatistics in Medicine, Washington University, and Founder and President, BioRankings, LLC
  • Bruno Kurtic, Founding VP of Product Management and Strategy at Sumo Logic
  • Kilian Weinberger, Assistant Professor of Machine Learning (Department of Computer Science), Washington University

You’ve seen the projections: a severe shortage of data scientists is at hand, threatening our ability to leverage Big Data. At the same time, however, research continually pushes the state of the art in machine learning and pattern/similarity search algorithms. More and more technology companies are claiming to leverage advanced algorithms to eliminate the need for dedicated data scientists. Coming full circle, we hear arguments that you not only need data scientists but also need to balance the skills of an entire Big Data team composed of technologists, statisticians and domain experts. Our expert panelists will discuss their viewpoints on balancing human-based analysis with automated analytics and machine learning.

Using Hadoop to Offload Data Warehouse Processing to Save Capacity and Cut Costs

Matt Ammentorp, Regional Sales Director – MapR Technologies
This sponsored session examines specific use cases to illustrate the design considerations and the economics behind data warehouse offloading with Hadoop. The 50X cost advantage of Hadoop and the capability to store and analyze unstructured data make Hadoop a compelling platform for vast information storage and direct ETL processing. Additional information about how to use the Hadoop platform to support extended analytics will also be covered. You will also learn:

  • The simple steps to follow to cut infrastructure costs with Hadoop
  • How to integrate with an existing data warehouse to leverage existing tools and applications
  • The key criteria to ensure you maximize returns

Transforming Data Architecture Complexity at Sears

Justin Sheppard, IT Director with Sears Holdings and Head of Business Operations for MetaScale
High ETL complexity and costs, data latency and redundancy, and batch window limits are just some of the IT challenges caused by traditional data warehouses. Gain an understanding of big data tools through the use cases and technology that enable Sears to solve the problems of the traditional enterprise data warehouse approach. Learn how Sears uses Hadoop as a data hub to minimize data architecture complexity – resulting in a reduction of time to insight by 30-70% – and discover “quick wins” such as mainframe MIPS reduction.

Cloud-Friendly Hadoop and Hive

Shrikanth Shankar, Head of Engineering – Qubole
The cloud reduces the barrier to entry into analytics for many small and medium-size enterprises. Hadoop and related frameworks like Hive, Oozie and Sqoop are becoming tools of choice for deriving insights from data. However, these frameworks were designed for in-house datacenters, which have different tradeoffs from a cloud environment, and making them run well in the cloud presents some challenges. In this talk, Shrikanth Shankar, Head of Engineering at Qubole, describes how Qubole has extended Hadoop and Hive to exploit these new tradeoffs. Use cases will show how technologies born of large-scale challenges at Facebook now make it extremely easy for significantly smaller end users to leverage them in the cloud.

Optimizing Data for Performance, Price and Capacity for all Storage Tiers – Active or at Rest

Janis Landry-Lane, IBM World-wide Technical Computing
Worldwide, we now generate the equivalent of all the data that existed in the world up to 2003 every two days. In order to leverage the vast quantities of data being produced daily, and maximize the ability to effectively use the data, it must be optimized for performance, price, and capacity for all storage tiers – active or at rest.

To set the stage to examine this topic, Janis Landry-Lane will speak about IBM’s best practices for data management and archive for NGS and Life Sciences. While the example is NGS, the lessons learned are transferable to all research computing disciplines. Data retention and reuse will be addressed. There are a variety of approaches to this topic, and IBM has an approach that gives ownership to the users of the data without compromising its integrity or its ability to be easily accessed.

July 31: Developing Your Technology Strategy
Discover technologies to implement your data and analytics strategies.

Big Data @ Riot Games – Using Hadoop to Understand Player Experience

Barry Livingston, Director of Engineering at Riot Games (substituting for Jerome Boulon)
Riot Games aims to be the most player-focused game company in the world. To fulfill that mission, it’s vital we develop a deep, detailed understanding of players’ experiences. This is particularly challenging since our debut title, League of Legends, is one of the most played video games in the world, with more than 32 million active monthly players across the globe. In this presentation, we’ll discuss several use cases where we sought to understand and improve the player experience, the challenges we faced to solve those use cases, and the big data infrastructure that supports our capability to provide continued insight.

Thinking in MapReduce

Ryan Brush, Distinguished Engineer with Cerner Corporation
MapReduce reflects the essence of scalable processing: split a big problem into lots of parts, process them in parallel, and then merge the results. Yet this model is at odds with how we’ve thought about computing for most of history, where we center our applications on long-lived stores of mutable data and incrementally apply change. This difference means a new mindset is needed to best leverage Hadoop and its ecosystem. This talk lays out the basics of MapReduce and of designing logic and data models to make the best use of the Hadoop platform. It also goes through a number of design patterns and how Cerner is applying them to health care.
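
To make the “split, process in parallel, merge” framing concrete, here is a minimal word-count sketch in plain Python. It illustrates the model only: it is not code from the talk or the Hadoop API, and the function names and use of multiprocessing are our own assumptions.

```python
# Minimal word-count sketch of the MapReduce model: map each input split
# to (key, value) pairs, shuffle/group by key, then reduce each group.
# Purely illustrative; real Hadoop jobs express the same three phases
# through the MapReduce API rather than Python multiprocessing.
from collections import defaultdict
from multiprocessing import Pool

def map_phase(line):
    # Emit a (word, 1) pair for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(mapped):
    # Group all values emitted for the same key.
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(item):
    # Merge the values for one key into a single result.
    key, values = item
    return key, sum(values)

if __name__ == "__main__":
    lines = ["big data at scale", "thinking in mapreduce", "big data"]
    with Pool() as pool:
        mapped = pool.map(map_phase, lines)                              # parallel map
        counts = dict(pool.map(reduce_phase, shuffle(mapped).items()))   # parallel reduce
    print(counts)  # e.g. {'big': 2, 'data': 2, 'at': 1, ...}
```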

Real Time Event Processing and In-Memory Analysis of Big Data

Vinod Vydier, Middleware Specialist at Oracle
There are multiple projects (for example, Cloudera’s Impala) that do real-time or near-real-time analysis of Big Data. However, if there are events that need to be looked at and responded to in real time (for example, credit card fraud or vehicle metrics that alert a driver), this can have a significant impact on data collection and analysis using traditional Big Data techniques. In this session, I will introduce strategies for using an Event Processing Engine to respond to events in real time, and then filter and categorize data in memory before storing it in HDFS. This makes it easier to run Hadoop jobs on the data collected, and also lets end clients respond to critical events in real time.
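
As a rough, hedged illustration of that strategy, the plain-Python sketch below alerts on critical events immediately and categorizes the rest in memory before handing them off in batches for bulk storage such as HDFS. It is not the Oracle event-processing engine discussed in the talk; the field names, threshold, and batch size are hypothetical.

```python
# Illustrative filter-and-categorize-before-storage pattern (hypothetical
# schema): alert on critical events immediately, buffer everything else
# by category, and flush buffers in batches for bulk storage (e.g. HDFS).
from collections import defaultdict

BATCH_SIZE = 1000  # assumed flush threshold

def is_critical(event):
    # Assumption: events are dicts carrying a numeric "risk_score".
    return event.get("risk_score", 0.0) >= 0.9

def categorize(event):
    # Assumption: a "type" field distinguishes event categories.
    return "fraud" if event.get("type") == "card_txn" else "telemetry"

def process_stream(events, alert, store_batch):
    """alert(event) is the real-time path; store_batch(category, events)
    is the bulk path that would write a file to HDFS."""
    buffers = defaultdict(list)
    for event in events:
        if is_critical(event):
            alert(event)                      # respond in real time
        category = categorize(event)
        buffers[category].append(event)
        if len(buffers[category]) >= BATCH_SIZE:
            store_batch(category, buffers[category])
            buffers[category] = []
    for category, remaining in buffers.items():
        if remaining:
            store_batch(category, remaining)  # flush what's left
```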

CouchDB at its Core: Global Data Storage and Rich Incremental Indexing at Cloudant

Adam Kocoloski, Co-Founder & CTO of Cloudant, CouchDB Expert
Cloudant operates database clusters comprising 100+ nodes based on BigCouch, the company’s fork of CouchDB. Key elements of CouchDB’s design have proven instrumental to success at this scale, including version histories, append-only storage, and multi-master replication. In this talk, Cloudant Co-Founder and Apache CouchDB Committer Adam Kocoloski will discuss lessons learned from running production CouchDB clusters bigger than many well-publicized Hadoop deployments, and how Cloudant’s experience at scale is informing development work on the next release of Apache CouchDB.

A New Data Architecture for the App Economy

Anant Jhingran, VP of Products at Apigee
It has been clear for quite some time that traditional warehouses do not cut it for unstructured and semi-structured data, and therefore new systems such as NoSQL and Hadoop have emerged. But these systems throw the baby out with the bathwater. Traditional warehouses were built on the premise that applications can be simpler because the databases did a lot. Of course, the penalty for this was that the application’s world view had to fit the relational, database world view. In the new Big Data systems, the primitives have been lowered so much (simple key-value pairs, or completely unstructured tuple structures) that the applications now have to do a lot more. We argue that there is a happy medium. We have studied the kinds of data that sit in the app economy, and the data structures that need to be built on top of NoSQL and Hadoop to considerably speed up insights in the app economy without requiring every problem to be coded from scratch.

Legacy Analysis: How Hadoop Streaming Enables Software Reuse – A Genomics Case Study

Jeff Melching, Big Data Engineer and Architect at Monsanto
The bioinformatics domain, and in particular computational genomics, has always faced the problem of computing analytics against very large data sets. Traditionally, these analytics have leveraged grid and compute farm technologies. Additionally, the analytics software and algorithms have been built up over the past 30 years by contributions from both the public and private domains, written in a number of programming languages. When these software packages are brought in house and combined with the skills and preferences of internal bioinformatics researchers, what you get is a myriad of different technologies linked together in an analytics pipeline. The rise of technologies like MapReduce in Hadoop has made the execution of such pipelines much more efficient, but what about all those analytic pipelines I have built up over the years that aren’t written in MapReduce? Do I have to rewrite them? Do I have to know Java? This talk will explain how Hadoop Streaming can help you reuse instead of rewrite. It will also touch on techniques for packaging and deploying Hadoop applications without having to centrally manage software versions on the cluster.
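
A hedged sketch of how that reuse works with Hadoop Streaming follows (the legacy tool name, its flags, and the paths are hypothetical, not Monsanto’s pipeline): any executable that reads stdin and writes stdout can serve as the mapper or reducer, so an existing command-line analysis can run inside a MapReduce job without being rewritten in Java.

```python
#!/usr/bin/env python
# mapper.py -- illustrative Hadoop Streaming mapper that reuses a legacy
# command-line genomics tool instead of reimplementing it in MapReduce.
# The tool name ("legacy_aligner") and its flag are hypothetical.
import subprocess
import sys

def main():
    for line in sys.stdin:                          # one input record per line
        record_id, sequence = line.rstrip("\n").split("\t", 1)
        # Reuse the existing executable unchanged; it reads a sequence on
        # stdin and prints one result line on stdout.
        result = subprocess.run(
            ["./legacy_aligner", "--quiet"],
            input=sequence, capture_output=True, text=True, check=True,
        )
        # Emit key<TAB>value pairs for the shuffle/reduce phases.
        print(f"{record_id}\t{result.stdout.strip()}")

if __name__ == "__main__":
    main()

# Launched (roughly) with the stock streaming jar, shipping the script and
# the legacy binary to the cluster alongside the job:
#   hadoop jar hadoop-streaming.jar \
#     -files mapper.py,legacy_aligner \
#     -input /data/sequences -output /data/alignments \
#     -mapper mapper.py -reducer /bin/cat
```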

Enterprise Workflow Management Using Oozie @ Riot Games

Matthew Goeke, Big Data engineer at Riot Games
The massive push for big data across multiple industries can leave companies new to the Hadoop ecosystem looking for ways to integrate their existing enterprise technology into this evolving space. Automating workflows, especially traditional ETL pipelines, can be a daunting task when you consider that many enterprise workflow frameworks don’t natively integrate with Hadoop. That’s where the Oozie engine comes in. In this presentation we’ll address the utility, management features and lessons we learned from integrating Oozie into Riot Games’ Big Data pipeline. We’ll also cover our implementation of Oozie in both the physical datacenter and in the Amazon cloud.

Analytics using Apache Hive with the power of windowing & table functions: Use Cases

Murtaza Doctor, Principal Architect at RichRelevance
RichRelevance serves 10 of the top 20 largest retailers in the world and has delivered more than $5.5 billion in attributable sales to date. Every 21 milliseconds a shopper clicks on a recommendation delivered by the company, and the company serves over 850 million personalized shopping experiences daily. Its Hadoop infrastructure has the capacity to handle upwards of 1.5+ PB of data and to ingest/stream GBs of online clickstream data daily from retail websites onto its backend Hadoop cluster.

Not surprisingly, clickstream analytics is a critical aspect of RichRelevance’s product offerings, for which it uses Apache Hive. However, while Hive Query Language (HQL) is excellent for productivity and enables reuse of SQL skills, RichRelevance’s Data Scientists and Analysts saw a huge gap in its lack of support for windowing and table functions, which are necessary to deliver clickstream analytics on its session-driven and event-based data. To fill the gap, RichRelevance was forced to employ additional tools like Pig, MapReduce, R and, in some cases, even relational databases like Postgres.

This created a ton of extra baggage, until RichRelevance was introduced to the SQLWindowing for Hive (SQW) framework, which allows the company to do frictionless analytics with just Hive, adding agility and simplicity to the entire process.

In this presentation, RichRelevance will present five advanced clickstream analytical use cases solved with the help of the SQW framework, including analytics around co-occurrence of purchases and views (popularly known as Market Basket Analysis), pathing queries, lag/lead analysis, landing and exit pages within a session, and purchase normalization. The presentation will also highlight the before-and-after impact of having this framework in practice and how it has made RichRelevance more agile.

The goal of this session is to showcase the power of SQW and how the community can benefit from its frictionless and powerful analytical capabilities.
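
For readers new to windowing functions, here is a small, framework-agnostic sketch in Python of the lag/lead idea over session-ordered clickstream events; the field names are illustrative, not RichRelevance’s schema, and SQW/Hive expresses the same per-session computation declaratively in HQL rather than in procedural code.

```python
# Illustrative lag/lead over a clickstream: for each event within a session
# (ordered by timestamp), attach the previous and next page viewed.
# This is the kind of per-session windowing SQW/Hive provides declaratively.
from itertools import groupby
from operator import itemgetter

def lag_lead(events):
    # events: list of dicts with "session_id", "ts", "page" (assumed fields)
    events = sorted(events, key=itemgetter("session_id", "ts"))
    out = []
    for _, group in groupby(events, key=itemgetter("session_id")):
        session = list(group)
        for i, e in enumerate(session):
            out.append({
                **e,
                "prev_page": session[i - 1]["page"] if i > 0 else None,                 # LAG
                "next_page": session[i + 1]["page"] if i + 1 < len(session) else None,  # LEAD
            })
    return out

clicks = [
    {"session_id": "s1", "ts": 1, "page": "home"},
    {"session_id": "s1", "ts": 2, "page": "product"},
    {"session_id": "s1", "ts": 3, "page": "checkout"},
]
for row in lag_lead(clicks):
    print(row["page"], row["prev_page"], row["next_page"])
# home None product / product home checkout / checkout product None
```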

SPEAKERS

We’ve got an incredible speaker lineup for StampedeCon 2013!

  • David Strom, President – David Strom, Inc.
  • John Lucker, Partner and Principal at Deloitte Consulting
  • Paul Doherty, President and CEO of the digit group, inc.
  • Anthony Martin, Chief Privacy and Information Security Counsel – Walmart
  • Eric Kavanagh, Host of DM Radio for Information Management
  • Bill Shannon, Professor of Biostatistics in Medicine, Washington University, and Founder and President, BioRankings, LLC
  • Bruno Kurtic, Founding VP of Product Management and Strategy at Sumo Logic
  • Kilian Weinberger, Assistant Professor of Machine Learning (Department of Computer Science), Washington University
  • Barry Livingston, Director of Engineering at Riot Games (substituting for Jerome Boulon)
  • Ryan Brush, Distinguished Engineer with Cerner Corporation
  • Vinod Vydier, Middleware Specialist at Oracle
  • Adam Kocoloski, Co­Founder & CTO of Cloudant, CouchDB Expert
  • Anant Jhingran, VP of Products at Apigee
  • Jeff Melching, Big Data Engineer and Architect at Monsanto
  • Matthew Goeke, Big Data engineer at Riot Games
  • Shrikanth Shankar, Head of Engineering at Qubole
  • Murtaza Doctor, Principal Architect at RichRelevance
  • Justin Sheppard, IT Director with Sears Holdings and Head of Business Operations for MetaScale

David Strom, President – David Strom, Inc.

David Strom is one of the leading experts on network and Internet technologies and has written and spoken extensively on topics such as VOIP, convergence, email, cloud computing, network management, Internet applications, wireless and Web services for more than 25 years. He is also the creator of an innovative series of video screencast product reviews of enterprise IT products that can be found on Webinformant.tv and syndicated to various other Web sites.

He has had several editorial management positions for both print and online properties in the enthusiast, gaming, IT, network, channel, and electronics industries, including the editor-in-chief of Network Computing print, Digital Landing.com, and Tom’s Hardware.com. He currently writes for GigaOM Pro, ITWorld, Network World, Techtarget, Dice and Slashdot, among others.


John Lucker, Partner and Principal at Deloitte Consulting
John Lucker is a principal and the Global Advanced Analytics and Modeling Market Offering Leader at Deloitte Consulting LLP. He is also a U.S. leader in Deloitte Touche Tohmatsu Limited’s Deloitte Analytics Institute. John has written and presented internationally on the topics of Big Data, Analytics, and Data Science.


Paul Doherty, President and CEO of the digit group, inc.
Paul is the President and CEO of the digit group, inc., a global leader in cloud-based solutions for the built environment. He is one of the global real estate industry’s most sought-after thought leaders, strategists and integrators of process, technology and business. His experience as an author, educator, analyst and advisor to Fortune 500 organizations, global government agencies, prominent institutions and the most prestigious architectural, engineering and contracting firms in the world provides the digit group with world-class leadership. He is a prominent and highly rated speaker at numerous industry events around the world each year and has been appointed as a guest lecturer at leading universities throughout the world. A former member of the Board of Directors of the International Facility Management Association (IFMA), Paul is a co-founder of IFMA Shanghai, the first Western professional industry association in the People’s Republic of China, and a co-founder of IFMA’s Building Information Modeling Lifecycle Operations Community of Practice. He is currently writing his third book, called “From Smart Buildings to Smart Cities”.


Anthony Martin, Chief Privacy and Information Security Counsel – Walmart
Anthony Martin is the Chief Privacy and Information Security Counsel at Walmart, and a former start-up founder, law professor, and tall-building lawyer.


Eric Kavanagh, Host of DM Radio for Information Management
Eric Kavanagh is a career media professional, with more than two decades of experience in print, broadcast, and New Media. He has interviewed such influential world figures as Michael Jordan and the late Captain Jacques Cousteau. His editorial series in 2005 helped to inspire the Federal Funding Accountability and Transparency Act of 2006, in particular his evangelist role in promoting Citizen Auditors. Currently, he moderates Webcasts for several prominent organizations, including The Bloor Group, SourceMedia and the Global Association of Risk Professionals.


Bill Shannon,  Professor of Biostatistics in Medicine, Washington University, and Founder and President, BioRankings, LLC
Dr. Shannon is a professional who is passionate about using data to solve real world problems in science and business. By working closely with clients he learns what problems they are working on, and then develops an analytical strategy to extract information from their data to help solve them. He is particularly skilled at developing innovative statistical methods to solve difficult and unique problems.
Bill Shannon, PhD, MBA, is a tenured Professor of Biostatistics in Medicine at Washington University School of Medicine, and Founder and President of BioRankings LLC, a biostatistical firm providing innovative statistical solutions to big data challenges. Dr. Shannon has over 25 years of applied biostatistical expertise in biomedical, pharmaceutical, and biotech research, with more than 120 academic publications of his lab’s work in statistics and applied biomedical areas such as oncology, genetics, cardiology, pulmonology, infectious disease, sleep medicine, and health care delivery.


Bruno Kurtic, Founding VP of Product Management and Strategy at Sumo Logic
Bruno joined Sumo Logic from SenSage, where he was the Vice President of Product Management. Before joining SenSage, Bruno was with the Boston Consulting Group (BCG), where he developed and implemented growth strategies for large high-tech clients. Prior to BCG, he spent six years at webMethods, where he was a Product Group Director for two product lines. At webMethods he started the west coast engineering team and played a key role in the acquisition of Active Software. He was also with Andersen Consulting’s Center for Strategic Technology in Palo Alto and founded a software company that developed handwriting and voice recognition software. Bruno holds an undergraduate degree in Quantitative Methods and Computer Science from the University of Saint Thomas and an MBA from the Massachusetts Institute of Technology (MIT).


Kilian Weinberger, Assistant Professor of Machine Learning (Department of Computer Science), Washington University
Kilian Q. Weinberger is an Assistant Professor in the Department of Computer Science & Engineering at Washington University in St. Louis. He received his Ph.D. from the University of Pennsylvania in Machine Learning under the supervision of Lawrence Saul. Prior to this, he obtained his undergraduate degree in Mathematics and Computer Science at the University of Oxford. During his career he has won several best paper awards at ICML, CVPR and AISTATS. In 2011 he was awarded the AAAI senior program chair award and in 2012 he received the NSF CAREER award. Kilian Weinberger’s research is in Machine Learning and its applications. In particular, he focuses on high-dimensional data analysis, metric learning, machine-learned web-search ranking, transfer and multi-task learning, as well as biomedical applications. Prior to joining Washington University, he was a member of the Yahoo! Research Lab in Santa Clara, where he actively worked on the web search engine, the email spam filter and various other large-scale machine learning algorithms.


Ryan Brush, Distinguished Engineer with Cerner Corporation
Ryan is a Distinguished Engineer with Cerner Corporation, one of the leading healthcare technology companies worldwide. He has built infrastructure for healthcare systems over the past decade, and currently is leading the design of Cerner’s big data infrastructure. Ryan has spoken at Hadoop World + Strata New York, ApacheCon, and also has dabbled in writing, contributing to the book 97 Things Every Programmer Should Know.


Vinod Vydier, Middleware Specialist at Oracle
I have been working with different kinds of middleware technologies for the last 20 years: RPC, CORBA, Tuxedo, Tibco and JEE technologies. I worked with BEA during its early days and was with a SaaS start-up (castiron.com) doing salesforce.com and Oracle CRM On Demand integrations before joining Oracle. I have worked with Oracle’s Big Data solutions, Cloudera’s Hadoop distribution and in-memory processing engines for some time now.


Adam Kocoloski, Co-Founder & CTO of Cloudant, CouchDB Expert
Adam is an Apache CouchDB developer and one of the founders of Cloudant. He is the lead architect of a Dynamo-flavored clustering solution for CouchDB that serves as the core of Cloudant’s distributed data hosting platform. Adam received his Ph.D. in Physics from MIT in 2010, where he studied the gluon’s contribution to the spin structure of the proton using a motley mix of server farms running Platform LSF, SGE, and Condor. He and his wife Hillary are the proud parents of two beautiful girls.


Anant Jhingran, VP of Products at Apigee
Dr. Anant Jhingran (Ph.D., Berkeley) joined Apigee from IBM, where he was VP and CTO for IBM’s Information Management Division and Co-Chair of the IBM-wide Cloud Computing Architecture Board. He was responsible for the technical strategy for databases, information integration, analytics, and Big Data, and helped deliver IBM’s PaaS capabilities. Anant has received several awards, including IBM Fellow, the IIT Delhi Distinguished Alumnus Award, the President’s Gold Medal at IIT Delhi, and membership in the IBM Academy of Technology, and has authored over a dozen patents and over 20 technical papers.


Jeff Melching, Big Data Engineer and Architect at Monsanto
Jeff Melching is currently a Big Data Engineer and Architect at Monsanto, an agricultural company focused on producing more while using less. He has nearly 15 years of experience in IT, ranging from working on some of the first IP telephony solutions at Bell Labs to his current interest in leveraging platforms like Hadoop, Solr, and Storm to increase analytic efficiency and answer questions that were never before possible in genomics. He holds an M.S. in Telecommunications and Computer Science from DePaul University in Chicago, IL and enjoys spending time with his wife and kids (especially at the beach with a beer in hand!).


Matthew Goeke, Big Data engineer at Riot Games
Matt Goeke works on the Big Data engineering team at Riot Games, where he’s responsible for maintaining the enterprise data warehouse and data collection pipelines. Since working on these pipelines involves splitting his time between physical datacenter clusters and cloud computing resources in Amazon Web Services, there’s an ongoing debate over whether he’s more machine than man. Matt also works on improving workflows in the traditional ETL space and increasing the accessibility of non-relational data to internal teams. Prior to Riot, Matt helped build Monsanto’s first Hadoop cluster and compute pipeline using MR and HBase.


Shrikanth Shankar, Head of Engineering at Qubole
Before coming to Qubole, Shrikanth Shankar worked at Oracle for over a decade, rising to become Director of Development in the BI team. Shrikanth was one of the leaders of the Oracle Exalytics effort and helped drive the product from conception to release. Before that, Shrikanth was on the Database team in the SQL/DSS group, where he made significant contributions to many different portions of the Oracle stack, ranging from Partitioning, SQL Optimization, and SQL/Parallel Execution all the way to the Indexing and Data layers. Now Shrikanth is the Head of Engineering for Qubole, a pioneering startup in Big Data.


Murtaza Doctor, Principal Architect at RichRelevance
Murtaza Doctor is Principal Architect at RichRelevance, and brings over 12 years of product building and technology leadership experience to the role. As part of RichRelevance, Murtaza has designed and built products from the ground up in the advertising, e-commerce, pricing, and defense domains. Prior to RichRelevance, he worked at Yahoo! on its Display Advertising platform team, where he architected and built a next-gen inventory management system, as well as led various supply optimization initiatives. He also served as adjunct faculty at the University of Houston Clear Lake, where he taught undergraduate courses in Computer Science. Murtaza holds a master’s in Computer Science from the University of Houston Clear Lake, and a bachelor’s degree from Sardar Patel College of Engineering in Mumbai, India.


Justin Sheppard, IT Director with Sears Holdings and Head of Business Operations for MetaScale
Justin is an IT Director with Sears Holdings and Head of Business Operations for MetaScale, a big data technology subsidiary of Sears. Justin is leading Sears Holdings’ efforts to harness the power of Hadoop and other open-source technologies to deliver business value from data. Prior to joining Sears, Justin was with Deloitte Consulting for 12 years serving Fortune 500 clients. Justin received his MBA from the University of Chicago.

SPONSORS

2013 Leadership Sponsors

IBM
MapR Technologies
Oracle

 

2013 Platinum Sponsors

Cloudera
EMC Isilon
XIOLINK
Daugherty Business Solutions

 

2013 Gold Sponsors

Washington University in St. Louis | Center for the Application of Information Technology

 

2013 Silver Sponsors

Hortonworks
Skytree

 

2013 Training Partners

Cloudera
Inferology

 

2013 Platinum Media Sponsors

Datanami
HPC Wire