Machine learning dominates Strata 2014 in Santa Clara

The SAS booth at Strata

Data asset management and analytic processing for big data were the main topics of interest at the recent Strata conference in Santa Clara, California. Hosted in Silicon Valley, the conference attracted some of the brightest data scientists from America’s top research and academic institutions. Yet, to my ears, the real buzz was around the terms “machine learning” and “deep analytics”: attendees seemed keenly interested in learning how to run predictive analytics on their large collections of data.

SAS debuted new statistical modeling capabilities for Hadoop at the conference. This offering, SAS In-Memory Statistics for Hadoop, is targeted specifically at the data science community. It offers a familiar environment for interactive programming, backed by the advanced statistical algorithms of SAS. Wayne Thompson, SAS’ Chief Data Scientist, gave a great presentation and demo, and other staff members had a busy time in the demo center discussing it and other SAS offerings for Hadoop.

I talked with many people who have been trying open-source tools such as R, Mahout and IPython to run advanced predictive models on big data. If you're working with these open-source solutions, the SAS environment can be integrated as an ideal platform for driving the entire analytical lifecycle, from data preparation and discovery through modeling and deployment. Our offerings integrate with open-source tools to extend existing capabilities. For example, SAS just released a new node in Enterprise Miner (our flagship data mining product) that is specifically designed to incorporate R models within a competitive model tournament scenario. For any model that wins the tournament, SAS generates score code (even for R models), making model deployment a lot faster and easier.
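To make the idea of a model tournament concrete, here is a minimal sketch in plain Python. It is not how Enterprise Miner implements its tournament; the data, the two candidate models, and every name in it (`make_data`, `fit_mean`, `fit_line`, `run_tournament`) are hypothetical and chosen only for illustration. Each candidate is fit on training data, scored on a holdout set, and the one with the lowest error wins.

```python
import random
from statistics import mean


def make_data(n, seed=0):
    """Generate synthetic (x, y) pairs from y = 3x + 2 plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3.0 * x + 2.0 + rng.gauss(0, 1) for x in xs]
    return xs, ys


def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean of y."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Candidate: simple least-squares line through the training data."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given data."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def run_tournament(candidates, train, holdout):
    """Fit every candidate on train, score on holdout, return the winner."""
    scores = {name: mse(fit(*train), *holdout) for name, fit in candidates.items()}
    winner = min(scores, key=scores.get)
    return winner, scores


if __name__ == "__main__":
    xs, ys = make_data(200, seed=1)
    train = (xs[:150], ys[:150])
    holdout = (xs[150:], ys[150:])
    winner, scores = run_tournament(
        {"mean": fit_mean, "line": fit_line}, train, holdout
    )
    print("winner:", winner)
    print("holdout MSE by model:", scores)
```

Scoring on a holdout set rather than on the training data keeps the comparison fair across candidates, whichever tool each one was built in.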

There was a lot of buzz around IPython and scikit-learn capabilities for running machine learning algorithms against large data sets. A few IPython start-ups were showcasing their in-memory benchmarks against R batch-processing algorithms, and it appears that some of the new routines are 7 to 10 times faster than what traditional R programs can achieve. From an open-source technology perspective, I predict the industry is in store for another seismic shift in coding preferences toward IPython because of these efficiencies.

SAS has lightning-fast, in-memory technology for Hadoop, and continues to provide opportunities for side-by-side model comparisons with open-source algorithms. But SAS is also committed to being the leader in developing new machine learning algorithms that set the industry standard. For example, SAS is the only software vendor that has a fully implementable random forest algorithm that currently captures an unlimited number of splits for a forest of up to 10,000 trees, a deployment capability that is essential for real-time scoring. With our commitment to providing an open architecture that external interfaces can easily access, I believe SAS is well-positioned for the new era of in-memory computing in which we find ourselves. Exciting times are ahead!

Photo credit: O'Reilly Conferences 


About Author

Phil Weiss

Analytics Systems Manager

Phil Weiss is an Advisory Solution Architect in Sales and Marketing Support. He was an accomplished application developer and statistical consultant for 10 years before joining SAS. His technical specialties are time series forecasting, high-performance analytics and distributed processing systems. Phil has written a book on the history of Lake Tahoe’s oldest licensed casino, the Cal-Neva Resort, a place once owned by Frank Sinatra.
