The journey to high-performance analytics


Rome was not built in a day. Similarly, high-performance analytics is the product of many cumulative architectural, computational and analytical advances. Solving complex business problems by applying algorithms from multiple disciplines to ever larger volumes of data of all types, both structured and unstructured, is no small task. It requires innovation at many different levels, and it is the result of several years of effort built on customer experience and a strong history of building robust enterprise-class products.

Delivering high-performance analytics requires the expertise of many engineers and scientists across both computational and analytical disciplines, and it is one of the most exciting efforts we have undertaken in my nearly three decades at SAS. What makes this such an exciting time for me? Let me share a few perspectives.

For SAS Institute, working with data and exploiting algorithms has always been the bread and butter of what we do. It is built into the DNA of the SAS culture. For more than 35 years we have worked on innovative problems, and neither the size of the data nor the complexity of the analysis has ever been a barrier. Every few years we have tackled new challenges. Significant milestones in this journey have included:

  • Creating software that runs on multiple operating systems.
  • Accessing data from many different sources.
  • Taking advantage of distributed computing environments.
  • Creating targeted, analytically based vertical applications that take advantage of our analytical strengths.
  • Creating analytical applications that facilitate collaboration across the enterprise.

In every one of these areas, the requirement to tackle larger problems and more complex scenarios continues to grow, which in turn requires our core analytical tools to keep growing and to be reinvented to meet these demands. At every level of the software, we need to take advantage of multiple processors on a single machine as well as run on multiple nodes in a distributed environment.

These years of experience have taught us that the big data/big analytics problem must be addressed at many different levels. Our vision for high-performance analytics includes all aspects of BIG: volume, velocity, variety and complexity. More importantly, high-performance analytics must go hand in hand with master data management and data governance. The performance of an analytical algorithm alone is not enough to solve an enterprise's entire business problem; you also need to pay attention to the performance of every aspect of data movement.

SAS high-performance analytics offerings are designed to provide fast execution and minimize data movement for both model creation and deployment. For example, the SAS Scoring Accelerator and SAS Analytics Accelerator provide scoring and modeling inside the database. Catalina Marketing has seen reductions in model-scoring time from 4.5 hours to around 60 seconds by taking advantage of this technology.
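The principle behind in-database scoring is that the model travels to the data rather than the data traveling to the model. The following is a minimal Python sketch of that idea under stated assumptions: the "database" is simulated as a list of in-memory partitions, threads stand in for database nodes, and the model is a logistic regression described only by its coefficients. The function names (`score_row`, `score_partition`, `score_in_place`) are hypothetical and are not part of the SAS Scoring Accelerator.

```python
# Hypothetical sketch: score rows where they live, shipping only the model.
# Assumptions: partitions are in-memory lists of feature rows, threads stand
# in for database nodes, and the model is a logistic regression.
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def score_row(coefs: Sequence[float], intercept: float, row: Sequence[float]) -> float:
    """Logistic-regression probability for one row (numerically stable sigmoid)."""
    z = intercept + sum(c * x for c, x in zip(coefs, row))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so exp(z) cannot overflow
    return e / (1.0 + e)


def score_partition(coefs: Sequence[float], intercept: float,
                    partition: List[List[float]]) -> List[float]:
    """Runs 'next to' one partition: only the small coefficient vector is sent in."""
    return [score_row(coefs, intercept, row) for row in partition]


def score_in_place(partitions: List[List[List[float]]], coefs: Sequence[float],
                   intercept: float, workers: int = 4) -> List[List[float]]:
    """Score every partition concurrently; rows never leave their partition.
    Returns one list of scores per partition, in the original order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(score_partition, coefs, intercept, p) for p in partitions]
        return [f.result() for f in futures]
```

For example, `score_in_place([[[0.0, 0.0], [1.0, 2.0]], [[-1.0, 0.5]]], [0.5, -0.25], 0.0)` returns one score list per partition. The design point is that only a handful of coefficients cross the network, whereas a client-side approach would first have to pull every row out of the database.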

As SAS continues on this journey, the next frontier is exploiting the massively parallel capabilities of the database. This will make it possible to manipulate and load large quantities of data while running complex analytical algorithms, encapsulated as loadable extensions, alongside the database. Taking advantage of the parallel capabilities of the database and moving the analytics to the data will unleash the power of the mathematics on massive amounts of data. It allows us to provide a high-end, enterprise-class platform that combines in-memory analytics with a data platform supporting hardware failover and data replication, terabytes of storage, querying capabilities, ETL and more, all of which are important to IT and to our customers.

Our architectural breakthroughs make it possible for analytical developers to restructure their algorithms to exploit hardware advances and run in multiple distributed modes. This, I believe, is a key milestone for our overall development paradigm, one that immediately opens up a vast array of possibilities for us.

Our current work ranges from running individual analytical routines, such as logistic regression, to enabling the full data mining modeling process in a high-performance environment, to delivering order-of-magnitude performance improvements for targeted business applications such as markdown optimization and marketing optimization.
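Restructuring an algorithm to run in distributed mode often means splitting it into a per-partition step that runs where the data lives and a cheap combining step. As a hedged illustration using logistic regression, the following minimal Python sketch fits the model by plain batch gradient ascent: each partition computes its local gradient of the log-likelihood, and a coordinator sums the partial gradients and updates the coefficients. Batch gradient ascent is a textbook technique chosen here for clarity; it is not a description of how SAS high-performance procedures actually fit logistic regression, and the names `local_gradient` and `fit` are hypothetical.

```python
# Hypothetical sketch: data-parallel logistic regression via gradient aggregation.
# Assumption: plain batch gradient ascent, chosen for clarity (not SAS's method).
import math
from typing import List, Sequence, Tuple

Row = Tuple[List[float], int]  # (feature values, label in {0, 1})


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def local_gradient(w: Sequence[float], partition: List[Row]) -> List[float]:
    """Gradient of the log-likelihood over one partition.
    w[0] is the intercept; w[1:] are the feature coefficients."""
    g = [0.0] * len(w)
    for x, y in partition:
        p = sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
        residual = y - p
        g[0] += residual
        for j, xj in enumerate(x, start=1):
            g[j] += residual * xj
    return g


def fit(partitions: List[List[Row]], n_features: int,
        lr: float = 0.1, iters: int = 500) -> List[float]:
    """Fit by gradient ascent on the mean log-likelihood across all partitions."""
    n = sum(len(p) for p in partitions)
    if n == 0:
        raise ValueError("no observations to fit")
    w = [0.0] * (n_features + 1)
    for _ in range(iters):
        # "Map": each partition computes its own gradient where the data lives.
        partials = [local_gradient(w, p) for p in partitions]
        # "Reduce": only len(w) numbers per partition travel to the coordinator.
        total = [sum(col) for col in zip(*partials)]
        w = [wi + lr * gi / n for wi, gi in zip(w, total)]
    return w
```

Because the log-likelihood gradient is a sum over observations, splitting the rows across partitions gives the same fit (up to floating-point rounding) as a single-machine run, which is exactly the property that lets the per-partition step scale out.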

For example, some of the game-changing performance results we can now realize include:

  • Fitting a logistic regression model on a billion observations in about a minute.
  • Solving a large variable selection problem with more than 1,800 parameters in over 100 model effects and 50 million observations in under a minute.
  • Solving problems that were previously intractable. For example, for a marketing optimization problem involving more than 25 million customers and nearly 1,000 offers, we have seen the solution time drop from over five and a half hours to less than six minutes.
  • Exploiting the new architecture to further improve the performance of products that are already market leaders in performance, including SAS Marketing Optimization, mentioned above.

What does high-performance analytics mean for your business? It has the potential to introduce some game-changing options for customers. For instance, Macy’s needs to determine optimal clearance prices for over 273 million product-by-location combinations, involving hundreds of millions of potential pricing decisions per week. The SAS Markdown Optimization solution analyzes three terabytes of historical sales data with multiple estimation and pricing algorithms targeted at this particular business problem. Using new SAS high-performance analytics technologies, the computation time was reduced from 30 hours to about 2 hours. This dramatic reduction gives the customer the opportunity to run more scenarios in the same time window and to evaluate alternative pricing strategies, so that Macy’s can offer the right prices to the right customers at the right time, ultimately maximizing profit while clearing inventory.

What does this mean for our software products? We will continue to see tremendous growth in the area of high-performance analytics. We will continue to move an increasing number of our algorithms into the high-performance paradigm to exploit hardware advances and harness the maximum performance gains from them. The infrastructure provides the ability to run on a variety of hardware configurations: commodity hardware as well as platforms where we can exploit the benefits provided by the hardware vendor. The overall effort across SAS enables us to handle many types of big problems, addressing both the big data issue and complex problems that historically have taken days to run even on small data.

How do I feel about where we are? Energized, excited and passionate about what we can do!


About the Author

Radhika Kulkarni

Vice President, Advanced Analytics R&D

Dr. Radhika Kulkarni is Vice President of Advanced Analytics R&D at SAS Institute Inc. Radhika oversees software development in many analytical areas, including Statistics, Operations Research, Econometrics, Forecasting and Data Mining. In her role as an OR expert at SAS Institute, Kulkarni was influential in driving the recognition of Operations Research as a key component of business analytics solutions in several areas, including Finance, Retail, Marketing, Hospitality and Supply Chain. Kulkarni is a member of the Board of Directors of IDeaS, a SAS Company. She is an active member of the Institute for Operations Research and the Management Sciences (INFORMS) and serves on the Advisory Board of the Institute for Advanced Analytics at North Carolina State University and of The Center for Hospitality Research at Cornell University. Radhika Kulkarni has a Master’s in Mathematics from the Indian Institute of Technology, New Delhi, and a Master’s and Ph.D. in Operations Research from Cornell University.
