I can’t believe all that has happened in the 7 months since I last blogged about high-performance analytics, so I’m back to give you some updates from SAS R&D. The energy around this area keeps growing as more of our developers adopt the new high-performance architecture and target their developments to take advantage of massively parallel computing environments. As we conquer these new areas, we are seeing more “aha” moments as customers in many industries recognize the game-changing nature of these performance breakthroughs.
One exciting breakthrough is the new SAS LASR Analytic Server. The LASR Analytic Server can handle the computational complexity of large-scale exploratory data analysis and visualization in response to dynamic analytic queries. It delivers results from regressions, percentiles, correlations, cross-tabs, and many other analytical computations at lightning speed. We also just released Visual Analytics Explorer, which leverages the LASR Analytic Server to visualize billions of rows of data with split-second response times.
We continue to add to the list of high-performance analytics procedures that leverage in-memory computations for tremendous performance gains. These procedures encapsulate complex analytical algorithms as loadable extensions that can run alongside the database to solve complex business problems on massive amounts of data. We decide which algorithms to implement first based on customer input. Customers help us identify business problems that the current architecture cannot solve, because of either the size of the data or the complexity of the problem. In all these instances, either the massive data sizes require distributed storage or the computational complexity benefits from distributed computing.
In our first high-performance analytics release we introduced several statistical and data mining procedures. Soon, we will add several more procedures, including some for optimization and text analytics. Keep reading for more about our work in each of those areas.
High-performance analytics for text mining
One text analytics offering will apply high-performance computing techniques to vast quantities of unstructured text data. Such data can be very expensive to analyze, since even relatively small data sets can swamp the memory of most computers.
Since experts estimate that more than 80 percent of today’s data is unstructured text from sources such as call centers, social media, scientific literature, and more, incorporating insight from these data can improve your analytical models significantly. Using this new high-performance paradigm, customers can parse millions of documents, index the results, and categorize them into meaningful clusters and segments that can then be used as inputs when building a predictive model.
This area is particularly exciting for me, as it combines expertise from three different analytical areas: text parsing, predictive modeling, and optimization. I passionately believe that solving the hardest problems calls for a multidisciplinary approach. I encourage our teams to work backwards from business problems that need to be solved and to work together on solutions that combine approaches to address them.
What are some applications of high-performance text mining? A manufacturer could build predictive models from product and parts catalogs, which contain millions of items with full text descriptions. Those models could be used to match new customer service requests with more appropriate information or to prioritize ordering needs.
In another example, a large financial firm with millions of customer accounts and thousands of products could merge its customer call center transcripts at the transaction level with demographic and financial data in order to improve lift on marketing campaigns.
In both cases, this ability to combine the results of text mining with structured data in a high-performance environment can improve the performance of prediction and discovery models. Combining text with numeric data offers exciting improvements to models that previously used numeric data alone.
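To make the idea of combining text with structured data concrete, here is a minimal single-machine sketch in Python. It is only an illustration, not how the SAS high-performance procedures work: the records, the `build_vocabulary` and `combine_features` helpers, and the simple word-count features are all assumptions made for the example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(documents, size):
    """Return the `size` most frequent tokens across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    # Sort by descending frequency, then alphabetically, for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [token for token, _ in ranked[:size]]

def combine_features(numeric_row, text, vocabulary):
    """Append per-term counts from `text` to an existing numeric feature row."""
    counts = Counter(tokenize(text))
    return list(numeric_row) + [counts[term] for term in vocabulary]

# Hypothetical records: (age, account balance) plus a call transcript.
records = [
    ((34, 1200.0), "card declined, please replace card"),
    ((51, 98000.0), "question about mortgage rate"),
]
vocabulary = build_vocabulary([text for _, text in records], size=3)
rows = [combine_features(nums, text, vocabulary) for nums, text in records]
```

Each row in `rows` now holds the original numeric fields followed by one count per vocabulary term, so a predictive model can be trained on both kinds of information at once. A real pipeline would use richer parsing and far larger vocabularies, which is where the distributed, in-memory architecture comes in.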
High-performance marketing optimization
Marketing optimization is clearly an area that benefits from exploiting the high-performance architecture, as the problem sizes can explode with the size of the campaign population or the number of complex business constraints that need to be applied.
It is exciting to see some of the early results from our work in this area. In some of our benchmarks on real customer data involving hundreds of marketing offers and millions of customers, high-performance marketing optimization allows analysts to run optimization computations in minutes instead of hours. For instance, in one scenario the optimization time for a campaign with 48 million customers and 40 communications dropped from 6 hours and 50 minutes to a little over 3 minutes.
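As a quick back-of-the-envelope check on those figures, the short Python sketch below works out the approximate speedup and the scale of the problem. Two assumptions are mine, not from the benchmark: the new run time is rounded to exactly 3 minutes, and the problem is treated as having one candidate decision per customer-communication pair.

```python
# Figures quoted above; the "after" time is rounded down to 3 minutes,
# so the real speedup is somewhat lower than the value computed here.
before_minutes = 6 * 60 + 50   # 6 hours 50 minutes
after_minutes = 3              # assumption: "a little over 3 minutes"

speedup = before_minutes / after_minutes

customers = 48_000_000
communications = 40
# Assumption: one candidate decision per customer-communication pair.
candidate_pairs = customers * communications

print(f"before: {before_minutes} minutes")
print(f"speedup: roughly {speedup:.0f}x")
print(f"candidate pairs: {candidate_pairs:,}")
```

Under those assumptions the run time falls from 410 minutes to about 3, a speedup on the order of 130 times, across nearly two billion customer-communication combinations.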
This performance improvement allows for quick scenario analysis for problems that previously had to run overnight. As a result, analysts can explore and evaluate available business choices and other financial tradeoffs rapidly and more thoroughly, leading to a more robust and nimble decision support system. The same analyst could also combine more campaigns together to optimize across a broader set of constraints instead of performing local optimizations.
With such immense performance improvements and scalability results coming to fruition in all these problem areas, it is no wonder that the level of excitement continues to grow in the corridors of R&D at SAS. It is indeed a wonderful time to be part of this super-charged environment!