Starting in 2007, according to IDC, the amount of data captured and replicated worldwide outgrew our total available storage capacity. Total data captured that year equaled 281 exabytes and storage capacity equaled 264 exabytes. These numbers - and that gap - have been growing exponentially ever since.
And that's just one of the many facts that kind of blow your mind about this "big data" world. The challenges facing analysts are summarized by the words, "More, more, more. Bigger, bigger, bigger," says Oliver Schabenberger, lead architect of SAS High Performance Analytics.
Oliver presented the opening keynote about high performance analytics (HPA) at the Analytics 2011 conference in Orlando this morning with Radhika Kulkarni, Vice President of Advanced Analytics R&D at SAS. For background, read Radhika's previous post on this blog, The journey to high-performance analytics.
Radhika begins the presentation by sympathizing with the challenges analysts face today: The world is more connected. Data has exploded. Problem complexity has increased. Demand for real-time anything has grown. Performance expectations are growing. Additionally, there are more data sources, more unstructured data, more attributes and more columns.
The good news is that hardware has grown by leaps and bounds, and SAS is developing analytical methods that take advantage of multi-core hardware environments.
For example, Oliver mentions Catalina Marketing, the company that provides targeted coupons for you at the grocery store checkout lane. How do they do that? They use SAS high performance analytics on 2.5 petabytes of consumer data to provide the right coupon for every customer at thousands of different grocery stores.
"You should not be doing less sophisticated analysis just because you have more data," says Oliver. "If the size of the data is choking your analytics, the problem is not that you have too much data. It's that you don't have the right analytics environment."
So what does the right environment look like? And where do performance gains come from? From concurrent execution of analytic processes that take advantage of multi-core and memory resources.
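To make "concurrent execution of analytic processes" a little more concrete, here is a minimal sketch in Python. It is not SAS code and not how SAS HPA is implemented; the names `PartialStats`, `summarize`, `combine` and `parallel_mean_variance` are made up for illustration. The idea it shows: split the data into partitions, let each worker reduce its own partition to a few small numbers at the same time as the others, then combine those small summaries into the final answer.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class PartialStats:
    """Small summary of one partition (hypothetical helper type)."""
    count: int
    total: float
    total_sq: float


def summarize(partition):
    """Reduce one partition to three numbers; runs once per partition."""
    return PartialStats(
        count=len(partition),
        total=sum(partition),
        total_sq=sum(x * x for x in partition),
    )


def combine(parts):
    """Merge the per-partition summaries into a mean and population variance."""
    n = sum(p.count for p in parts)
    if n == 0:
        raise ValueError("no data in any partition")
    total = sum(p.total for p in parts)
    total_sq = sum(p.total_sq for p in parts)
    mean = total / n
    variance = total_sq / n - mean * mean
    return mean, variance


def parallel_mean_variance(partitions):
    """Summarize all partitions concurrently, then combine the results.

    Threads are used only to show the structure. In CPython they will not
    speed up pure-Python arithmetic; a real system would spread this work
    over separate processes or machines.
    """
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(summarize, partitions))
    return combine(parts)


if __name__ == "__main__":
    data = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
    print(parallel_mean_variance(data))
```

Note that only the three-number summaries travel between workers, never the raw rows, which is one reason this pattern scales well as the data grows.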
The No. 1 performance killer, according to Oliver, is moving data around. So, the No. 1 strategy for SAS in high-performance analytics is now co-location: essentially, bringing your data and analytics together.
How do you approach that problem technically? In the past, customers have used - and can still use - grid computing and in-database analytics as high-performance analytics models. The latest, fastest option is called high-performance analytics, which Oliver describes as "side-by-side analytics," meaning the analytical procedures run alongside the database processes.
"SAS in-database analytics sends jobs through the database to execute inside the database process. In the new model, we're running side-by-side with the database process," explains Oliver. "They like each other and they talk to each other but now SAS processes are not limited to the abilities of the database computing environment."
SAS has partnered with Teradata and Greenplum to build high-performance computing appliances. Using this hardware and the new SAS HPA algorithms, you operate SAS from your desktop or laptop, but all the work is done on the back-end appliance, where a computational component breaks down complex algorithms and models into a series of calculations that execute in parallel.
"It's the ability to communicate our processes among the nodes that is the game changer," says Oliver. "Our view of this is that you cannot just do compute without solving the data problem. We want to solve big data and big analytics jointly. The math processes run as peers of the database on the same hardware. We don't move data. We pass it."
What's the big idea of HPA? According to Oliver, "We're solving business problems that have performance issues in such a way that we not only make them faster but also make transformational change, from hours to seconds and from minutes to seconds."
It's not only about running problems faster, but about running problems that were not solvable before, says Radhika. "The beauty of this paradigm is that it has opened up a wide array of possibilities and allowed us to take our existing algorithms and move them to this new architecture."
The key message, she says, is that, "We are going all in. We've been taking quite a lot of procedures and moving them into this architecture. All branches of math within SAS are refactoring their code for high performance analytics, and we're talking to customers to learn which techniques are of most importance to run on high-performance analytics."
What is your next step as a customer? "Bring us your challenging situations. Bring us the problems you thought were not solvable. That's what we revel in."