In evolutionary biology there is an interesting concept called “punctuated equilibria,” which is used primarily to explain why the fossil record seems to change abruptly after long periods of apparent calm, or ‘stasis’ as it is called. To understand this better, think of a stairway where the length of each step represents how long a species is able to survive before another life form adapts to take better advantage of available resources. The height of each stair represents the degree of change, namely the step up from the older, pre-existing organism to the next.
Interestingly, I believe this idea applies to data processing as well, where technological conditions can remain relatively constant for long stretches of time, only to be followed by sudden and dramatic change. Examples abound, but we baby boomers can easily think back to a time before personal computers (PCs), when most office work was done on a typewriter. People had used typewriters for more than a hundred years before the shift to PCs, which happened fairly quickly in the early 1980s. Nowadays, I would challenge anyone to find a typewriter on which they could write a letter!
To that end, I believe that we are in the midst of another rapid change in technology that is explained well by the notion of punctuated equilibria. The new data processing revolution is called “distributed, (shared) in-memory processing,” and at SAS it is referred to collectively as high-performance analytics. While distributed computing and in-memory processing have been around separately as independent concepts for quite some time, the idea of combining them with a concomitant re-architecture of enabling code represents another giant leap forward in the computing world.
But what is “distributed, in-memory processing” and how does it differ from other processing models? Essentially, the new processing paradigm divides a single problem into many smaller pieces so that each computer completes the work on a small portion of the problem, while maintaining a two-way communication channel with all the other computers working on different parts of the same problem. The ‘in-memory’ piece allows for faster consumption of the inputs because there is no costly delay in reading or acquiring data (which is where most of the time is spent, rather than in doing calculations). Additionally, there is an automatic step at the end where the individual results are collected and assembled for final interpretation and presentation.
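The split, compute and assemble steps described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not SAS code: threads in one process stand in for separate computers, a Python list stands in for data already loaded into memory, the two-way communication channel between nodes is not modeled, and the function names (`split_into_slices`, `partial_stats`, `distributed_mean`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_slices(data, n_workers):
    """Cut the in-memory data into n_workers roughly equal slices."""
    size, extra = divmod(len(data), n_workers)
    slices, start = [], 0
    for i in range(n_workers):
        # The first `extra` slices take one additional item each.
        end = start + size + (1 if i < extra else 0)
        slices.append(data[start:end])
        start = end
    return slices


def partial_stats(chunk):
    """Work done by one (hypothetical) node: a partial sum and a count."""
    return sum(chunk), len(chunk)


def distributed_mean(data, n_workers=4):
    """Split the problem, solve each piece, then assemble the final answer."""
    slices = split_into_slices(data, n_workers)
    # Assumption: threads stand in for the separate computers in the text.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_stats, slices))
    # The automatic final step: combine the partial results into one answer.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count
```

Calling `distributed_mean(list(range(1, 101)))` gives the same mean as a single-pass calculation; the point of the sketch is only that no worker ever sees the whole data set, yet the caller still receives one assembled result.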
To simplify the explanation further, think of a pie eating contest in which one person trying to finish a whole pie alone races against a group of pie eaters who each devour a smaller slice until the whole pie is consumed. With more pie eaters and the right amount of pie sitting in front of each one of them, the group will finish the whole pie faster (as long as the overhead of slicing the pie and delivering the slices is low). And because the members of the group can communicate before and after eating, they can share information that might make their eating go faster (such as tips on technique).
Grid computing was the first logical extension of using multiple groups of processors to solve a specific problem. In order to accomplish its work, the SAS Grid solution, itself a high-performance product, seeks to replicate the entire work problem on each computer instead of sub-sectioning it into smaller pieces. In our pie eating analogy, an optimized grid computing environment essentially attempts to bake smaller, but still whole pies. The pies can be reduced in size, but all have to be consumed separately and none of the eating group can communicate with one another before or after the task at hand. Additionally, at the end of the contest all the results have to be manually tabulated, since there is no automated results aggregation. Inevitably, there are more manual steps and some duplication involved in grid computing as compared to distributed, in-memory processing, but both are still orders of magnitude faster than having a single processor (even if it is multi-threaded) solve the entire problem.
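For contrast, the grid model as described in this paragraph can be sketched the same way. Again, this is a hedged illustration of the description above and not of how SAS Grid is actually implemented: threads stand in for grid nodes, and the names `run_whole_job` and `run_on_grid` are hypothetical. Each node receives a complete, self-contained job, nothing is shared between nodes, and the caller is left to tabulate the separate results.

```python
from concurrent.futures import ThreadPoolExecutor


def run_whole_job(job):
    """One (hypothetical) grid node solves an entire job by itself."""
    name, data = job
    return name, sum(data) / len(data)


def run_on_grid(jobs, n_nodes=3):
    """Send each whole job to a node; the results come back unmerged."""
    # Assumption: threads stand in for the separate grid nodes.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        return list(pool.map(run_whole_job, jobs))
```

With two jobs named `"east"` and `"west"`, the function returns one result per job. Combining them into a single answer is a separate step left to the caller, which corresponds to the manual tabulation in the analogy.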
Essentially what has been missing is a light-weight communication layer that could pass data processing instructions from a single origination point to many other data processing points, or nodes, that were connected in a vast network or array type of framework. SAS has now created a standardized communication layer and is currently well ahead of any competitor in this field.
This innovative work has created a foundation upon which the future of the company will be based, a point often articulated by the CEO, Dr. James Goodnight. As early as January 2008, Dr. Goodnight directed his development teams to begin re-designing certain SAS analytic procedures specifically to take advantage of server farms, as he correctly envisioned this hardware configuration (also referred to as ‘cloud computing’) as being the dominant form of future computing.
However, the magnitude of the problem faced by the R&D department was truly monumental. The difficulty was not simply achieving the first set of results, but rather implementing a conceptual methodology that all R&D teams could easily learn and leverage. Overcoming this initial obstacle laid the groundwork for integrating these speed improvements into many different SAS solutions.
As a result of our development breakthrough, these are exciting times at SAS! I am greatly enthused that SAS as a company has chosen to lead this pioneering effort and act as a true catalyst in the software industry. The positive impact on our customers, in the form of efficiencies in some of their business cycles, is just beginning to be felt, but there is already good evidence that a large portion of business processes will be radically transformed within the next decade because of this new technology.
So why bring up the idea of punctuated equilibria in the first place? Basically it’s the age-old story we have always told our customers: adapt or be displaced by the competition. Much as the evolutionary model predicts, those businesses that find ways to better exploit their data through improved technological resources will be better positioned to out-compete their less nimble rivals. I truly believe that SAS high-performance analytics, and the foundational inventions on which it is based, are game-changing technologies, the magnitude of which may not have been seen since the first SAS program was written back in the late 1960s.
You can hear more from Dr. Goodnight and other "big data" experts in this special 32-page report on high-performance analytics.