The need for fast and easy access to high-powered analytics has never been greater than it is today. Fortunately, cloud processing still holds the promise of making analytics more transparent and ubiquitous than ever before. Yet, a significant number of challenges still exist that prevent more widespread adoption of cloud analytics.
Broadly speaking, most modern cloud deployments don’t suffer from a lack of hardware resource availability; rather, they are compromised by poor software architecture and design. As with in-memory distributed computing, software has to be written specifically to take advantage of the way cloud systems work, otherwise the promised productivity gains and cost reductions will often fail to materialize. Many cloud analytics adopters have found themselves locked into poorly functioning cloud environments because they didn’t ask the right questions up front about software architecture and its associated dependencies.
The most important issues that cloud analytics processing systems must address are:
- Guaranteeing security.
- Optimizing work throughput by supporting different processing paradigms.
- Ensuring high availability in spite of required maintenance.
- Allowing tracking and charge-back of individual units of work.
- Providing transparency around the total cost of ownership (TCO), including what are often considered ‘hidden’ costs.
Indirectly, these challenges speak to the maturity of any software or application system and reflect the amount of design effort a specific vendor has invested. More directly, any data processing system, analytics-enabled or not, that does not support these five capabilities fails to exhibit robust cloud resiliency and exposes potential liabilities in terms of long-term adoption. Let’s look at each in more detail, including what to watch for when addressing them to ensure the success of your cloud analytics effort.
Perhaps the biggest issue on most cloud users’ minds is security. Cloud adopters do not want private information exposed or stolen, so cloud-based software must have built-in flexibility that allows it to work easily with a multitude of popular security tools. Software vendors need to be adept at identifying popular and emerging trends and design their software to work with a variety of proven security protocols.
The next big obstacle to implementing a well-functioning cloud analytics application is to have software that can run on practically any operating system or hardware configuration; this is where the concept of dynamic throughput becomes important. Ideally, analytics software should not be confined to a specific processing strategy but rather have built-in “smart” capabilities that allow it to distinguish between different processing scenarios (topologies) and dynamically choose how the analytics are executed without “fork-lifting,” or moving the data to the analytics. This is not a trivial task. It means that the software has to be literally self-aware of what resources it has at its disposal, maybe choosing between highly distributed MPP in-memory networks, in-stream processing, a grid environment, a single-machine (SMP) instantiation, or even a slower single-threaded processor, depending on what’s available. The ability to switch between different processing paradigms, while at the same time scaling up or down in resources, and without requiring any user intervention or code modification, is key to having modern sustainable cloud analytics application systems in the future.
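The idea of “smart” topology selection described above can be sketched in a few lines. This is a hypothetical illustration, not any vendor’s actual implementation: the `Environment` type, the `choose_topology` function, and the topology names are all assumptions introduced here to make the concept concrete.

```python
from dataclasses import dataclass

@dataclass
class Environment:
    node_count: int          # machines available to this job
    cores_per_node: int
    supports_streaming: bool # an in-stream processing engine is present

def choose_topology(env: Environment) -> str:
    """Pick an execution strategy from what is actually available,
    so the analytics move to the data rather than the reverse."""
    if env.supports_streaming:
        return "in-stream"
    if env.node_count > 1:
        return "distributed-mpp"   # in-memory MPP across nodes
    if env.cores_per_node > 1:
        return "smp"               # single machine, multi-threaded
    return "single-threaded"       # slowest fallback

# Example: a four-node cluster without a streaming engine
print(choose_topology(Environment(node_count=4, cores_per_node=16,
                                  supports_streaming=False)))
# → distributed-mpp
```

The key design point is that the decision happens at run time from the discovered environment, so the same user code scales up or down without modification.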
The next major challenge for cloud systems involves the issue of maintainability and governance. IT departments are legitimately concerned about their business users installing software they don’t have any control over and cannot support. Add to these worries the fact that most mission-critical apps need nearly 100 percent uptime. From an availability perspective, it is no longer acceptable to shut down a server in order to replace or upgrade it. Ideally, redundancies need to be built into the software so that duplication of control and processing exists, making maintenance easier.
Charge-back is the fourth roadblock on the “cloud minders” list. There is general consensus that costs need to be generated on basic units of work so that resource usage can be recovered and/or assigned to different business units. If the software doesn’t allow costs to be assigned to the most basic units of work (usually associated with individual processes), then the cloud environment cannot support a la carte pay-as-you-go pricing.
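Metering at the level of individual units of work might look like the following minimal sketch. The rates, the log schema, and the `charge_back` function are hypothetical; a real system would pull rates from the provider’s price book.

```python
from collections import defaultdict

# Hypothetical per-unit rates (USD); real rates come from the provider.
RATES = {"cpu_hours": 0.05, "gb_stored": 0.02, "gb_transferred": 0.09}

def charge_back(work_log):
    """Cost each unit of work (one dict per process) and roll the totals
    up to the business unit that ran it, so usage can be recovered."""
    totals = defaultdict(float)
    for unit in work_log:
        cost = sum(RATES[k] * unit[k] for k in RATES)
        totals[unit["business_unit"]] += cost
    return dict(totals)

log = [
    {"business_unit": "marketing", "cpu_hours": 10, "gb_stored": 50, "gb_transferred": 5},
    {"business_unit": "risk",      "cpu_hours": 40, "gb_stored": 20, "gb_transferred": 1},
]
print(charge_back(log))
# → {'marketing': 1.95, 'risk': 2.49}
```

Because each process is costed individually before aggregation, the same log supports both department-level recovery and true pay-as-you-go pricing.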
The final hurdle for cloud analytics is being able to assess total ownership costs. Many online analytics vendors provide services that appear cheap at the outset, until you actually want to use your results. And beware of mistakes that force you to repeat the process. What is promoted as a low front-end cost actually sits on top of accumulated costs for data storage, database access, transfer of data (bandwidth), memory allocation, numbers of users, row-level scoring, and a variety of other tasks and resources (like consulting and IT support). In order to assess profitability, the entire analytics lifecycle, including deployment costs, needs to be quantified and assessed. Cloud analytics needs to be able to compartmentalize each of these costs up front so there are no “surprises” later on. This allows potential cloud users to determine whether their use of the technology is actually cheaper than running the software on their own network or server.
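The arithmetic behind this point is simple but worth making explicit. All figures below are invented for illustration; the category names mirror the cost components listed above.

```python
# Hypothetical monthly cost components for one analytics workload (USD).
monthly_costs = {
    "data_storage": 120.00,
    "database_access": 75.00,
    "data_transfer": 210.00,   # bandwidth often dwarfs the headline fee
    "memory_allocation": 60.00,
    "user_licenses": 300.00,
    "row_level_scoring": 95.00,
    "it_support": 400.00,
}

headline_compute_fee = 150.00  # what the vendor advertises up front

tco = headline_compute_fee + sum(monthly_costs.values())
hidden_share = sum(monthly_costs.values()) / tco
print(f"Monthly TCO: ${tco:,.2f} ({hidden_share:.0%} beyond the headline fee)")
```

With these made-up numbers, the advertised fee is barely a tenth of the true monthly cost, which is exactly the kind of gap an up-front TCO breakdown exposes.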
Cloud analytics computing is still in its early days. Vendors are struggling with the challenges listed above and with how to architect their software to accommodate the vision and needs of a true cloud environment. The good news is that SAS has a vision for the future that will meet and exceed all of these requirements. Learn more about cloud analytics from SAS.