I’ve had several meetings lately on data management, and especially integration, where the ability to explore alternatives has been critical. And the findings from our internet of things (IoT) early adopters survey confirm that the ecosystem nature of data sources in IoT deployments means we need to expand the traditional toolbox for data integration.
For many customers, facilitating better analytics with data integration means building ever larger data warehouses. But this takes time – and ‘time is money.’ If it takes too long to get the required insights, business users will just give up and go with their gut instinct.
One alternative to larger data warehouses is virtualising the data. This technique simplifies increasingly complex data architectures. It’s an approach that provides in-memory caching and a centralised area to monitor and create queries, without having to move or copy the underlying data.
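To make that concrete, here’s a rough sketch in Python (the source systems, table names and figures are all invented for illustration) of how a virtual layer can answer a business question by federating two live sources at query time, without copying rows into a warehouse:

```python
import sqlite3

# Two independent "source systems" - illustrative stand-ins for real databases.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, region TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "EMEA"), (2, "APAC")])

billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE invoices (customer_id INTEGER, amount REAL)")
billing.executemany("INSERT INTO invoices VALUES (?, ?)",
                    [(1, 120.0), (1, 80.0), (2, 50.0)])

def revenue_by_region():
    """Federated query: combine data from both sources at request time.
    The rows stay in their source systems; only the result is joined here."""
    totals = {}
    for cust_id, region in crm.execute("SELECT id, region FROM customers"):
        (amount,) = billing.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM invoices "
            "WHERE customer_id = ?", (cust_id,)).fetchone()
        totals[region] = totals.get(region, 0.0) + amount
    return totals
```

Calling `revenue_by_region()` joins customer and invoice data on the fly; nothing is extracted, staged or duplicated, and the custodians of each source keep full control of their data.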
The data management perspective
There are several problems that can be overcome with this virtualisation approach:
Quality: The data is not moved, so there’s no risk of the errors that typically arise during extract-and-copy transfers. Analysts can draw on several different sources of data, which is likely to give them better information and improved insights – all while the original data remains with its custodians.
Currency: Data virtualisation allows real-time use of even frequently updated data, because queries read the ‘current’ state at any given point, even while the source is being updated. In other words, you have real-time access to data without damaging it, or putting pressure on the source.
Legacy: Legacy data must be taken into account at every stage – why do we have it? What value are we getting from it now, and what value could we get in the future?
Security: It’s usually hard to get business users interested in data security. Because the data stays with its custodians, virtualisation supports simpler, easier rules and governance structures, making control of data and compliance with legal requirements more straightforward.
But performance can be an issue, because every query reaches back to the underlying sources. It’s a case of finding the right balance between a drop-off in performance and all of the above benefits.
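One common way to soften that performance trade-off is the in-memory caching mentioned earlier: keep a recent query result for a short window so repeated requests don’t keep hitting the source systems. A minimal sketch (the TTL, view name and figures are illustrative, not from any particular product):

```python
import time
from functools import wraps

def cached(ttl_seconds: float):
    """Keep a query's result in memory for ttl_seconds.
    Within that window, repeated calls return the cached value
    instead of re-running the federated query."""
    def deco(fn):
        state = {"value": None, "expires": 0.0}
        @wraps(fn)
        def wrapper():
            now = time.monotonic()
            if now >= state["expires"]:
                state["value"] = fn()          # hit the sources
                state["expires"] = now + ttl_seconds
            return state["value"]
        return wrapper
    return deco

source_hits = {"n": 0}  # counts how often the sources are actually queried

@cached(ttl_seconds=60)
def revenue_report():
    source_hits["n"] += 1
    return {"EMEA": 200.0}  # stand-in for a federated query result

revenue_report()
revenue_report()  # served from cache; the sources are hit only once
```

The trade-off is explicit: a longer TTL means less load on the sources but slightly staler answers, which is exactly the currency-versus-performance balance described above.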
With companies beginning to run hybrid environments, it’s becoming increasingly important for teams to be able to work with a blend of open source and other technologies. The virtualisation approach helps prevent projects getting stuck at the test phase, and makes it possible to roll them out to a wider audience.
The self-service imperative
Virtualisation doesn’t just address data management efficiencies; it also begins to address the growing need for self-service among analysts and business users. Federating data through transparent and well-documented taxonomies and directories allows new insights to be uncovered – sometimes from previously unimagined data links.
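As a rough illustration of that self-service idea (the catalogue structure, view names and result are all invented), federated views can be published in a documented directory that analysts search by meaning, rather than by knowing which source system holds what:

```python
from typing import Callable, Dict

# A minimal self-service catalogue: each entry documents a virtual view
# and points at the function that runs the federated query behind it.
CATALOG: Dict[str, dict] = {}

def register(name: str, description: str, sources: list):
    """Decorator that publishes a query function in the catalogue
    along with its documentation and lineage."""
    def wrap(fn: Callable):
        CATALOG[name] = {"description": description,
                         "sources": sources,
                         "run": fn}
        return fn
    return wrap

@register("revenue_by_region",
          "Total invoice amount per sales region",
          ["crm.customers", "billing.invoices"])
def revenue_by_region():
    # Stand-in result; a real view would federate the live sources.
    return {"EMEA": 200.0, "APAC": 50.0}

def search(keyword: str):
    """Let analysts discover views by description, not by source system."""
    return [name for name, meta in CATALOG.items()
            if keyword.lower() in meta["description"].lower()]
```

An analyst who searches for “region” finds `revenue_by_region`, sees which sources feed it, and runs it – no request to IT, no knowledge of where the underlying tables live.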
As IoT begins to stream a larger volume and bigger variety of data, designing for self-service will become a strategic consideration – especially as there’s growing understanding that, for most companies, the value in big data lies in bringing together multiple data sources, many previously ignored, and generating insights from that integration. Data virtualisation is perfect for this.
So, the next time you aim for traditional data integration, ask yourself if virtualisation might be a better option.
The long view
Innovation is key to success for our customers, and we’ve applied that same philosophy to develop the new SAS Viya platform – an open platform built for analytics. It's already attracting companies keen to be early adopters.
What do you think of the challenges outlined above? Let me know in the comments below.