Big data, data standards and cross-platform integration


At a recent TDWI conference, I was strolling the exhibition floor when I noticed an interesting phenomenon. A surprising percentage of the exhibiting vendors fell into one of two product categories. One group was selling cloud-based or hosted data warehousing and/or analytics services. The other group was selling data integration products.

Of course, when you think about it, this makes a lot of sense. The economics of cloud computing has shown benefits for software-as-a-service products. Clearly, this paradigm significantly reduces the costs of developing and managing big data projects using tools like Hadoop, without having to pay for the necessary hardware. But moving data off-premises does not obviate the need for internal data accessibility for in-house reporting. That means being able to integrate data wherever the data lives.

Therein lies the problem: as organizational applications migrate to hosted environments, so does the data. And once that data is sitting in someone else’s environment, you begin to lose control over it. Think about this: When you access customer data sitting in an SaaS CRM product, you are not only bound to their internal data models, you're also constrained by their data accessibility methods. Most importantly, you're constrained by their semantics – what the data elements are, what their specifications and definitions are, and how those are interpreted by the application. Alternatively, consider a hosted Hadoop deployment where the data sets are managed as schema-on-read objects with little or no preprocessing or data quality assurance prior to storage.
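To make the schema-on-read concern concrete, here is a minimal sketch in Python. The sample records, the field names and the `read_with_schema` helper are all hypothetical, invented for illustration; they are not taken from any particular Hadoop distribution or hosted service. The point is only that nothing checks the data until someone reads it.

```python
import json

# Hypothetical raw lines, stored as-is with no validation at write time.
RAW_LINES = [
    '{"customer_id": "1001", "signup": "2015-03-02", "state": "MD"}',
    '{"customer_id": "1002", "signup": "03/05/2015", "state": "Maryland"}',
    '{"customer_id": null, "state": "md"}',
]

# The "schema" exists only on the reading side: field name -> cast function.
READ_SCHEMA = {"customer_id": int, "signup": str, "state": str}


def read_with_schema(lines, schema):
    """Apply the schema at read time and report problems per record."""
    for line in lines:
        raw = json.loads(line)
        record, problems = {}, []
        for field, cast in schema.items():
            value = raw.get(field)
            if value is None:
                # Missing or null fields only surface now, at read time.
                problems.append(f"missing {field}")
                record[field] = None
                continue
            try:
                record[field] = cast(value)
            except (TypeError, ValueError):
                problems.append(f"bad {field}: {value!r}")
                record[field] = None
        yield record, problems


results = list(read_with_schema(RAW_LINES, READ_SCHEMA))
for record, problems in results:
    print(record, problems)
```

Note that the second record passes the type checks even though its date format and state value differ from the first. A read-time schema catches structural problems, but not differences in representation.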

The rampant distribution of data across on-premises and off-premises environments creates a demand for what I have started to refer to as “cross-platform integration” products. In the best scenario, these products will provide three main functions:

  • They will streamline data movement between off-premises hosted or cloud-based environments and the enterprise data environment.
  • They will enable native access to a wider variety of off-premises data sources, especially streaming sources.
  • They will allow for incorporation of data standardization and validation rules to ensure proper alignment at the integration point.

That third item is critical. It implies that developers can have the integration system provide semantic alignment of concepts that may have different representations or reliance on different reference data sets. This will help ensure some degree of data usage quality for downstream users. That's especially true when combining data from internal systems, SaaS providers, cloud-based big data applications and numerous data streams.
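As a rough illustration of what standardization and validation at the integration point might look like, consider the following Python sketch. Everything in it is an assumption made for the example: the source names, the record layout, the `STATE_TO_CANONICAL` reference table and the `integrate` function are hypothetical and do not describe any vendor's product.

```python
# Hypothetical reference data: each source represents U.S. states differently
# (postal abbreviation, full name, or numeric FIPS code).
STATE_TO_CANONICAL = {
    "md": "MD", "maryland": "MD", "24": "MD",
    "va": "VA", "virginia": "VA", "51": "VA",
}


def standardize_state(value):
    """Map any known representation to one canonical code, else None."""
    key = str(value).strip().lower()
    return STATE_TO_CANONICAL.get(key)


def integrate(records):
    """Align records from several sources; reject what cannot be aligned."""
    accepted, rejected = [], []
    for source, rec in records:
        state = standardize_state(rec.get("state", ""))
        if state is None:
            # Validation rule: no record enters with an unknown state value.
            rejected.append((source, rec, "unrecognized state"))
        else:
            accepted.append({**rec, "state": state, "source": source})
    return accepted, rejected


incoming = [
    ("internal_erp", {"customer": "Acme", "state": "24"}),
    ("saas_crm", {"customer": "Acme", "state": "Maryland"}),
    ("event_stream", {"customer": "Bolt", "state": " va "}),
    ("cloud_hadoop", {"customer": "Crux", "state": "N/A"}),
]

accepted, rejected = integrate(incoming)
print(accepted)
print(rejected)
```

Downstream users then see a single representation regardless of where a record originated, and the rejected list gives data stewards something concrete to review.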

What puzzles me is the absence of awareness about this challenge. At the same time, I'm surprised that vendors seem to struggle to communicate what their products do, how they do it and why anyone would care. I don't anticipate that this messaging vacuum will last long. I predict that in the next three to six months, more vendors will actively promote products addressing the need for high-quality, cross-platform integration.



About Author

David Loshin

President, Knowledge Integrity, Inc.

David Loshin, president of Knowledge Integrity, Inc., is a recognized thought leader and expert consultant in the areas of data quality, master data management and business intelligence. David is a prolific author on data management best practices, with numerous books, white papers and web seminars to his name. His book, Business Intelligence: The Savvy Manager’s Guide (June 2003), has been hailed as a resource allowing readers to “gain an understanding of business intelligence, business management disciplines, data warehousing and how all of the pieces work together.” His book, Master Data Management, has been endorsed by data management industry leaders. David is also the author of The Practitioner’s Guide to Data Quality Improvement.


  2. Neat read, David. What I think is, some SaaS users mostly look at the benefits first before the actual service provided. They want to know what is in it for them in a price they can afford and then negotiate with the vendor they chose if they could align their resources to their needs.
