Integration and publication: Data management for analytics


Once you have assessed the types of reporting and analytics projects and activities that the community of data analysts and consumers will undertake, along with their business needs and performance requirements, you can evaluate with confidence how different platforms and tools can be combined to satisfy the end-to-end data management demands. This evaluation is particularly useful for ingestion and provision.

Ingestion is composed of the tools and processes for data acquisition and persistence for at least three types of data sets: bulk data integration, interactive integration, and streaming data. For each of these data set categories, that means one or more of the following processes have to be supported:

  • Loading data onto the platform.
  • Profiling and validating the data to assess compliance with data usability expectations.
  • Applying transformations needed for data organization and alignment.
  • Storing the data.
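
The four steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the sample feed, the column names (customer_id, country, amount), the validation rules, and the function names are all hypothetical, and an in-memory SQLite database stands in for whatever persistence platform you actually choose.

```python
import csv
import io
import sqlite3

# Hypothetical sample feed; the columns are illustrative only.
RAW_CSV = """customer_id,country,amount
1,us,10.50
2,DE,7.25
3,,3.00
x,fr,not-a-number
"""

def load(text):
    """Load: parse the raw feed into one dictionary per record."""
    return list(csv.DictReader(io.StringIO(text)))

def validate(rows):
    """Profile and validate: split records into usable and rejected."""
    good, bad = [], []
    for row in rows:
        try:
            int(row["customer_id"])
            float(row["amount"])
            ok = bool(row["country"])  # assumed rule: country is required
        except ValueError:
            ok = False
        (good if ok else bad).append(row)
    return good, bad

def transform(rows):
    """Transform: align types and normalize the country code."""
    return [
        (int(r["customer_id"]), r["country"].upper(), float(r["amount"]))
        for r in rows
    ]

def store(conn, records):
    """Store: persist the cleaned records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer_id INTEGER, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()

def ingest(text, conn):
    """Run all four steps and report accepted and rejected counts."""
    good, bad = validate(load(text))
    store(conn, transform(good))
    return len(good), len(bad)

conn = sqlite3.connect(":memory:")
accepted, rejected = ingest(RAW_CSV, conn)
```

Keeping each step as its own function makes it easier to swap in a dedicated data quality or data integration tool for any one of them later.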

Event stream processing, data integration, and data quality tools are the prime candidates to support the needs for data ingestion. At the same time, each type of data set poses its own challenges. For example, it may be straightforward to capture data streams and even store them, but without immediate access to the entire data set at one time, one would need to defer the profiling and validation, which should be done before the data is committed to persistent storage. This introduces a new dependency on the other capabilities and adds complexity (although not insurmountable complexity) to the data management environment. That, in turn, may suggest a hybrid infrastructure for data integration (instead of the presumptive Hadoop data lake).
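
To make the streaming case concrete, here is a minimal sketch of one possible approach: hold incoming events in a staging buffer, profile each fixed-size window once it is complete, and only then commit it. The window size, the null-ratio threshold, the "reading" field, and the function names are assumptions chosen for illustration, not features of any particular event stream processing product.

```python
def profile(batch):
    """Profile one window: count records and measure missing readings."""
    values = [e["reading"] for e in batch if e.get("reading") is not None]
    return {
        "count": len(batch),
        "null_ratio": 1 - len(values) / len(batch),
    }

def process_stream(events, window_size, max_null_ratio, committed, quarantined):
    """Stage events, then validate each full window before committing it."""
    staging = []
    for event in events:
        staging.append(event)
        if len(staging) == window_size:
            stats = profile(staging)
            # Validation is deferred until the whole window is available.
            if stats["null_ratio"] <= max_null_ratio:
                committed.extend(staging)
            else:
                quarantined.extend(staging)
            staging = []
    return staging  # events still waiting for a full window

# Hypothetical sensor events; None marks a missing reading.
events = [
    {"reading": 1.0},
    {"reading": 2.0},
    {"reading": None},
    {"reading": None},
    {"reading": 5.0},
]
committed, quarantined = [], []
pending = process_stream(
    iter(events),
    window_size=2,
    max_null_ratio=0.5,
    committed=committed,
    quarantined=quarantined,
)
```

The staging buffer is the extra dependency mentioned above: something has to hold the data between capture and validated persistence.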

On the utility side, you will probably want to enable transparent accessibility to data that can be used for analysis. That means these types of processes need to be supported:

  • Data access for simple filtering queries (SELECT statements).
  • Querying and reporting using more complex query formats (JOINs).
  • Bulk extracts, which may then be targeted to other platforms.
  • End-user analytics (e.g. OLAP cubes) and discovery tools (such as visualization engines).
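
The first three access patterns in the list can be illustrated with a small sketch. The tables, columns, and the bulk_extract helper are hypothetical, and an in-memory SQLite database is used only so the example is self-contained.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'EAST'), (2, 'WEST');
    INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 5.0), (12, 2, 8.0);
""")

# 1. Simple filtering query (a plain SELECT with a WHERE clause).
large_orders = conn.execute(
    "SELECT id FROM orders WHERE total > ? ORDER BY id", (7.0,)
).fetchall()

# 2. More complex query: a JOIN with aggregation for reporting.
by_region = conn.execute("""
    SELECT c.region, SUM(o.total)
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()

# 3. Bulk extract: dump a whole table as CSV for another platform.
def bulk_extract(conn, table):
    # The table name is interpolated directly, so pass trusted names only.
    cursor = conn.execute(f"SELECT * FROM {table} ORDER BY id")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)
    return out.getvalue()

extract = bulk_extract(conn, "orders")
```

Each pattern stresses the platform differently: the filter touches few rows, the join touches two tables, and the extract reads everything, which is why performance requirements need to be assessed per usage scenario.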

If we have opted for different persistence platforms for the data, enabling transparent accessibility actually means providing an opaque layer that shields the users from the underlying federation of platforms. So aside from the typical business intelligence, visualization, and discovery tools, this also suggests the need for data virtualization and data federation, as well as metadata management to ensure consistent presentation semantics for the users.
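
As a rough sketch of what such a layer does, the example below registers two differently stored data sets under logical names, uses a small rename map as a stand-in for metadata management, and joins them without the caller knowing where either lives. The class names (SqlSource, ListSource, VirtualLayer) and the sample data are hypothetical; real data virtualization products are far more capable.

```python
import sqlite3

class SqlSource:
    """A data set held in a relational store."""
    def __init__(self, conn, table):
        self.conn, self.table = conn, table

    def rows(self):
        cur = self.conn.execute(f"SELECT * FROM {self.table}")
        cols = [c[0] for c in cur.description]
        return [dict(zip(cols, r)) for r in cur]

class ListSource:
    """A data set held elsewhere (here, just an in-memory list)."""
    def __init__(self, records):
        self.records = records

    def rows(self):
        return [dict(r) for r in self.records]

class VirtualLayer:
    """Maps logical names to sources and presents consistent column names."""
    def __init__(self):
        self.catalog = {}

    def register(self, name, source, rename=None):
        self.catalog[name] = (source, rename or {})

    def read(self, name):
        source, rename = self.catalog[name]
        return [
            {rename.get(k, k): v for k, v in row.items()}
            for row in source.rows()
        ]

    def join(self, left, right, key):
        """Federated join across sources on a shared logical column."""
        index = {}
        for row in self.read(right):
            index.setdefault(row[key], []).append(row)
        return [
            {**l, **r}
            for l in self.read(left)
            for r in index.get(l[key], [])
        ]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (cust_id INTEGER, region TEXT);
    INSERT INTO customers VALUES (1, 'EAST'), (2, 'WEST');
""")
clicks = [{"customer": 1, "page": "home"}, {"customer": 2, "page": "cart"}]

layer = VirtualLayer()
layer.register("customers", SqlSource(conn, "customers"), {"cust_id": "customer_id"})
layer.register("clicks", ListSource(clicks), {"customer": "customer_id"})
joined = layer.join("clicks", "customers", "customer_id")
```

The rename maps are what let both sources present the same customer_id column, which is the kind of consistency in presentation semantics that metadata management is meant to provide.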

Determining the right set of data integration and publication tools becomes a matter of convergence: which performance variables are key for the users, which platforms can foundationally support that level of performance, and which tools have been professionally deployed on those platforms. This sounds simple, and while it still masks some of the challenges in really understanding how the usage scenarios may interplay, it does provide a starting point for devising the data management architecture for analytics.


About Author

David Loshin

President, Knowledge Integrity, Inc.

David Loshin, president of Knowledge Integrity, Inc., is a recognized thought leader and expert consultant in the areas of data quality, master data management and business intelligence. David is a prolific author on data management best practices, via the expert channel at b-eye-network.com and numerous books, white papers, and web seminars. His book, Business Intelligence: The Savvy Manager’s Guide (June 2003) has been hailed as a resource allowing readers to “gain an understanding of business intelligence, business management disciplines, data warehousing and how all of the pieces work together.” His book, Master Data Management, has been endorsed by data management industry leaders, and his valuable MDM insights can be reviewed at mdmbook.com. David is also the author of The Practitioner’s Guide to Data Quality Improvement. He can be reached at loshin@knowledge-integrity.com.
