David Loshin explains how to set up a data catalog that will help you get more value from a data lake.
David Loshin says simple approaches to identity resolution may not scale on a big data platform as data volumes increase.
David Loshin explains 4 struggles of syndicating master data across the enterprise.
David Loshin explains why MDM is such a valuable tool in helping to detect fraud.
David Loshin extends his exploration of ethical issues surrounding automated systems and event stream processing to encompass data quality and risk considerations.
David Loshin describes three sets of policies required for ensuring compliance with data protection directives for health care.
Health care fraud prevention is a sticky topic. David Loshin discusses what's needed to balance prompt claims payments with fraud prevention efforts.
I've been working on a pilot project recently with a client to test out some new NoSQL database frameworks (graph databases in particular). Our goal is to see how a different storage model, representation and presentation can enhance the usability and ease of integration for master data indexes and entity
As the application stack supporting big data has matured, it has demonstrated the feasibility of ingesting, persisting and analyzing potentially massive data sets that originate both within and outside of conventional enterprise boundaries. But what does this mean from a data governance perspective?
In my last post we started looking at the issue of identifier proliferation, in which different business applications assigned their own unique identifiers to data representing the same entities. Even master data management (MDM) applications are not immune to this issue, particularly because of the inherent semantics associated with the
I was surprised to learn recently that despite the reams of laws and policies directing the protection of personally identifiable information (PII) across industries and government agencies, more than 50 million Medicare beneficiaries were issued cards with a Medicare Beneficiary Number that's based on their Social Security Number (SSN). That's
One aspect of high-quality information is consistency. We often think about consistency in terms of consistent values. A large portion of the effort expended on “data quality dimensions” essentially focuses on data value consistency. For example, when we describe accuracy, what we often mean is consistency with a defined source
We often talk about full customer data visibility and the need for a “golden record” that provides a 360-degree view of the customer to enhance our customer-facing processes. The rationale is that by accumulating all the data about a customer (or, for that matter, any entity of interest) from multiple sources, you
In my prior posts about operational data governance, I've suggested the need to embed data validation as an integral component of any data integration application. In my last post, we looked at an example of using a data quality audit report to ensure fidelity of the data integration processes for
Data integration teams often find themselves in the middle of discussions where the quality of their data outputs are called into question. Without proper governance procedures in place, though, it's hard to address these accusations in a reasonable way. Here's why.
In my last post, we explored the operational facet of data governance and data stewardship. We focused on the challenges of providing a scalable way to assess incoming data sources, identify data quality rules and define enforceable data quality policies. As the number of acquired data sources increases, it becomes
Data governance can encompass a wide spectrum of practices, many of which are focused on the development, documentation, approval and deployment of policies associated with data management and utilization. I distinguish the facet of “operational” data governance from the fully encompassed practice to specifically focus on the operational tasks for
What does it really mean when we talk about the concept of a data asset? For the purposes of this discussion, let's say that a data asset is a manifestation of information that can be monetized. In my last post we explored how bringing many data artifacts together in a
A long time ago, I worked for a company that had positioned itself as basically a third-party “data trust” to perform collaborative analytics. The business proposition was to engage different types of organizations whose customer bases overlapped, ingest their data sets, and perform a number of analyses using the accumulated
In my last post, I started to look at the use of Hadoop in general and the data lake concept in particular as part of a plan for modernizing the data environment. There are surely benefits to the data lake, especially when it's deployed using a low-cost, scalable hardware platform.
More and more organizations are considering the use of maturing scalable computing environments like Hadoop as part of their enterprise data management, processing and analytics infrastructure. But there's a significant difference between the evaluation phase of technology adoption and its subsequent production phase. This seems apparent in terms of how organizations are
In my last post, I discussed the issue of temporal inconsistency for master data, when the records in the master repository are inconsistent with the source systems as a result of a time-based absence of synchronization. Periodic master data updates that pull data from systems without considering alignment with in-process
Master data management (MDM) provides methods for unifying data about important entities (such as “customer” or “product”) that are managed within independent systems. In most cases, there is some kind of customer data integration requirement for downstream reporting, and analysis for some specific business objective – such as customer profiling for
In my last post we started to look at two different Internet of Things (IoT) paradigms. The first only involved streaming automatically generated data from machines (such as sensor data). The second combined human-generated and machine-generated data, such as social media updates that are automatically augmented with geo-tag data by