I recently taught a graduate-level class on information environments. It focused on the intersection of data management technologies, metadata management, databases and analytics with the intellectual processes of information analysis and, ultimately, information presentation. A key theme of the class was to bridge technical skills with interpretation and synthesis as a way to convey what can be derived through data management and analytics. I invited a number of guest speakers to visit the class and discuss the kind of work they do and how it relates to the class topic. One of the first guests had spent the bulk of his career developing and refining entity resolution algorithms. This speaker described the challenges associated with identifying entity data, transforming the records into a standardized form, and applying entity resolution algorithms to match and link sets of records that could be determined to represent the same unique entities.
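To make the speaker's three steps concrete, here is a minimal sketch in Python of standardizing records and then comparing them. It is an illustration only, not the speaker's method: the field names (`name`, `city`), the 0.85 threshold and the use of the standard library's `difflib.SequenceMatcher` as the similarity measure are all assumptions chosen for brevity.

```python
import re
from difflib import SequenceMatcher


def standardize(record):
    """Lowercase each field, drop punctuation and collapse whitespace."""
    clean = {}
    for field, value in record.items():
        text = re.sub(r"[^\w\s]", "", value.lower())
        clean[field] = " ".join(text.split())
    return clean


def similarity(a, b):
    """Return a string similarity score between 0.0 and 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def is_match(rec_a, rec_b, fields=("name", "city"), threshold=0.85):
    """Treat two records as the same entity when the average field
    similarity reaches the (hypothetical) threshold."""
    a, b = standardize(rec_a), standardize(rec_b)
    scores = [similarity(a[f], b[f]) for f in fields]
    return sum(scores) / len(scores) >= threshold


# Hypothetical sample records from two different source systems.
crm_record = {"name": "Jon  Smith,", "city": "New York"}
billing_record = {"name": "John Smith", "city": "new york"}
print(is_match(crm_record, billing_record))
```

Production matching tools typically add blocking, field weighting and probabilistic scoring, but the shape of the process (standardize first, then compare) stays the same.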
For most of the students, this was their first exposure to entity resolution. Yet by the end of the semester, it was clear to me that all of them had grasped the concepts and understood the inherent value of the entity resolution process in relation to the other ideas we had discussed over the 15-week term. For example, one of the other guest speakers, a former government chief data officer, talked about the need for high-quality data. Another guest shared his experiences in developing customer reporting and analytics for a telecommunications company. A third talked about graph analytics. After each of these talks, the students noted that all the applications were ultimately reliant on entity resolution concepts:
- The chief data officer’s talk was permeated with comments about the need to integrate records from multiple sources and use entity resolution to find matches for purposes of data cleansing.
- The customer reporting and analysis application pulled data from a number of client sites to produce the rolled-up statistics.
- The graph analysis application displayed relationships among records based on resolving their identities, and effectively merged disparate representations of the same entities into a virtual equivalence class.
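The "virtual equivalence class" in the graph example can be sketched with a union-find structure: once pairs of records have been judged to match, every record reachable through a chain of matches lands in the same group. This is a minimal sketch under assumptions; the record identifiers such as `crm:1` are hypothetical, and the matched pairs are taken as given rather than computed.

```python
def find(parent, x):
    """Follow parent links to the representative of x's group,
    shortening the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def equivalence_classes(record_ids, matched_pairs):
    """Group record ids so that any two ids connected by a chain of
    matched pairs end up in the same set."""
    parent = {record_id: record_id for record_id in record_ids}
    for a, b in matched_pairs:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for record_id in record_ids:
        groups.setdefault(find(parent, record_id), set()).add(record_id)
    return list(groups.values())


# Hypothetical record ids drawn from three source systems.
ids = ["crm:1", "billing:7", "web:3", "crm:2"]
pairs = [("crm:1", "billing:7"), ("billing:7", "web:3")]
for group in equivalence_classes(ids, pairs):
    print(sorted(group))
```

Note that `crm:1` and `web:3` were never compared directly; they end up in the same class because both matched `billing:7`. Whether that transitive merging is acceptable is a business decision, not just a technical one.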
As we reviewed specific use cases for analytics (such as customer profiling, classification and segmentation, fraud detection and product recommendation), it became apparent to the students that most of the business applications for analytics hinged upon already having accurately identified the subjects of the analysis.
Entity resolution – prerequisite for meaningful analytics
OK, you say – so what? Most technical people are already clued in to the value proposition for entity resolution. However, as technologists, we often overlook some of the subtleties inherent in understanding information interpretation and use. The challenge is that we often compartmentalize aspects of data management to enable their implementation. But as a byproduct of this compartmentalization, we evaporate semantic meaning out of the data. For example, the master data management process is transformed into a means for creating a master customer index. The resulting master data set is an achievement (technically) even if it is of limited business use – because the idiosyncrasies that differentiated original records were removed as part of the data cleansing and linkage activities.
In other words, since entity resolution is a prerequisite to meaningful analytics, think about how the matching and linkage algorithms and tools should be layered within your organization’s overall data strategy. Instead of using entity resolution as a post-processing bandage to fix perceived “errors” that cause records to be inconsistent, integrate it directly into the information flows:
- Identify the key entities that exist within the environment.
- Assess the information flows to determine the different business process touch points that either produce or consume entity data.
- Engage the consumers to understand how the entity information is used.
- Harmonize semantics around entity data attributes.
- Document the expectations for unifying the visibility into entity data.
- Finally, institute the right types of resolution, matching and linkage algorithms within the applications supporting the business processes to ensure that downstream analytics will provide exploitable results.
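One way to picture resolution inside the information flow, rather than as a clean-up pass afterward, is a small registry that each application calls at the moment it produces or consumes an entity record. The sketch below is hypothetical throughout: the `EntityRegistry` class, the `email_key` matching rule and the source names are invented for illustration. It links each incoming record to an entity identifier while keeping the original record intact, so the idiosyncrasies that distinguish source records remain available to downstream consumers.

```python
import itertools


class EntityRegistry:
    """Hypothetical registry that assigns entity ids at the point of entry
    and keeps every source record unchanged."""

    def __init__(self, match_key):
        self._match_key = match_key          # function: record -> matching key
        self._ids_by_key = {}                # matching key -> entity id
        self._records_by_id = {}             # entity id -> [(source, record)]
        self._next_id = itertools.count(1)

    def resolve(self, source, record):
        """Return the entity id for a record, creating one if none matches."""
        key = self._match_key(record)
        entity_id = self._ids_by_key.get(key)
        if entity_id is None:
            entity_id = next(self._next_id)
            self._ids_by_key[key] = entity_id
            self._records_by_id[entity_id] = []
        # Store a copy of the original record; nothing is overwritten.
        self._records_by_id[entity_id].append((source, dict(record)))
        return entity_id

    def records_for(self, entity_id):
        """Return all source records linked to an entity id."""
        return list(self._records_by_id.get(entity_id, []))


def email_key(record):
    """Assumed matching rule: same normalized email means same entity."""
    return record["email"].strip().lower()


registry = EntityRegistry(email_key)
crm_id = registry.resolve("crm", {"name": "J. Smith", "email": "JSmith@example.com "})
billing_id = registry.resolve("billing", {"name": "John Smith", "email": "jsmith@example.com"})
print(crm_id == billing_id, registry.records_for(crm_id))
```

An exact-key rule like this is far too simple for real data; the point is the placement of the call, which lets the matching logic agreed on with data consumers run where the records enter the environment.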