Agility in external data ingestion


In two previous posts (Part 1 and Part 2), I explored some of the challenges of managing data beyond enterprise boundaries. These posts focused on issues around managing and governing extra-enterprise data. Let’s focus a bit on one specific challenge now – satisfying the need for business users to rapidly ingest new data sources.

Sophisticated business users recognize the potential for extracting value from different types of externally sourced data. But those data sets are configured in many different ways. In some cases, the data appears in a traditional structured format, such as collections of records with comma-separated values. Other times the data is free-form, like text in emails or other documents. Sometimes there are structural hints embedded in unstructured data – for example, when character strings with hashtags are embedded in small messages sent via social media. And sometimes data sources combine text, images, video, etc. These are even more complex from a structural perspective.
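To make the idea of "structural hints" concrete, here is a minimal Python sketch that pulls hashtags out of a short free-form message. The function name and the sample message are hypothetical, and the pattern is deliberately simple (a `#` followed by word characters); real social media platforms apply their own, more detailed tokenization rules.

```python
import re

# Assumption: a hashtag is '#' followed by one or more word characters.
HASHTAG = re.compile(r"#(\w+)")

def extract_hashtags(message: str) -> list[str]:
    """Return hashtags (without '#'), lowercased, in order of appearance."""
    return [tag.lower() for tag in HASHTAG.findall(message)]

# Hypothetical sample message, used only for illustration.
tags = extract_hashtags("Loving the new #Analytics dashboard from #DataTeam!")
print(tags)
```

Even a small step like this turns an unstructured message into a list of values that can be counted, grouped, or joined to other data.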

Whenever a data source is identified as having some value to the business, you need to work quickly. You have to figure out what structure it has (if any), what valuable information it carries, the best way to capture the information, and how fast the data can be absorbed.

In some cases, the data source can be subjected to what might be deemed a modified data profile. Traditional profiling scans data sets and assesses their suitability for use; the modified approach goes further. It may blend traditional statistical analysis with analytical utilities such as text analytics, data value imputation (that is, replacing missing data with statistically valid values) and other data manipulation techniques. Blended techniques provide a broader approach to data preparation than traditional methods.
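As a rough illustration of blending a simple profile with value imputation, the Python sketch below profiles one numeric column of a comma-separated data set and fills in missing values. It rests on several assumptions that are not from any particular tool: the function name, the `amount` column, and the sample data are hypothetical, and the median is used as the fill value only because it is a simple, common choice. A real project would pick the imputation method to suit the data.

```python
import csv
import io
from statistics import median

def profile_and_impute(csv_text: str, column: str):
    """Profile one numeric column and fill blanks with the column median."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Treat empty or absent cells as missing values.
    raw = [(row.get(column) or "").strip() for row in rows]
    present = [float(v) for v in raw if v]
    if not present:
        raise ValueError(f"column {column!r} has no usable values")
    fill = median(present)  # assumption: median imputation
    profile = {
        "rows": len(rows),
        "missing": len(raw) - len(present),
        "min": min(present),
        "max": max(present),
        "median": fill,
    }
    imputed = [float(v) if v else fill for v in raw]
    return profile, imputed

# Hypothetical sample: the second record has no amount.
sample = "id,amount\n1,10\n2,\n3,30\n4,20\n"
profile, imputed = profile_and_impute(sample, "amount")
print(profile)
print(imputed)
```

The profile reports how many values are missing before anything is changed, so a business user can judge whether imputation is reasonable for that source at all.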



But what really distinguishes these data preparation tools is that they are meant for business users, not IT staff. Data preparation tools give end users access to raw data. They give business users more say in interpreting semantic structure and meaning based on their expectations. At the same time, this direct exposure to raw data reveals the “glitches” in the data that would have traditionally hampered the IT team’s ability to ingest and integrate the data sets. As a result, business users can communicate more effectively with data practitioners when they describe which standardizations and transformations need to be performed.

In other words, user-oriented data preparation tools engage business users, encourage conversations between business experts and IT teams, and speed the process of developing applications for data ingestion and integration. This reflects a core tenet of the agile development approach: increased collaboration between IT and business experts. An implication is that sophisticated data management technologies are necessary for speeding data ingestion. Ultimately, this approach reduces the time it takes to make external data available for use in analytics.


About Author

David Loshin

President, Knowledge Integrity, Inc.

David Loshin, president of Knowledge Integrity, Inc., is a recognized thought leader and expert consultant in the areas of data quality, master data management and business intelligence. David is a prolific author regarding data management best practices, via the expert channel at and numerous books, white papers, and web seminars on a variety of data management best practices. His book, Business Intelligence: The Savvy Manager’s Guide (June 2003) has been hailed as a resource allowing readers to “gain an understanding of business intelligence, business management disciplines, data warehousing and how all of the pieces work together.” His book, Master Data Management, has been endorsed by data management industry leaders, and his valuable MDM insights can be reviewed at . David is also the author of The Practitioner’s Guide to Data Quality Improvement. He can be reached at
