In my last post, I detailed some of the pros and cons of data on demand. Inherent in that post, however, was the assumption that employees know which data types and sources are available.
Ask anyone with a skosh of experience working at mature organizations if that's always the case and you're likely to hear a few chuckles. (Ditto for many startups.) In both cases, people often buy or generate data sets that already exist in some other part of the organization. To be fair, I'm not immune here. A few times over my career, I wasted a great deal of time on data cleanup or generation because of organizational communication issues.
If it was a tad tricky to understand an enterprise's different data sources then, it's often downright difficult today to know what's out there – for several reasons. First, there's more data than ever – a trend that shows no signs of abating. Second, that data doesn't necessarily "live" on-premises (read: in a relational database that the organization owns). Perhaps it lives in a single data warehouse or data lake, but odds are that it is strewn all over the place. What's more, that data often lies outside of the enterprise, making it more difficult to govern.
Against this dizzying backdrop, allow me to posit a few questions that employees ought to be asking before they undertake massive data and analytics projects:
- What "internal" data – if any – has the firm used in the past to answer [insert name of question]? What was the quality of that data? Has it improved or deteriorated over time?
- What types and sources of data exist "under the radar"? (Think here of the data equivalent of Shadow IT.) Throughout my career, I've seen no shortage of independent Microsoft Access databases, Excel spreadsheets and even paper files that contained key information not housed in systems of record.
- What "external" data – if any – has the firm used in the past to answer [insert name of question]? Here I'm talking about things such as open data, linked data and data generated via an application programming interface (API). (Note here that the answer may, in fact, be none. Many companies are just getting their arms around external data.)
- Has the enterprise developed a clear and comprehensive inventory of its data? (Think of something akin to a data dictionary, but for data sources.) I would bet a great deal of money that the answer to that question is almost always either "no" or "I have no idea."
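For readers who want to picture what even a bare-bones data inventory might look like, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `DataSource` fields, the `find_sources` helper and the three sample entries are hypothetical and don't describe any particular product or organization. The point is only that each source records who to ask about it, whether it is internal or external, and whether it lives "under the radar."

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    """One entry in a hypothetical data inventory."""
    name: str
    owner: str      # team or person to ask about this source
    location: str   # e.g. "relational database", "Excel on shared drive"
    origin: str     # "internal" or "external"
    official: bool  # False for "under the radar" sources
    topics: set[str] = field(default_factory=set)


def find_sources(inventory: list[DataSource], topic: str) -> list[DataSource]:
    """Return every registered source tagged with the given topic (case-insensitive)."""
    wanted = topic.lower()
    return [s for s in inventory if wanted in {t.lower() for t in s.topics}]


# Illustrative entries only; none of these come from a real organization.
inventory = [
    DataSource("hr_system", "HR Ops", "relational database",
               "internal", True, {"headcount", "turnover"}),
    DataSource("exit_survey.xlsx", "Regional HR", "Excel on shared drive",
               "internal", False, {"turnover"}),
    DataSource("labor_stats_api", "Analytics", "open data API",
               "external", True, {"turnover", "wages"}),
]

# Before starting a turnover analysis, list what already exists and whom to ask.
for source in find_sources(inventory, "turnover"):
    print(f"{source.name}: ask {source.owner} "
          f"({source.origin}, official={source.official})")
```

Even a list this simple would answer the four questions above for a given topic. The hard part, of course, is getting people to register their sources in the first place.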
Simon Says: Start with communication.
As much as technology and data matter, I'd argue that communication is paramount here. Unless employees know what's out there, it's unlikely that their analytics efforts will be optimal. Part and parcel of obtaining this knowledge is the simple process of asking.
Let's return to the question that I asked in this post: How do I know what data is available?
You don't – unless you ask.
What say you?