Edward Snowden and the NSA's PRISM program did many things far beyond the scope of this blog. For the purposes of data management professionals, however, there was one undeniable benefit: the scandal and subsequent fallout brought the importance of metadata to the forefront. (See Obama's 2014 Metadata Proposal.) Many organizations that had previously ignored it or doubted its importance doubtless reconsidered their position. Hmm...maybe this stuff actually matters after all?
Has there been progress on the metadata front in the last several years? Sure, but there's still a long way to go in the business world. Technology is helping us make (better) sense of this sea of data. But, as is almost always the case, the issue starts with us, not computers. In particular, many people continue to make the mistake of thinking that metadata only applies to structured (read: table-friendly) data.
To quote John McLaughlin, "Wrong!!!"
On the contrary, let me make this argument: metadata matters even more when we're trying to understand and interpret unstructured data than it does with the structured kind. Put differently, metadata is arguably more valuable and necessary when dealing with unstructured data because simple counts, mins, maxes, and other SQL stalwarts aren't nearly as meaningful.
Think about it. Without sufficient metadata, gaining a true understanding of the following data sources is often difficult if not impossible, at least for now:
- YouTube videos
- Blog posts and comments
- Instagram photos
- Phone calls
- Other forms of unstructured data
Let's say that you possess a great deal of metadata on these data sources. You can access accurate dates, tags, times, and categories. Will you be able to accurately describe and predict what will happen next? I don't see how. Even with a good chunk of metadata, many data sources are not (completely) usable, much less understandable. As I wrote in The Visual Organization, "Voice, image, and facial recognition continue to improve, but few would characterize these fields as perfect at present."
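To make the dates, tags, and categories above a bit more concrete, here is a minimal Python sketch of what that metadata lets you do. Everything in it is an assumption for illustration: the `MediaItem` class, its field names, and the sample records are made up, not drawn from any real system. Notice that the summary describes the collection without saying anything about what is actually inside a given video, photo, or call.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date


@dataclass
class MediaItem:
    """A hypothetical unstructured asset described only by its metadata."""
    source: str      # e.g. "youtube", "instagram", "blog", "phone"
    created: date    # when the item was created
    category: str    # a coarse, human-assigned category
    tags: list[str] = field(default_factory=list)


# Made-up records for illustration only.
items = [
    MediaItem("youtube", date(2014, 3, 1), "tutorial", ["sql", "metadata"]),
    MediaItem("instagram", date(2014, 3, 2), "event", ["conference"]),
    MediaItem("blog", date(2014, 3, 2), "opinion", ["metadata", "privacy"]),
    MediaItem("phone", date(2014, 3, 5), "support", []),
]


def summarize(items):
    """Build simple counts and a date range from metadata alone."""
    return {
        "tag_counts": Counter(tag for item in items for tag in item.tags),
        "by_category": Counter(item.category for item in items),
        "first": min(item.created for item in items),
        "last": max(item.created for item in items),
        # Items with no tags are effectively invisible to tag-based queries.
        "untagged": [item.source for item in items if not item.tags],
    }


summary = summarize(items)
print(summary["tag_counts"].most_common(1))
print(summary["untagged"])
```

The counts and date range come entirely from the metadata. The untagged phone call shows the flip side: with no tags attached, it drops out of any tag-based view, even though the recording itself may matter a great deal.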
Think back to the 2013 Boston Marathon bombings for a moment. The authorities were able to catch the culprits even though they lacked complete data. Sometimes, brute force is necessary to accomplish your goals. There's little doubt, though, that metadata can be extremely useful with respect to not only structured data but the unstructured stuff as well.
What say you?