Data preparation for streaming data

I recently returned from an extended vacation where I traveled to three different parts of the country to visit family for the holidays. During the trip I spent a lot of time sitting in airport boarding areas and hotel rooms, consuming gigabytes of streaming content from applications like Netflix, Hulu, Amazon and Pandora.

This got me thinking about some of the challenges associated with data preparation for streaming data. One critical aspect is the technical infrastructure, because streaming data relies on data virtualization and data federation.

Data virtualization provides a single interface for accessing distributed data, with an abstraction layer that hides technical details such as how the data is formatted or where it is physically located. Data federation uses a virtual database that provides a common data model for heterogeneous data, so that data can be collected without copying or transferring the original data itself. The apps I mentioned earlier (on my smartphone and tablet) use data virtualization to provide a single interface to the movies, television shows, videos and songs I streamed while on vacation. That's the case even though I have no idea:

Where my content providers store the source data.
What storage is required to hold multiple copies of the same content so that different users can access it simultaneously, plus the progress trackers that pick up where you left off if streaming is interrupted (like when you're on spotty airport or hotel Wi-Fi).
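The definitions above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any of the streaming services mentioned actually work: two hypothetical sources with different formats (JSON and CSV, held in memory here in place of remote systems) are mapped by small adapters onto a shared ContentItem model, and a hypothetical VirtualCatalog class offers one query interface over both. Queries read each source on demand and yield matching rows one at a time, so no merged copy of the data is built.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import Callable, Dict, Iterator


# Hypothetical common data model shared by every source.
@dataclass(frozen=True)
class ContentItem:
    title: str
    kind: str
    minutes: int


# Two heterogeneous sources with made-up sample records. In-memory text
# stands in for data that would really live in separate remote systems.
VIDEO_SOURCE_JSON = '[{"name": "Example Episode", "type": "episode", "runtime_sec": 2700}]'
MUSIC_SOURCE_CSV = "track,length_min\nExample Track,6\n"


def read_video_source() -> Iterator[ContentItem]:
    # Adapter: maps the JSON schema onto the common data model.
    for rec in json.loads(VIDEO_SOURCE_JSON):
        yield ContentItem(rec["name"], rec["type"], rec["runtime_sec"] // 60)


def read_music_source() -> Iterator[ContentItem]:
    # Adapter: maps the CSV schema onto the common data model.
    for row in csv.DictReader(io.StringIO(MUSIC_SOURCE_CSV)):
        yield ContentItem(row["track"], "song", int(row["length_min"]))


class VirtualCatalog:
    """Single interface over all registered sources; callers never see
    where a source lives or how its data is formatted."""

    def __init__(self) -> None:
        self._sources: Dict[str, Callable[[], Iterator[ContentItem]]] = {}

    def register(self, name: str, reader: Callable[[], Iterator[ContentItem]]) -> None:
        self._sources[name] = reader

    def query(self, predicate: Callable[[ContentItem], bool]) -> Iterator[ContentItem]:
        # Federated query: each source is read at query time and rows are
        # yielded one by one, so no combined copy is materialized.
        for reader in self._sources.values():
            for item in reader():
                if predicate(item):
                    yield item


catalog = VirtualCatalog()
catalog.register("video", read_video_source)
catalog.register("music", read_music_source)

long_items = [i.title for i in catalog.query(lambda i: i.minutes >= 30)]
print(long_items)  # ['Example Episode']
```

Adding a third source in this sketch means writing one more adapter and registering it; code that calls catalog.query does not change, which is the point of the abstraction layer.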

These apps also use data federation so that I can stream their content without creating local copies of all those movies, television shows, videos and songs on my smartphone or tablet.
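The "no local copy" idea, together with the progress trackers mentioned earlier, can be illustrated with a simple chunked reader. This is a hypothetical sketch: ProgressTracker and stream_chunks are illustrative names, an in-memory byte buffer stands in for remote content, and real services use their own delivery protocols. The reader holds one chunk at a time and records how far it got, so an interrupted stream can resume from the saved offset instead of starting over.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, Iterator


@dataclass
class ProgressTracker:
    # Hypothetical tracker: number of bytes already delivered to the viewer.
    offset: int = 0


def stream_chunks(source: BinaryIO, tracker: ProgressTracker,
                  chunk_size: int = 4) -> Iterator[bytes]:
    """Yield the source in fixed-size chunks, starting where the tracker
    left off. The reader itself holds only one chunk at a time."""
    source.seek(tracker.offset)
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        # Count the chunk as delivered as soon as it is handed out.
        tracker.offset += len(chunk)
        yield chunk


remote = io.BytesIO(b"0123456789")  # stands in for a remote content file
tracker = ProgressTracker()

# The consumer keeps chunks only so the result can be checked; a real
# player would render each chunk and then discard it.
received = bytearray()
for i, chunk in enumerate(stream_chunks(remote, tracker)):
    received += chunk
    if i == 1:  # simulate a dropped Wi-Fi connection after two chunks
        break

print(tracker.offset)  # 8

for chunk in stream_chunks(remote, tracker):  # resume from the saved offset
    received += chunk

print(bytes(received))  # b'0123456789'
```

Updating the offset before each chunk is yielded means a chunk counts as delivered once the consumer has received it, which is what lets the second loop continue at byte 8.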

Essentials: Data preparation, virtualization and federation for streaming data

The value of streaming data is obviously not limited to binge-watching Marvel movies and Doctor Who episodes, or listening to what's now apparently categorized as classic rock. Companies in every industry are pursuing the business opportunities created by streaming data, in areas such as machine learning and streaming analytics. Along with the other aspects of data preparation, data virtualization and data federation are essential for getting the most out of streaming data.

Want to learn more? Read the 5 D's of data preparation