In my two previous posts, I pondered two questions. First, could unlimited data limit data silos? That is, could offering users the enterprise data management equivalent of unlimited data streaming curb their enthusiasm for creating data silos? Second, could streaming past the limits of unlimited data create more data silos, as users became frustrated with the practical limits that would need to be enforced on the amount and/or speed of data they could stream?
In this post, to extend my original analogy, I want to compare and contrast streaming with syncing, which was far more prevalent before data streaming became fast enough to offer a viable alternative.
For example, I perform syncing to keep my iTunes music library updated across multiple devices (smartphone, tablet, laptop) because I have created a personal data silo of music. Having these redundant copies (silos) becomes problematic when updates (syncs) are not applied to all copies. The more copies I make, and the more music I buy, the more time and effort I have to spend trying to keep all the silos synced.
Returning to enterprise data management, consider not only internal data silos but also the use of external reference data, a growing category of big data that Henrik Liliendahl Sørensen recently blogged about.
Streaming external reference data as a service is a better and more sustainable approach than copying external reference data into an internal data silo. A siloed copy almost immediately starts to drift out of alignment with the real world, precisely because it has been siloed. We then waste a lot of time, money and effort trying to keep the internally siloed data updated as the real world changes.
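To make the difference concrete, here is a minimal sketch of a siloed copy versus a streamed view of the same reference data. Everything in it is hypothetical and invented for illustration: the class names (ReferenceSource, SiloedCopy, StreamingView), the "acme" record and its addresses do not come from any real service or product, and a real streaming service would involve a network, caching and access controls that this toy version ignores.

```python
class ReferenceSource:
    """Hypothetical stand-in for an external reference data service."""

    def __init__(self, records):
        self._records = dict(records)

    def update(self, key, value):
        # The real world changes, and the source of record changes with it.
        self._records[key] = value

    def get(self, key):
        return self._records[key]

    def snapshot(self):
        # Hand out an independent copy, as a bulk extract would.
        return dict(self._records)


class SiloedCopy:
    """Copies the data once and serves reads from the copy."""

    def __init__(self, source):
        self._source = source
        self._records = source.snapshot()

    def get(self, key):
        return self._records[key]

    def sync(self):
        # Someone has to remember to run this for the copy to catch up.
        self._records = self._source.snapshot()


class StreamingView:
    """Keeps no copy; every read goes back to the source."""

    def __init__(self, source):
        self._source = source

    def get(self, key):
        return self._source.get(key)


source = ReferenceSource({"acme": "1 Old Street"})
silo = SiloedCopy(source)
stream = StreamingView(source)

source.update("acme", "9 New Avenue")  # the real world changes

stale = silo.get("acme")    # read from the silo before any sync
live = stream.get("acme")   # read through the streaming view

silo.sync()
resynced = silo.get("acme")  # the silo only catches up after a sync

print(stale, "|", live, "|", resynced)
```

In this sketch the silo keeps returning the old address until sync() is run, while the streaming view reflects the change immediately. The cost that the sketch hides is that every streamed read depends on the source being reachable.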
Back to my iTunes music library. I could alternatively use iCloud, or another cloud service, to create a single master music data source to maintain, from which I could stream without syncing.
This is why I think the cloud combined with data streaming is the future of enterprise data management, especially master data management in the cloud, which Prashanta Chandramohan recently blogged about. However, one unavoidable challenge is that this would mean streaming a single version of the truth. That concept is often resisted, and the resistance is itself one of the drivers of the proliferation of data silos.
To sync or to stream: That is the data question
Syncing allows for multiple versions of the truth by enabling you to choose not to sync your data silo with the single version of the truth. Streaming eliminates the data silos, but also eliminates the possibility of viewing data in a different way or using data for a different purpose.
What other issues do you think impact whether enterprise data management should sync or stream?