In my previous post, I discussed how data has an expiration date, after which it should at least be archived, or possibly even deleted.
Rob Karel added an excellent comment about the very real risk associated with data retention, and why data retention policies are necessary to “balance the desire to archive and purge unused or expired data to reduce storage costs and business risk, with business and legal discovery retention management requirements. These policies must clarify what data needs to be stored for how long, in what format, applying what rules, with what level of masking or encryption, and with what access guidelines. Big data evangelists must contend with this balance.”
My thoughts about this challenge, especially within the context of big data, are being influenced by the book I’m reading, The Half-life of Facts: Why Everything We Know Has an Expiration Date by Samuel Arbesman, which explains that facts change all the time. For example, smoking has gone from doctor-recommended to deadly, and we used to think that the Earth was the center of the universe and that Pluto was a planet. In short, what we know about the world is constantly changing. (Although we don’t always get the memo; for lots of examples, check out the Wikipedia list of common misconceptions.)
“Knowledge is like radioactivity,” Arbesman explained. “Just as we know that a chunk of uranium can break down in a measurable amount of time — a radioactive half-life — facts, in aggregate, also have half-lives: We can measure the amount of time for half of a subject’s knowledge to be overturned.”
This made me wonder if it would be possible to establish an expiration date for data by measuring its half-life. Such a measurement would have to be more sophisticated than simply looking at when the data was created or how long it’s been since we last used it. After all, old data can remain useful, just as old facts can remain true (e.g., water molecules are made of two hydrogen atoms and one oxygen atom), and, as Rob pointed out, unused data might need to be retained for valid business reasons.
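To make the idea a little more concrete, here is a minimal sketch of what a half-life calculation could look like. It assumes, purely for illustration, that the share of records in a data set that are still valid decays exponentially with age, the same way radioactive material does. Real data may not behave this way at all, and the function names and sample numbers below are hypothetical, not taken from Arbesman's book or from any tool.

```python
import math


def remaining_fraction(age, half_life):
    """Fraction of records expected to still be valid after `age` time units,
    assuming exponential decay with the given half-life (an assumption)."""
    return 0.5 ** (age / half_life)


def estimate_half_life(age, fraction_still_valid):
    """Invert the decay formula using a single observation: after `age`
    time units, `fraction_still_valid` of the records were still valid."""
    if not 0 < fraction_still_valid < 1:
        raise ValueError("fraction_still_valid must be strictly between 0 and 1")
    return age * math.log(0.5) / math.log(fraction_still_valid)


if __name__ == "__main__":
    # Hypothetical audit: 80% of customer addresses were still correct after 2 years.
    half_life_years = estimate_half_life(age=2, fraction_still_valid=0.8)
    print(f"Estimated half-life: {half_life_years:.1f} years")
    print(f"Expected valid after 10 years: {remaining_fraction(10, half_life_years):.0%}")
```

Even under this simplifying assumption, the estimate only says how quickly the data goes stale in aggregate. It says nothing about which individual records are still useful or which must be retained for business or legal reasons.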
What do you think?
Can we measure the half-life of data? Please share your thoughts by posting a comment below.