Data has an expiration date

Have you ever wondered why bottled water has an expiration date?

Well, in the United States at least, it’s mostly New Jersey’s fault. A 1987 state law required all food products sold in New Jersey to display an expiration date of two years or less from their manufacturing date. So, in order to standardize interstate distribution, most bottled water manufacturers gave every bottle a two-year expiration date.

Even after New Jersey amended the law a few years ago, a bottled water expiration date had become somewhat of an industry standard, so many manufacturers still use one today.

Unlike bottled water, data really does have an expiration date - but it hardly ever displays one.

The era of big data seems to be fostering the false notion that we have an obligation to retain any data that we come across because of its potential usefulness. Instead of a “use it or lose it” attitude toward data, we have a “retain it and maintain it” attitude, which is making data hoarders of us all.

Some data is retained to support historical analysis, so we can learn from the past in order to predict probable futures - especially to try to predict the near future with real-time analytics. But there are limitations to historical analysis. Even though velocity is one of big data’s 3Vs, nowadays the world is changing just as fast as the data is moving, so the future is resembling the past less and less.

Instead of mountains of data that are managed just because they're there, we need to acknowledge that all data has an expiration date, after which the data should at least be archived, or possibly even deleted.
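To make the idea concrete, here is a minimal sketch of what attaching an expiration date to data might look like. The `Record` fields, the `disposition` function, and the specific time windows are all hypothetical, chosen only for illustration; a real retention policy would also have to reflect legal and business requirements.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Record:
    """A piece of data tagged with its own expiration rules (hypothetical fields)."""
    name: str
    created: date
    archive_after: timedelta                   # age at which the record should be archived
    delete_after: Optional[timedelta] = None   # age at which it may be deleted, if ever


def disposition(record: Record, today: date) -> str:
    """Return 'keep', 'archive', or 'delete' based on the record's age."""
    age = today - record.created
    # Deletion is checked first because it is the later, stricter threshold.
    if record.delete_after is not None and age >= record.delete_after:
        return "delete"
    if age >= record.archive_after:
        return "archive"
    return "keep"


# Example: archive after one year, delete after two (assumed windows).
receipts = Record(
    name="2012 receipts",
    created=date(2012, 1, 1),
    archive_after=timedelta(days=365),
    delete_after=timedelta(days=730),
)
print(disposition(receipts, date(2013, 6, 1)))
```

The point of the sketch is only that the expiration rule travels with the data, so the decision to archive or delete becomes a routine check rather than something nobody ever gets around to.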

About Author

Jim Harris

Blogger-in-Chief at Obsessive-Compulsive Data Quality (OCDQ)

Jim Harris is a recognized data quality thought leader with 25 years of enterprise data management industry experience. Jim is an independent consultant, speaker, and freelance writer. Jim is the Blogger-in-Chief at Obsessive-Compulsive Data Quality, an independent blog offering a vendor-neutral perspective on data quality and its related disciplines, including data governance, master data management, and business intelligence.

5 Comments

  1. Hi Jim

    Very good points regarding data usefulness. I also wanted to point out that the other side of the "retain and maintain it" debate, as you called it, is the very real risk associated with data retention.
    Data Retention policies balance the desire to archive and purge unused or "expired" data to reduce storage costs and business risk with business and legal discovery retention management requirements. These policies must clarify what data needs to be stored for how long, in what format, applying what rules, with what level of masking or encryption, and with what access guidelines.
    So Big Data evangelists must contend with this balance. The most likely outcome, in my opinion, will be a "Keep the summary, kill the details" approach, which will aggregate the key insights while eliminating the transaction or interaction details that could introduce enterprise risk.
    Thanks
    Rob

  2. Dave Chamberlain

    "Compliance" in its broadest sense also leads to "over hoarding" - keep it just in case someone, somewhere for some currently unknown reason decides they need it to either find the culprit or exonerate the organization.

  3. I couldn't agree more, Jim. Actually, I usually agree with everything you say :).
    One of the arguments I keep raising for this is that unless business users of the data agree to maintain the data to ensure its quality and accuracy, the odds are they won't be able to find what they are looking for due to the rate of data degradation. So tons of data stored in systems becomes almost useless if you can't find it.
    Funny thing is though, most people can't get their head around this.
    Would love to hear your thoughts on this.
    Thanks very much!

  4. load options: The increased time spent searching is not necessarily accurate. The UCM ‘search’ is really a match, where you will only get back results if you have a very, very close match to something that already exists. In order to reduce the time spent searching, the information would have to be removed from the end points (ci & c3) where the search occurs.

    Second bullet: the Business user view into UCM is the match results, which agreed if there are duplicates that match at over 90% they will need to be resolved at that point in time.

    I also thought that option 1 took care of some low lying fruit…the elimination of canex…CI records not in C3…

    What the business needs to understand is that to get rid of the Company records and get a real impact, we need to get rid of them from the end points and all associated transactions, opportunities…need to be archived as well or we will have data integrity issues.

  5. Jim Harris

    Thanks for your comments, Rob, Dave, and Jill.

    @Rob — Excellent point about data retention policies. I included your remarks in the opening of my follow-up post: Can we Measure the Half-Life of Data?

    @Dave — Great point about the hoarding-enablement caused by concerns about future compliance. Since this is the time of year in the US when we are preparing our tax returns, your point made me envision the boxes and boxes of receipts for everything that some people hoard in hopes of using them to itemize tax deductions, most of which is discarded as irrelevant during tax preparation, but is sometimes stored in the attic just in case of a future tax audit.

    @Jill — Awesome point, as usual, about “keep it just in case we need it” often being followed by “we can’t find it amongst all the stuff we kept.” Perhaps we could be more selective and organized hoarders, using predictive analytics to pre-sort incoming data and metadata management to tag data with high-level categories and applicable date ranges. To extend my tax preparation analogy, I know only expenses and earnings from January 1, 2012 through December 31, 2012 are applicable for my 2012 tax return. I also know that some categories of expenses are not tax-deductible, so I could delete that incoming data, and that usable tax deductions fall into only a few categories, which helps me keep things organized enough throughout the year that I can find what I need when preparing my taxes.

    Best Regards,

    Jim
