In Part 1, I introduced this series with a moving metaphor. I compared the importance of having a data retention strategy to the opportunity my recent move provided me to sell, donate or throw away stuff I no longer use.
Without question, organizations now collect, store, process, manage, analyze and govern more data than ever before. Yet while enterprises have numerous processes for maintaining data, few have processes for removing unused or unnecessary data. An additional complication is dealing with copies (i.e., data silos). That's especially an issue when copies of enterprise data are customized for a specific application or analysis, or to satisfy the needs of a tactical project or strategic initiative. Even though this customized data often becomes obsolete after the conclusion (or abandonment) of its project or initiative, it continues to be retained and maintained.
A data retention strategy has to strike a delicate balance between the desire to archive or delete unused data and the need to keep certain data for legal discovery or risk mitigation. Data privacy concerns about data retention are a key driver behind new regulatory compliance legislation, such as GDPR. I have recently received email notifications from my search engine, social media, website hosting, blogging and other service providers regarding policy changes being implemented for GDPR. These include transparency about what data is collected, restrictions on how data is used, and user-customizable time limits on how long data is retained, after which it’s automatically deleted.
Lack of a data retention strategy promotes data hoarding. For example, in the US we recently completed our income tax returns. Some people hoard boxes of printed receipts for everything in hopes of using them to itemize tax deductions. Most of those receipts are discarded as irrelevant during tax preparation, but they're sometimes retained due to fears of a future tax audit. Unfortunately, defaulting to data hoarding can make it nearly impossible to find the data you need.
Take an organized approach to data retention
A more selective and organized approach would be to use predictive analytics on incoming data, along with metadata management to tag relevant data with high-level categories and applicable date ranges. To extend my tax preparation analogy, I know that only expenses and earnings from January 1, 2017 through December 31, 2017 are applicable for my 2017 tax return. I also know that some categories of expenses are not tax-deductible, and usable tax deductions only fall into a few categories. Because this data is sensitive and personally identifying, I also want to mask or encrypt most of it. This approach would both minimize and secure the data I retain, and keep it organized enough throughout the year that I can easily find what I need when preparing my taxes. And after I submit my tax return, I could archive the detailed data and just retain the summaries used to complete it, masking any sensitive data such as my tax identification number.
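To make the tagging idea concrete, here is a minimal Python sketch of the tax analogy. It keeps only receipts that fall inside the applicable date range and belong to a deductible category, and it masks the tax identification number on whatever is retained. The `Receipt` fields, the category names and the masking rule are all assumptions made for illustration, not a real tax schema or a recommended masking standard.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical category tags; the real list depends on the applicable tax rules.
DEDUCTIBLE_CATEGORIES = {"charity", "medical", "mortgage_interest"}


@dataclass
class Receipt:
    when: date        # date of the expense
    category: str     # high-level category tag
    amount: float
    tax_id: str       # sensitive, personally identifying value


def is_relevant(receipt, year=2017):
    """True if the receipt is inside the tax year and in a deductible category."""
    start, end = date(year, 1, 1), date(year, 12, 31)
    return start <= receipt.when <= end and receipt.category in DEDUCTIBLE_CATEGORIES


def mask(value, visible=4):
    """Replace all but the last `visible` characters with asterisks."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def retain(receipts, year=2017):
    """Keep only relevant receipts, with the tax ID masked on each kept copy."""
    return [
        Receipt(r.when, r.category, r.amount, mask(r.tax_id))
        for r in receipts
        if is_relevant(r, year)
    ]
```

Calling `retain()` on a year's worth of receipts would drop anything dated outside 2017 or tagged with a non-deductible category, which is the "minimize" half of the approach. The masking step is the "secure" half.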
As a general guideline, an effective data retention strategy should detail what data should be retained and who can access it. It should also determine whether the data is sensitive enough to require masking or encryption, how long the data should be retained, and the archival or removal procedures to use when the data retention period has expired.
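One way to picture those elements together is as a single policy record per data set. The Python sketch below is only an illustration under my own assumptions: the class name `RetentionPolicy`, its fields and the two enums are hypothetical, and a real policy would likely carry more detail (legal holds, jurisdictions, audit trails).

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class Protection(Enum):
    NONE = "none"
    MASK = "mask"
    ENCRYPT = "encrypt"


class ExpiryAction(Enum):
    ARCHIVE = "archive"
    DELETE = "delete"


@dataclass(frozen=True)
class RetentionPolicy:
    dataset: str               # what data is retained
    allowed_roles: frozenset   # who can access it
    protection: Protection     # whether it needs masking or encryption
    retention_days: int        # how long it is retained
    on_expiry: ExpiryAction    # what happens when the period ends

    def can_access(self, role):
        return role in self.allowed_roles

    def expires_on(self, created):
        return created + timedelta(days=self.retention_days)

    def action_due(self, created, today):
        """Return the expiry action once the retention period has passed, else None."""
        return self.on_expiry if today >= self.expires_on(created) else None
```

For example, a policy such as `RetentionPolicy("tax_receipts", frozenset({"owner", "accountant"}), Protection.MASK, 365, ExpiryAction.ARCHIVE)` would answer each question in the guideline in one place, and a scheduled job could call `action_due()` to decide when to archive or delete.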