Without question, organizations now collect, store, process, manage, analyze and govern more data than ever before. In this era of big data we seem obliged to retain all data due to its potential usefulness. An additional complication is dealing with all the copies (i.e., data silos) of enterprise data that are customized for a specific application or analysis, or to satisfy the needs of a tactical project or strategic initiative. Even though this customized data often becomes obsolete after its project or initiative concludes or is abandoned, it continues to be retained and maintained. Enterprises have numerous processes for maintaining data – but few have processes to minimize data usage by removing unused or obsolete data as part of a well-defined data retention strategy.
How to minimize data usage: Count the ways
Even among the data you and your organization actively use on a regular basis, there are countless opportunities to minimize your data usage. To identify them, you need to formally track your data usage. In recent years, there has been an explosion of data trackers for gathering information about our health, fitness and finances. We use them to count the calories we consume while eating, the calories we burn while exercising, and the money we spend while shopping. Without this information we couldn't easily minimize our unhealthy and extravagant habits. Even our mobile devices have trackers that gather information about which apps we use, when, and how often. This helps us avoid overages on mobile data plans and can even identify which apps tend to make us less productive and attentive (for me, it’s video streaming services, games and email).
Track your data usage – keep it simple
You don’t need to wait for a snazzy mobile app to track your habits so you'll know how to minimize data usage across the enterprise. I recommend using your favorite spreadsheet program to record your data usage every day for at least one month. Keep it simple, but be sure to at least capture the following columns of information:
- Name of the data.
- Brief description of the data, including its source and lineage.
- Business purpose/use for the data.
- When you accessed the data.
- How much time you spent using the data.
- Any data quality issues encountered while using the data (make brief notes).
- Amount of data used – if it's less than full volume, briefly describe any filtering or aggregation applied.
- Whether the data was necessary or helpful in completing tasks associated with its use.
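If you would rather keep the log as a CSV file that any spreadsheet program can open, the columns above could be captured with a small script. This is only a minimal sketch: the field names, the `UsageEntry` class, the `write_log` helper and the sample row are all made up for illustration, so adapt them to your own terminology.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class UsageEntry:
    """One row per individual use of a data source (field names are illustrative)."""
    name: str
    description: str        # include source and lineage
    business_purpose: str
    accessed_at: str        # when the data was accessed, e.g. "2024-03-04T09:15"
    minutes_spent: float
    quality_issues: str     # brief notes; empty string if none
    amount_used: str        # "full", or a note on filtering/aggregation applied
    was_helpful: bool       # necessary or helpful for the task?


def write_log(entries, out):
    """Write the entries as CSV, with a header row, to a file-like object."""
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(UsageEntry)])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))


# Invented sample entry, purely for illustration.
entries = [
    UsageEntry(
        name="customer_master",
        description="CRM export, nightly load",
        business_purpose="Weekly sales report",
        accessed_at="2024-03-04T09:15",
        minutes_spent=20,
        quality_issues="duplicate customer IDs",
        amount_used="filtered to one region",
        was_helpful=True,
    ),
]

buf = io.StringIO()          # swap for open("usage_log.csv", "w", newline="") to save a file
write_log(entries, buf)
csv_text = buf.getvalue()
```

Writing to an in-memory buffer keeps the example self-contained; in practice you would append one entry each time you use a data source.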
You'll need to record each individual use of the same data on its own row in your tracking spreadsheet. That's because the same source will most likely be used for multiple business purposes and/or will be accessed multiple times over the course of a business day. After you've tracked your usage for a month, you may be surprised by the results. A lot of data usage is habitual, redundant or superfluous – and therefore easily eliminated. You may find that you are:
- Using multiple sources for the same purpose.
- Wasting time cleaning up data when a better source is available.
- Using data just because it has always been part of your process (even though it does little or nothing to help you do your job).
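Once a month of entries has accumulated, the three patterns above can be surfaced with a short script rather than by eye. The following is a rough sketch under stated assumptions: the tuple layout, the `summarize` function and the sample rows are hypothetical, and time spent on uses with noted quality issues is only a proxy for time spent cleaning up data.

```python
from collections import defaultdict

# Hypothetical log rows:
# (data name, business purpose, minutes spent, quality issue notes, was helpful)
log = [
    ("customer_master", "Weekly sales report", 20, "duplicate IDs", True),
    ("crm_extract", "Weekly sales report", 15, "", True),
    ("legacy_region_codes", "Weekly sales report", 10, "", False),
    ("web_traffic", "Campaign review", 30, "", True),
]


def summarize(rows):
    """Return candidates for review; none of these is an automatic verdict."""
    sources_by_purpose = defaultdict(set)
    minutes_with_issues = defaultdict(float)
    unhelpful = set()

    for name, purpose, minutes, issues, helpful in rows:
        sources_by_purpose[purpose].add(name)
        if issues:                      # a quality issue was noted for this use
            minutes_with_issues[name] += minutes
        if not helpful:                 # the data did not help complete the task
            unhelpful.add(name)

    # Purposes served by more than one source may indicate redundancy.
    overlapping = {
        purpose: sorted(names)
        for purpose, names in sources_by_purpose.items()
        if len(names) > 1
    }
    return overlapping, dict(minutes_with_issues), sorted(unhelpful)


overlapping, minutes_with_issues, unhelpful = summarize(log)
```

With the sample rows, `overlapping` flags the weekly sales report as drawing on three sources, `minutes_with_issues` attributes 20 minutes to `customer_master`, and `unhelpful` lists `legacy_region_codes`. Each result is a prompt for a conversation, since several sources for one purpose can be legitimate.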
The bottom line: You and your organization are almost certainly using too much data. Tracking your enterprise habits will show you how to minimize data usage – and optimize it – by eliminating redundant, low-quality and unnecessary data consumption. In turn, you can work more efficiently by using the least amount of data necessary to accomplish your business goals.