Recently I was at a big-box home improvement store buying supplies for my new residence. I was pressed for time and stuck standing in a long checkout line. Then, like a shining beacon on the dark horizon, a self-service checkout signaled an opportunity to get the things I needed and get on with my day. This reminded me of the need for self-service data preparation, especially in the era of big data. There's so much data available, from so many sources and in so many formats, that it's like shopping in one of those big-box stores with an overwhelming selection of items to choose from. And when we do find the data we need, we often have to wait in line for IT to create customized processes to prepare the data based on our requirements.
Stop waiting in line
Data preparation is a formal aspect of many systems and applications – like data warehousing and business intelligence – where multiple data sources are integrated into a single data model optimized for querying and standard reporting. But the transformations and data quality rules applied to the source data are often not designed to be reusable components. So data preparation often becomes an informal practice conducted by business analysts and data scientists performing ad hoc reporting and analytics.
It's a frustrating exercise for most of these users. Many resort to manual data preparation, a.k.a. spreadsheet wrangling. Not only is this time-consuming; it's often redundant, with different users (or even the same user) performing the same work. And they don't necessarily generate the same results each time.
This is why there's a surge in demand for self-service tools that not only allow data preparation to be completed faster, but also allow it to be repeated consistently. Many data quality issues are a result of the distance (both physical and temporal) separating data from the real-world object or entity it attempts to describe. Self-service data preparation improves data quality by reducing those gaps in space and time. It can even enable self-validation data governance. What's more, self-service data preparation can serve the entire enterprise when practitioners pay it forward by sharing metadata and making data quality rules reusable. That enables business users and data scientists alike to work with data on their own, with less reliance on IT.
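To make "reusable data quality rules" a little more concrete, here is a minimal Python sketch of the idea, not a depiction of any particular product. Every name in it (Rule, SHARED_RULES, prepare, and the two example rules) is hypothetical and chosen only for illustration. The point is that once rules are defined in one shared place, every analyst who runs them gets the same result every time.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# A record is modeled as a simple mapping of field name to text value.
Record = Dict[str, str]


@dataclass(frozen=True)
class Rule:
    """A named, shareable data quality rule (hypothetical structure)."""
    name: str
    apply: Callable[[Record], Record]


def trim_whitespace(record: Record) -> Record:
    # Return a new record with surrounding whitespace removed from every value.
    return {key: value.strip() for key, value in record.items()}


def uppercase_state(record: Record) -> Record:
    # Return a new record with the "state" field, if present, in upper case.
    cleaned = dict(record)
    if "state" in cleaned:
        cleaned["state"] = cleaned["state"].upper()
    return cleaned


# Defined once and shared, instead of being redone by hand in each spreadsheet.
SHARED_RULES = (
    Rule("trim_whitespace", trim_whitespace),
    Rule("uppercase_state", uppercase_state),
)


def prepare(records: List[Record], rules: Sequence[Rule] = SHARED_RULES) -> List[Record]:
    """Apply each rule, in order, to every record without changing the input."""
    prepared = []
    for record in records:
        for rule in rules:
            record = rule.apply(record)
        prepared.append(record)
    return prepared


if __name__ == "__main__":
    raw = [{"name": "  Ada Lovelace ", "state": " ny "}]
    print(prepare(raw))
```

Because each rule returns a new record rather than editing the original, running the same rules on the same source data twice yields identical output, which is exactly what manual spreadsheet wrangling cannot guarantee.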
Just as self-service checkout lets you get in, get out, and get on with your day faster, self-service data preparation gives you fast access to the data you need, so you can get on with getting value out of your enterprise data assets.