The perils of self-service data preparation tools


“Temporary solutions often become permanent problems.”

—Craig Bruce

I’m all for self-service data-preparation tools. As I mentioned in my previous post, they collectively offer a number of advantages for those who are willing to get their hands dirty.

Last spring, a group of students in my enterprise analytics class worked on an interesting project. To make a long story short, the students worked with a small manufacturing company (call it ABC here) in an attempt to understand the latter’s data and make recommendations.

Lamentably, ABC ran a very old manufacturing system that had seen much better days. Extracting data from this application to a CSV was neither easy nor reliable. Even worse, its data required significant cleanup work for meaningful analytics.

A happy outcome (sort of)

Fortunately, the students developed Band-Aids in the form of macros in Microsoft Excel. That is, after pulling the data, they could hit a button that would largely purify it and let them discover ways for ABC to save money, reduce inventory and the like.

This is just one of myriad uses for Microsoft Excel, the Swiss Army knife of applications. On the one hand, my students made me proud with their persistence and their automated solution. Prior groups of students may have resorted to manual fixes. On the other hand, I wasn’t happy that the students needed to create macros to clean ABC’s data, because cleaning an export fails to address the root of the issue.

In a nutshell, this is my biggest concern with the mindset among many folks who rely too much upon self-service data preparation tools: They allow for temporary, “workable solutions” without addressing the core issue. In other words, the data now exists in two forms: the errant data in the core system and the ostensibly clean CSV file.

If the cleanup is part of a one-time project, that’s bad enough. But what if this Band-Aid (macro, ETL routine, etc.) becomes permanent? We have now added more friction and more steps to the organization’s processes. Finally, let’s not forget that macros and ETL routines can break.
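To make the Band-Aid concrete, here is a minimal sketch of this kind of cleanup pass, written in Python rather than as an Excel macro. Everything in it is an assumption for illustration: the column names, the sample rows and the cleanup rules are hypothetical, not ABC’s actual data or the students’ macros.

```python
import csv
import io

# Hypothetical export from a legacy system. The columns and values are
# invented for illustration; they are not ABC's real data.
RAW_EXPORT = """part_no,qty,unit_cost
 a-100 ,12,4.50
A-100,12,4.50
b-200,,3.25
c-300,7,$1.10
"""


def clean_rows(raw_text):
    """Return (cleaned rows, number of rows that needed repair)."""
    cleaned, seen, repaired = [], set(), 0
    for row in csv.DictReader(io.StringIO(raw_text)):
        part = row["part_no"].strip().upper()        # normalize part numbers
        qty = row["qty"].strip() or "0"              # assumption: blank means zero
        cost = row["unit_cost"].strip().lstrip("$")  # drop a stray currency sign
        if (part, qty, cost) != (row["part_no"], row["qty"], row["unit_cost"]):
            repaired += 1
        if part in seen:                             # drop duplicates after normalizing
            continue
        seen.add(part)
        cleaned.append({"part_no": part, "qty": int(qty), "unit_cost": float(cost)})
    return cleaned, repaired


rows, repaired = clean_rows(RAW_EXPORT)
print(f"{len(rows)} clean rows; {repaired} rows needed repair")
```

Note what the script does not do: the source system still holds the four original rows, so every future export needs the same pass. And the first time a new kind of error shows up (say, a quantity of "N/A"), `int()` raises an exception and the routine breaks.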

Simon Says: Fix data quality issues at the source.

Self-service data preparation tools can certainly get a firm out of a jam. Still, I worry that some employees, departments and large firms use them inappropriately. Rather than take data governance seriously or implement a proper master data management (MDM) application, they deal with the problem in a superficial way.

Permit me to draw an analogy here: Many golfers over-correct their slices by taking a stronger grip. They are solving one problem by introducing another. It’s best to learn how to swing the club properly – especially if you want to hone your game over the longer term.


What say you?


About Author

Phil Simon

Author, Speaker, and Professor

Phil Simon is a keynote speaker and recognized technology expert. He is the award-winning author of eight management books, most recently Analytics: The Agile Way. His ninth will be Slack For Dummies (April 2020, Wiley). He advises organizations on matters related to strategy, data, analytics, and technology. His contributions have appeared in The Harvard Business Review, CNN, Wired, The New York Times, and many other sites. He teaches information systems and analytics at Arizona State University's W. P. Carey School of Business.
