If you regularly read this blog, then I’d bet that you’ve seen your fair share of data quality issues in your career. You’ve experienced first-hand the perils of messy, incomplete or duplicate data. Maybe you’ve even tried to make analytics happen inside your organization, only to curse your computer or colleagues over the state of your firm’s data.
But think back to when you were a 21-year-old. Odds are that you didn’t know what you didn’t know with respect to enterprise data and analytics. This is exactly the spot in which many of my students find themselves as they undertake their semester-long capstone projects. Their sponsors provide them with data that often needs considerable wrangling.
No, I simply won’t let the project of a 400-level course turn into pure data cleanup. At the same time, though, I certainly see the benefits of self-service data preparation for this project – and for others down the road.
Overcoming the IT-business divide
Historically, functional employees would turn to IT to extract, load and clean data. That dependence is particularly acute in mature organizations. For this reason among others, the IT-business divide has hamstrung many a firm.
While playing around with proprietary and open-source data-cleanup and preparation tools won’t singlehandedly overcome the divide, it can certainly minimize it. For more on this, see one of my previous posts.
Building your data chops
If you’ve ever worked in a firm with pristine data, congratulations. You’re in rarefied air. Many if not most data management professionals deal with a variety of thorny data-related gaffes and issues on a regular basis. (This is especially true if you scrape data from websites, a lesson that many of my students have already learned by the time I’m done with them.)
Sure, this can be frustrating, but you don’t build your data chops by working with the good stuff – at least not as much. (You don’t improve your skills at pool or tennis by beating terrible opponents; you have to “punch above your weight.”) It’s one thing to know that you can clean data; it’s another to actually do it as part of your job. Call me crazy, but in an era of big data that’s a great skill to have.
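To make "actually do it" a bit more concrete, here is a minimal sketch of the kind of cleanup I have in mind, written in plain Python. Everything in it is an assumption for illustration: the `clean_records` function, the `name` and `email` fields, and the sample rows are all hypothetical, and treating a lowercased email as the duplicate key is just one reasonable choice among many.

```python
def clean_records(rows):
    """Tidy a list of dicts: trim whitespace, drop incomplete rows, remove duplicates.

    Hypothetical helper for illustration; 'name' and 'email' are assumed field names.
    """
    seen_emails = set()
    cleaned = []
    for row in rows:
        # Missing values (None) are treated the same as empty strings.
        name = (row.get("name") or "").strip()
        email = (row.get("email") or "").strip().lower()

        # Incomplete: skip any row lacking a name or an email.
        if not name or not email:
            continue

        # Duplicate: the normalized email is the assumed identity key.
        if email in seen_emails:
            continue
        seen_emails.add(email)

        # Messy: collapse runs of internal whitespace in the name.
        cleaned.append({"name": " ".join(name.split()), "email": email})
    return cleaned


raw = [
    {"name": "  Ada Lovelace ", "email": "ADA@example.com"},
    {"name": "Ada  Lovelace", "email": "ada@example.com "},
    {"name": "Grace Hopper", "email": None},
    {"name": "", "email": "nobody@example.com"},
    {"name": "Alan Turing", "email": "alan@example.com"},
]

result = clean_records(raw)
print(result)
```

Five rows go in and two come out: one duplicate and two incomplete records are dropped. Real sponsor data is rarely this cooperative, but the three problems (messy, incomplete, duplicate) are the same ones.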
Making yourself more valuable to your employer – and others
Today’s increasing tech-driven disruption means that companies and even entire industries are going the way of the dodo. If you like driving trucks and aren’t retiring soon, be wary.
No, I can’t predict the future – but there’s plenty of evidence to suggest that those with strong analytics skills will prosper.
Feedback
What say you?