We all find change easier when it starts with something we’re familiar with. That’s why I think sports analytics examples are popular – most of us are sports fans, so we get it more easily. It’s also why automotive examples that illustrate the potential reach of the Internet of Things (IoT) attract an enduring audience. Most of us are drivers and can relate to the benefits illustrated.
At Strata Hadoop in London recently, I had the pleasure of presenting SAS' perspectives on intelligence for the connected vehicle. It was a lively session, with the audience asking questions on a wide range of potential opportunities – from increasing safety, reducing risk and enabling predictive maintenance, to building loyalty and retention, and delivering real-time value-adds like parking availability, charging stations and connected retail options.
While there were the usual questions about data sources, integration and algorithm design, there was also significant interest in data operations (data ops) – that is, the maintenance of data sources, preparation, quality and governance.
Effective data prep means better analytics and data science
Our experience with customers suggests that data scientists spend anywhere from 50 to 80 percent of their time preparing data. By anyone's standards, this is not a good use of expensive, skilled data scientists' time. Data scientists are in short supply, and if you've managed to recruit one or more good ones, you don't want them performing data operations. A good data ops team can maximize your investment in data science.
We see data ops as addressing six key challenges: data engineering, data quality, updates, data integration (or interoperability), data privacy, and data security. Your data ops team also oversees compliance with any regulatory requirements.
Data has multiple uses, both now and later – this is one of the real values of central data ops. Make sure the data is clean and accurate so it can be used repeatedly, for many different purposes. If you leave data prep to the users, they'll clean just what they need and won't share it any further. Others will then have to reinvent the wheel, at ongoing cost to the organization. A centrally owned prep step, as sketched below, is one way to avoid that.
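To make that reuse concrete, here is a minimal Python sketch of a centrally owned data prep step for vehicle telemetry. The column names (vehicle_id, ts, engine_temp_c) and the plausibility thresholds are illustrative assumptions, not details of any real connected-vehicle feed:

```python
# A minimal sketch of a reusable, centrally owned data prep step.
# Field names and thresholds are illustrative assumptions only.
import pandas as pd

def prepare_sensor_readings(raw: pd.DataFrame) -> pd.DataFrame:
    """Clean raw telemetry once, so every downstream user starts from the same data."""
    df = raw.copy()

    # Data quality: enforce types and drop records that cannot be trusted.
    df["ts"] = pd.to_datetime(df["ts"], errors="coerce")
    df["engine_temp_c"] = pd.to_numeric(df["engine_temp_c"], errors="coerce")
    df = df.dropna(subset=["vehicle_id", "ts", "engine_temp_c"])

    # Remove physically implausible sensor values (assumed valid range).
    df = df[df["engine_temp_c"].between(-40, 150)]

    # Updates/deduplication: keep the latest reading per vehicle and timestamp.
    df = (df.sort_values("ts")
            .drop_duplicates(subset=["vehicle_id", "ts"], keep="last"))

    return df
```

Because the cleaning logic lives in one shared function rather than in each analyst's notebook, nobody downstream has to rediscover which records to drop or which sensor values to distrust.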
IoT requires cohesive accountability and accessibility
Data ops has to become everyone’s business for any organization that wants to become data-driven. The importance of data quality needs to be integral to the culture. Having a team responsible for cleaning data won’t be enough if those creating the data aren’t bothered about whether their entries are correct, or whether sensors are consistently reliable.
Data ops is not all about technical skills – it's also about relationships and politics. Knowledge is power, and so is the data that underpins it, which means the people who hold the data tend to hold onto it. Good data ops requires the ability to negotiate organizational politics and forge relationships so that data is shared across functions and silos. In the IoT era, this also means working across organizational boundaries to extract value from the whole ecosystem.
IoT will create more citizen data scientists
With good data operations behind the data, users can run their own analytics without central IT support. In a world where most central IT support is struggling to keep up with demand, this could be a big win for many companies. Instead of trying to recruit just data scientists, it may be more valuable to focus on data ops. If central IT can provide clean, good quality raw data, users can then adopt self-service analytics solutions to get insights. We expect IoT to drive the emergence of more citizen data scientists – that is, business people who are curious, adventurous and determined to research, prototype and experiment with analytics.
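As a hypothetical illustration of what that self-service could look like, once the data ops team provides clean data, a business user's analysis can be a few lines rather than a custom ETL project. Here prepare_sensor_readings is the sketch from earlier, and raw_telemetry is an assumed input:

```python
# A hypothetical self-service query on centrally prepared data.
# prepare_sensor_readings and raw_telemetry are assumptions from the sketch above.
clean = prepare_sensor_readings(raw_telemetry)

# Which vehicles run hottest on average? A quick, repeatable insight,
# with no bespoke cleaning code needed in the analyst's notebook.
hottest = (clean.groupby("vehicle_id")["engine_temp_c"]
                .mean()
                .sort_values(ascending=False)
                .head(10))
print(hottest)
```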
Read the results of a survey based on interviews with 75 executives across Europe:
Internet of Things: Visualise the Impact
2 Comments
Chris, great article on IoT, data quality importance, and citizen data scientists. I still feel that central data scientists might be essential to provide "one version of truth" to the organization, especially if several "citizen data scientists" have their own "versions of truth". Do you know if there is some movement on mistake-proofing methodology for robust data architecture to prevent subsequent data quality issues?
Hi Murali, thank you for the feedback. Appreciate it :-)
My take on your question regarding data quality methodology is that, in my experience, it is crucial to have teams that consist of data scientists and business experts. Unless you have someone who understands the data from a business perspective when integrating it, it is almost impossible to do the job right. Furthermore, if the issue comes down to master data management, you can take a look at our SAS solution for that ;-)
Have a great day,
Chris