Data stewardship for artificial intelligence

Data stewards assure data quality to help AI efforts succeed.

Artificial intelligence (AI) is playing an ever-increasing role in enterprise solutions. Unlike robotics, which automates manual tasks, AI automates computing tasks. That's especially valuable given the large and diverse data sets organizations use today. While the human role in enterprise solutions will never disappear, it’s foolish to argue against the advantage of AI-augmented people. For example, think of the tremendous productivity benefits gained when you fully automate time-intensive tasks – like analyzing gigabytes of data.

AI was once mostly the domain of science fiction. But now, AI is data-driven science fact. I can't overstate the importance of the data-driven aspect. It's why data management for artificial intelligence is so crucial. AI algorithms adapt based on what they learn from data, so it’s vital for their education to start with the best possible data. While data scientists are responsible for developing AI’s algorithms, curating AI’s curriculum is the responsibility of data stewards.

Data stewards – correcting data quality issues, bridging communications gaps

In a previous post, I explained that while the activities associated with data stewardship vary depending on an organization's unique corporate culture, a data steward is often the go-to person for questions about data. One reason is that data stewards can bridge the communications gap between business and IT stakeholders about how data is used. But perhaps the most common and important aspect of data stewardship is assessing and correcting data quality issues. This includes:

Data profiling.
Removing invalid and redundant data.
Adding missing data.
Standardizing data formats.
Grouping data (when required) to create a smaller number of relevant data points. This is also known as binning.
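To make these activities concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not a prescribed stewardship method: the records, the field names (id, age, signup), the valid age range, the median fill, and the age bands are all hypothetical choices made for the example.

```python
from collections import Counter
from datetime import datetime
from statistics import median

# Hypothetical customer records; every field name and value is invented.
records = [
    {"id": 1, "age": 34, "signup": "2021-03-05"},
    {"id": 2, "age": None, "signup": "05/04/2021"},   # missing age, slash-style date
    {"id": 2, "age": None, "signup": "05/04/2021"},   # redundant copy of id 2
    {"id": 3, "age": -7, "signup": "2021-06-11"},     # invalid age
    {"id": 4, "age": 58, "signup": "2021-07-19"},
    {"id": 5, "age": 41, "signup": "2021-08-02"},
]

def profile(rows):
    """Data profiling: count missing values per field."""
    missing = Counter()
    for row in rows:
        for field, value in row.items():
            if value is None:
                missing[field] += 1
    return dict(missing)

def remove_invalid_and_redundant(rows):
    """Drop rows with an out-of-range age and repeated ids (first one wins)."""
    seen, kept = set(), []
    for row in rows:
        age = row["age"]
        if age is not None and not 0 <= age <= 120:  # assumed valid range
            continue
        if row["id"] in seen:
            continue
        seen.add(row["id"])
        kept.append(row)
    return kept

def add_missing(rows):
    """Fill missing ages with the median of the known ages (one simple choice)."""
    fill = median(r["age"] for r in rows if r["age"] is not None)
    return [{**r, "age": fill if r["age"] is None else r["age"]} for r in rows]

def standardize_date(text):
    """Standardize dates to ISO 8601; slash dates are assumed to be month-first."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")

def age_band(age):
    """Binning: collapse exact ages into three bands."""
    if age < 30:
        return "under 30"
    if age < 50:
        return "30-49"
    return "50+"

def clean(rows):
    """Run the steps in order: remove, fill, standardize, bin."""
    rows = add_missing(remove_invalid_and_redundant(rows))
    return [
        {**r, "signup": standardize_date(r["signup"]), "age_band": age_band(r["age"])}
        for r in rows
    ]

print(profile(records))
for row in clean(records):
    print(row)
```

Profiling runs on the raw records so the steward sees the problems before anything is changed. The order of the remaining steps matters too: filling missing ages after invalid rows are removed keeps a bad value like -7 from skewing the fill.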

AI applications require data quality capabilities to be built into the data integration flow. In today’s hybrid data ecosystems, data moves around a lot in multiplatform environments. It moves from source to staging and sandboxes, to data warehouses and data lakes, and then into analytics tools and reports that provide business intelligence. When possible, data quality processing should be performed at the source by its data stewards to improve performance and greatly reduce the learning curve for AI. The potential of AI can only be realized if the data feeding its algorithms and models is curated by effective data stewardship.
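One way to picture data quality built into the integration flow is a gate that checks records at the source before they move to staging. The sketch below is only an assumption about how such a gate might look: the rule names, the customer_id and amount fields, and the move_to_staging function are hypothetical, and a real pipeline would use whatever integration tooling the organization already has.

```python
# Hypothetical rules a data steward might own for one source system.
# Each rule takes a row (a dict) and returns True when the row passes.
RULES = {
    "has_customer_id": lambda row: bool(row.get("customer_id")),
    "amount_is_non_negative": lambda row: (
        isinstance(row.get("amount"), (int, float)) and row["amount"] >= 0
    ),
}

def quality_gate(rows, rules):
    """Split rows into those that pass every rule and those that fail any."""
    passed, rejected = [], []
    for row in rows:
        failures = [name for name, rule in rules.items() if not rule(row)]
        if failures:
            rejected.append((row, failures))  # keep the reasons for the steward
        else:
            passed.append(row)
    return passed, rejected

def move_to_staging(source_rows, staging):
    """Load only rows that pass the gate; return the rejects for follow-up."""
    passed, rejected = quality_gate(source_rows, RULES)
    staging.extend(passed)
    return rejected

# Invented sample data standing in for a source extract.
source_rows = [
    {"customer_id": "C-1", "amount": 120.0},
    {"customer_id": "", "amount": 15.5},     # missing identifier
    {"customer_id": "C-3", "amount": -4},    # negative amount
]

staging = []
rejected = move_to_staging(source_rows, staging)

print("staged:", staging)
for row, reasons in rejected:
    print("rejected:", row, "because", reasons)
```

Rejected rows are returned with the names of the rules they failed rather than silently dropped, so the steward for that source can correct them where they originate instead of patching them downstream.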
