One of the hottest job descriptions in the data management field over the last couple of years is that of “data scientist.” Its rise matches the rise of the concept of big data – data sources that are very large in volume, change frequently, and have indeterminate or variable structure.
Why is this new position required, and can an existing member of your data management staff – the data steward – fulfill this role? Let's take a look at both and see how they compare.
The Data Scientist
The data scientist role is defined as an evolution of the data analyst/business analyst role that is central to the acquisition and interpretation of data for business intelligence functions within the organization. The rise of big data requires a “superset” of analysis capabilities to properly interpret and match incoming loosely structured data to existing data stores for further analysis by end users.
In addition, an effective data scientist is a persuasive communicator who can convey the usefulness of these rapidly changing data sources to upper management as well as to business and IT resources. In everyday terms, the ideal data scientist is a “blue sky” thinker who comes up with innovative approaches to using new sources of data.
The Data Steward
The data steward role in the data management organization is defined as a person who has a deep understanding of the business meaning of data within the scope of his responsibilities. A data steward also has an operating understanding of how these elements are represented in database systems, and how they are related to each other.
In many ways, the data steward acts as the primary negotiator of conflicts in definition and implementation between data systems. With that view, the steward can then provide documentation of the data contained in constituent systems to other groups of users, such as report developers and systems analysts. (Sidenote: if you know of a good data steward, nominate them for a Stewie as part of Data Stewards Day).
The Steward as Scientist
In the overall analysis, the responsibilities of a data steward are not a good fit for the data scientist role because:
- The two roles operate at different levels of detail. The data steward is a very detail-oriented position, requiring specialized knowledge of his data subject area from both the business and technical perspective. The data scientist, on the other hand, looks at data sources from a higher level, determining the best fit for new sources of data in the existing infrastructure.
- The data steward primarily works with existing structures and ensures their efficient operation for downstream constituents, with a limited role in determining new policies and procedures within his scope of operations. On the other hand, the data scientist primarily analyzes new data structures, relating them to existing structures.
- The two roles perform fundamentally different functions. Once the data management infrastructure is constructed, the data steward performs an operational and administrative function. The data scientist is more of an explorer and unstructured thinker, creating new ways to utilize data in the organization.
This is not to say that a data steward could not become an effective data scientist, but to do so, the data steward would need to shift his focus from operational matters to largely analytical ones. This would require more advanced training in analysis techniques than the data steward role demands. If you are considering promoting from within, you are more likely to find potential data scientist candidates in your business and data analyst communities than in the data management staff.
The roles of data scientist and data steward are both valuable to an efficient and viable data management organization, and should be recognized as such, even though they require different skill sets. Both roles should be staffed with competent people to ensure that your organization manages its data as effectively as possible.
3 Comments
Very clear explanation of the data steward. What about the Business Analyst? With whom will the systems professional most likely consult when mapping current business processes: the Data Steward or the Business Analyst?
Thanks for sharing.
I would argue that data steward and data scientist are less explicit roles than internal naming conventions for responsibilities implicitly held by data analysts, data engineers, and data scientists.
Usually, I see data analysts handle data identification, cleaning, storage, and the data glossary, as well as data analysis itself. These people arguably cover a greater scope and deliver better insights than data scientists who do not, or will not, get their hands dirty with data stewardship even when their analytic project depends on it entirely. For companies that are just starting out with data-related functions, I'd suggest they market their positions under a generic data analyst title. This will cover the full lifecycle of analytics and attract serious candidates who strive to become analytics experts.
Thanks for sharing your thoughts on this; there are different ways of looking at it, for sure. If you're interested, take a look at what we offer on our data science training portal (for data scientists and analysts): https://www.sas.com/en_us/training/academy-data-science/data-science-resources.html