Data steward: The concierge of analytics

I just returned from an impromptu vacation in Kansas City, where I stayed at a quirky little boutique hotel. The hotel itself was forgettable, but the concierge, a delightful man named Raymond, was a reminder of the key distinction between information and insight. A concierge’s job includes recommending restaurants, securing tickets to special events, and arranging tours of local attractions. These days you would think a Google search and a smartphone would be all you'd need to find that information. But a good concierge – and Raymond was an excellent one – understands the human perspective and provides insight that can, at the very least, augment what can be learned from data. As I explained in my previous post, data is not the only decision maker. Raymond’s insight saw beyond online restaurant reviews, recommended ticket prices and tourist traps. With his help, my friends and I ate well, enjoyed cool shows and took in the sights – all at a reasonable price.

The value (and activities) of the data steward

A recent TDWI report (summarized in the article "5 ways to become data-driven") found that what separates using data to glean insights from analyzing data to drive decisions and actions is collaboration among key individuals throughout the organization who have well-defined roles. With no disrespect intended to the much-lauded data scientist, it's important to recognize that the data steward is one of the most important of those roles. While the activities associated with the role vary greatly depending on an organization's corporate culture, a data steward is often the go-to person for questions about data. Data stewards bridge the communications gap between business and IT stakeholders about how data is used, and they care for data assets on behalf of the enterprise through governance and by assessing and correcting data quality issues.

In recent years, another activity associated with data stewardship has involved managing the hierarchy of data storage according to frequency of access. Data that's used most often can be routed to very fast storage, like solid state drives or even in-memory caches. Less frequently used data can be routed to older, cheaper spinning hard disk drives. But analytics involves working with all types of data. That could range from experimental data sets – which often sit best in a data lake and are relevant to particular teams or business units – to highly structured, vetted and consensus-driven data that's useful to the entire enterprise and is most logically kept in a data warehouse. In the middle are structured data sets which, possibly due to size or level of cleanliness, are seen as somewhat less than production-level. These data sets most likely live in Hadoop or cloud storage, but they're often queried from relational databases using SQL-on-Hadoop bridges like IBM Big SQL, Microsoft PolyBase, and Oracle Big Data SQL. One reason data hierarchies are important is that there’s also a hierarchy of tools, technologies and platforms, and well-defined data hierarchies dictate best practices in tool-chain deployment. Data stewards provide invaluable human intelligence to make sure data is properly placed within the hierarchy.
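To make the access-frequency idea concrete, here is a minimal Python sketch of how data sets might be routed to a storage tier. Everything in it is an assumption made for illustration: the `DataSet` class, the `choose_tier` function, the tier names and the reads-per-day thresholds are hypothetical, and real cutoffs would depend on workload and budget. The optional `steward_override` argument stands in for the human judgment a data steward adds on top of the raw numbers.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds in reads per day; not taken from any real system.
HOT_MIN_READS_PER_DAY = 1000
WARM_MIN_READS_PER_DAY = 10

VALID_TIERS = ("memory", "ssd", "hdd")


@dataclass
class DataSet:
    name: str
    reads_per_day: float


def choose_tier(ds: DataSet, steward_override: Optional[str] = None) -> str:
    """Pick a storage tier from access frequency, unless a steward overrides it."""
    if steward_override is not None:
        if steward_override not in VALID_TIERS:
            raise ValueError(f"unknown tier: {steward_override}")
        return steward_override
    if ds.reads_per_day >= HOT_MIN_READS_PER_DAY:
        return "memory"  # most frequently used data goes to the fastest storage
    if ds.reads_per_day >= WARM_MIN_READS_PER_DAY:
        return "ssd"
    return "hdd"  # rarely used data goes to cheaper spinning disks


if __name__ == "__main__":
    for ds in (DataSet("daily_sales", 5000), DataSet("survey_2012", 0.1)):
        print(ds.name, choose_tier(ds))
```

The override is the point of the sketch: a frequency rule alone cannot know that a rarely read data set is about to matter for a quarter-end analysis, but a steward can.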

Data stewards: Essential for analytics

Over the last decade, concierge services have become a staple of luxury credit cards, offering services like making dinner reservations, researching travel arrangements, getting access to exclusive events, and more. This service offering is also known as lifestyle management. A data steward is the concierge of analytics, providing life cycle management for the data that data scientists, analysts, business users and the entire enterprise depend on.
