We talk quite a bit about data scientists, but we sometimes forget the person who made the data available in the first place: the data engineer. In this post and the next one, I'll concentrate on the definitions and qualities of each role. Are they the same or different? Or do they simply overlap? For now, let's concentrate on what makes a good data scientist.
Data scientists: The questions they ask
The data scientist is the person who needs to ask the right questions and tell the correct data stories. Sometimes the data scientist will need to guide others through statistics, financial analytics or customer analytics, and sometimes they just make the magic come alive. Let’s assume we have a project where we're analyzing specific retailer information. In this example, the data scientist would ask questions like:
- What are we analyzing about these retailers (what do we want to know)? Are we looking at point-of-sale information or how well they market our products?
- Are we analyzing one specific product or many different products? Take, for example, our coffee shops that serve pumpkin spice coffee and other seasonal products in the fall. Do we want to know how much is sold during this season, or do we want to see how these sales compare to other products?
- Do we want analysis that covers a set time period? For example, do we want to see the last few years or the last few months?
- Do we want to know what these retailers sold during a specific season, such as fall versus winter or spring versus summer?
- Do we want to drill into the information from a chart or graph, or do we want to see raw statistics?
- How should the data be presented? (The data scientist may suggest how the data should be presented visually, or may offer examples.)
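To make the seasonal questions above concrete, here is a minimal Python sketch of how a data scientist might answer "how much pumpkin spice coffee sold in fall, and how does that compare to other products?" It is only an illustration under stated assumptions: the sales records are hypothetical toy data, seasons are assumed to be Northern Hemisphere meteorological seasons keyed by month, and the function names `units_by_season` and `share_of_season` are made up for this example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical point-of-sale records: (product, sale_date, units). Illustrative only.
SALES = [
    ("pumpkin spice coffee", date(2023, 10, 3), 40),
    ("pumpkin spice coffee", date(2023, 11, 15), 55),
    ("pumpkin spice coffee", date(2023, 12, 20), 10),
    ("house coffee", date(2023, 10, 3), 70),
    ("house coffee", date(2024, 1, 8), 80),
]

# Assumption: meteorological seasons (Northern Hemisphere), keyed by month number.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
}


def units_by_season(sales, start=None, end=None):
    """Total units per (product, season), optionally limited to the period [start, end]."""
    totals = defaultdict(int)
    for product, sold_on, units in sales:
        # Apply the optional time-period filter ("last few months" vs. "last few years").
        if start is not None and sold_on < start:
            continue
        if end is not None and sold_on > end:
            continue
        totals[(product, SEASON_BY_MONTH[sold_on.month])] += units
    return dict(totals)


def share_of_season(totals, product, season):
    """Fraction of all units sold in a season that came from one product."""
    season_total = sum(u for (_, s), u in totals.items() if s == season)
    return totals.get((product, season), 0) / season_total if season_total else 0.0


if __name__ == "__main__":
    totals = units_by_season(SALES)
    print(totals[("pumpkin spice coffee", "fall")])  # prints 95
    print(round(share_of_season(totals, "pumpkin spice coffee", "fall"), 3))  # prints 0.576
```

The same totals can answer several of the questions in the list: comparing the `"fall"` and `"winter"` keys covers the seasonal comparison, the `start`/`end` arguments cover a set time period, and the share calculation compares one product against everything else sold that season.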
Accessing the data, and other skills the data scientist needs
The data scientist may assume that the data is available for use, but he or she may need to wrangle data from multiple sources into a consistent form for presentation. (Data wrangling is one of my favorite things!) Sometimes the required data is in Hadoop, which calls for different skills; sometimes it is in a structured database. Being able to take messy data and make it understandable is a real art.
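As a rough illustration of what "wrangling data into conformity" can mean in practice, the Python sketch below takes two hypothetical sources with different column names, date formats and casing, and maps both onto one common record shape. The source layouts, the sample rows and the function names are all assumptions made up for this example; the choice to set aside rows with a missing quantity (rather than guess a value) is also just one reasonable design choice.

```python
import csv
import io
from datetime import date, datetime

# Hypothetical extract from a file-based source (for example, a CSV exported from a
# Hadoop job). Its column names, date format and casing differ from the database source.
FILE_EXTRACT = """Store,Item,SaleDate,Qty
S01, Pumpkin Spice Coffee ,10/03/2023,40
S02,HOUSE COFFEE,10/04/2023,
"""

# Hypothetical rows from a structured database: (store_id, product, ISO date, units).
DB_ROWS = [
    ("S01", "pumpkin spice coffee", "2023-10-05", 25),
]


def conform_file_rows(text):
    """Map the file extract onto the common shape; set aside rows missing a quantity."""
    records, rejected = [], []
    for row in csv.DictReader(io.StringIO(text)):
        qty = row["Qty"].strip()
        if not qty:
            rejected.append(row)  # keep for review instead of inventing a value
            continue
        records.append({
            "store": row["Store"].strip(),
            "product": row["Item"].strip().lower(),  # normalize whitespace and casing
            "sale_date": datetime.strptime(row["SaleDate"].strip(), "%m/%d/%Y").date(),
            "units": int(qty),
        })
    return records, rejected


def conform_db_rows(rows):
    """Map database tuples onto the same common shape."""
    return [
        {"store": store, "product": product.strip().lower(),
         "sale_date": date.fromisoformat(sold_on), "units": int(units)}
        for store, product, sold_on, units in rows
    ]


if __name__ == "__main__":
    file_records, rejected = conform_file_rows(FILE_EXTRACT)
    combined = sorted(file_records + conform_db_rows(DB_ROWS), key=lambda r: r["sale_date"])
    for record in combined:
        print(record)
    print(f"{len(rejected)} row(s) set aside for review")
```

Once both sources share the same field names, types and conventions, they can be combined and analyzed as a single data set.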
The data scientist should know where to find the data for analysis, and should be able to interpret any business rules that may have been applied to this data. They rely on good business metadata to help with navigation and data usage. Data scientists also need the following skills to do their jobs well:
- Communication skills. I cannot emphasize this enough: A data scientist must feel very comfortable talking statistics, business data needs and presentation.
- Statistics and other data mining skills. These are needed to understand how best to analyze the data.
- Data wrangling skills that require more than a lasso. For example, data scientists need programming skills for multiple platforms, such as Hadoop and structured databases, so they can conform the data for analysis.
- Business intelligence skills. These are required to present analytical findings in a format familiar to business users. Business intelligence tools tell the data story in the form of graphs, charts and other visuals.
There's no doubt that the data scientist plays an important role in any organization. So, who gets the data in the first place and transforms it? We'll cover that in my next blog post.
Want to become a SAS certified data scientist? Learn how.