The ingredients of a Data Scientist


It was just a couple of years ago that folks were skeptical about the term "data scientist". It seemed like a simple re-branding of an established job role that carried titles such as "business analyst", "data manager", or "reporting specialist".

But today, it seems that the definition of the "Data Scientist" job role has gelled into something new. At SAS Global Forum 2014, I heard multiple experts describe data science qualifications in a similar way, including these main skills:

  • Ability to manage data. Know how to access it, whether it's in Excel, relational databases, or Hadoop -- or on the Web. Data acquisition and preparation still form the critical foundation for any data analysis.
  • Knowledge of applied statistics. Perhaps not PhD-level stuff, but more than the basics of counts, sums, and averages. You need to know something about predictive analytics, forecasting, and the process of building and maintaining analytical models.
  • Computer science, or at least some programming skills. Point-and-click tools can help keep you productive, but it's often necessary to drop into code to achieve the flexibility you need to acquire some data or apply an analysis that's not provided "out of the box".
  • And finally -- and this is what makes a data scientist most relevant -- the ability to understand and communicate the needs of the business. You might be a data wiz and have metrics out the wazoo, but an effective data scientist must know which fields and metrics matter most to the organization he or she serves. You must also be able to ask the right questions of the stakeholders, and then communicate results that will lead to informed action.

I don't claim to be a data scientist -- I'm not strong enough in the statistical pillar -- but I do have my moments. For example, I consider my recent analysis of blog spam to be data-science-like. Even so, I'm not brave enough to change my business cards just yet.

At SAS Global Forum I talked to Wayne Thompson, Chief Data Scientist at SAS. (Yes, even SAS is capitalizing on the buzz by having a data science technologies team.) While there, he introduced SAS In-Memory Statistics for Hadoop, a programming interface that's meant to empower data scientists.

Wayne and I also talked a couple of other times: once about SAS Visual Statistics ("it's the shizzle", says the bald white guy -- not me), and once about data science in general.

Data science isn't all just "Wayne's world" -- there were plenty of other data science practitioners at the conference. For example, check out Lisa Arney's interview with Chuck Kincaid of Experis, talking about how to be a data scientist using SAS. (See his full paper here.) Also check out SAS' Mary Osborne, who presented on Star Wars and the Art of Data Science. (Her paper reveals the unspoken fifth pillar of a data scientist: it's good to be part nerd.)

What do you think about the "new" field of data science? Have you changed your business card to include the "data scientist" title?


About Author

Chris Hemedinger

Director, SAS User Engagement

Chris Hemedinger is the Director of SAS User Engagement, which includes our SAS Communities and SAS User Groups. Since 1993, Chris has worked for SAS as an author, a software developer, an R&D manager and a consultant. Inexplicably, Chris is still coasting on the limited fame he earned as an author of SAS For Dummies.
