In February of this year, the Washington Business Journal reported that the US Government had appointed its first Chief Data Scientist, DJ Patil. With this, I think it’s now safe to say that data science is officially sanctioned as a new mode in organizations. The work of those who can apply the necessary finesse, along with business acumen, to make sense of big data has been formalized into a ‘new’ profession.
I talked to one of our own to find out his thoughts on what it takes to be a data scientist. And true to his ilk, SAS’s Adam Pilz applied text analytics to figure out what skills were being sought to fill this coveted role.
Crawling just over 7,000 public postings from a job website, Adam investigated the key elements companies were looking for in a data scientist. “They must be highly educated to attain a job.” A Masters degree or higher was seen as a requirement for 81% of the advertised jobs, compared with the 12% of the American population that holds one. Indeed, there is a clear distinction between the level of scholarship obtained by the general public and that required of a data scientist.
And in terms of the prowess of data scientists? He saw that the top 10 most desirable analytical skills mentioned by prospective employers were:
Adam suspects that the first two categories (machine learning and optimization) may simply be popular buzzwords added to job postings, and perhaps optimization may be the Human Resources department’s way of describing how to make things better, rather than the mathematical method. If that holds true, then it’s possible that text analytics is the most sought-after skill in the data scientist market. At a minimum, it’s in the top three.
He saw that text analytics and forecasting were the fastest growing desirable skills. And of course, as with all text analysis, various synonyms were captured for each of the terms seen above. For example, content analysis, NLP, sentiment analysis, text classification, topic extraction, etc. are all included in the term ‘text analysis’.
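To make that synonym roll-up concrete, here is a minimal Python sketch of the idea. It is not Adam's code (he worked in SAS), and the `SKILL_SYNONYMS` table, the `tag_skills` and `skill_counts` functions, and the "time series" synonym for forecasting are all illustrative assumptions rather than the categories he actually used.

```python
import re

# Hypothetical synonym table: each canonical skill maps to the phrases
# that should count as a mention of it. These lists are assumptions.
SKILL_SYNONYMS = {
    "text analysis": [
        "text analysis", "text analytics", "content analysis", "nlp",
        "sentiment analysis", "text classification", "topic extraction",
    ],
    "forecasting": ["forecasting", "time series"],
}


def _compile(phrases):
    # Word boundaries keep short terms such as "nlp" from matching
    # inside longer words.
    alternatives = "|".join(re.escape(p) for p in phrases)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


SKILL_PATTERNS = {
    skill: _compile(phrases) for skill, phrases in SKILL_SYNONYMS.items()
}


def tag_skills(posting):
    """Return the set of canonical skills mentioned in one job posting."""
    return {
        skill
        for skill, pattern in SKILL_PATTERNS.items()
        if pattern.search(posting)
    }


def skill_counts(postings):
    """Count how many postings mention each canonical skill at least once."""
    counts = {skill: 0 for skill in SKILL_PATTERNS}
    for posting in postings:
        for skill in tag_skills(posting):
            counts[skill] += 1
    return counts
```

Counting each posting once per skill, however many synonyms it contains, keeps a wordy advertisement from inflating a skill's ranking.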
‘Data wrangling’ is a fun term. It conjures up romantic notions of the wild west, within (no doubt) the Text Frontier - wrestling big data beasties, captured by casually (and similarly) dressed cowboys who are methodical in their approach (big buckle bragging rights will be seen at this year’s SAS Global Forum in Dallas, as a matter of fact).
Breaking this out further, Adam compared:
- “lower level skills” = those that are lower in importance as educational attainment increases, relative to
- “higher level skills” = those that become more important to have as education increases.
In the two charts below, the skills are in ranked order of importance.
As education level increases (from left to right), skills like data wrangling, data visualization and basic statistics are no longer prominently featured as required skills for data scientists, as those with Masters degrees and, even more so, PhDs are expected to focus their time on more sophisticated types of analysis.
Text analytics, on the other hand, jumps significantly in importance ranking as skill level rises, possibly because the outputs of such an analysis are highly sensitive to the methods used and thus impacted by subject matter expertise. Linear regression and design of experiments both become more important with increasing education, and generalized linear models show up as a required skill for PhDs.
I also asked Adam if he has seen any trends in the usage of the term ‘data scientist’. He said that “the level of education required to be a data scientist has remained the same for the last year, but there are important geographical differences.” Backing this up, he pointed to differences in the highest level of education mentioned in the job postings, segmenting on the highest level of education that was required. When looking at the entire US, he found that Bachelor’s was the least sought-after degree for data science positions, seen in only 19% of the job postings, while Masters were the most cited educational requirement, garnering 54% of the advertised positions. PhDs claimed the remaining 27%.
Geographical differences in required skill level were found in Silicon Valley relative to the rest of the country. He saw that inside Silicon Valley, PhDs were required for 50% of the jobs listed (and Masters were required for 36%). This was in contrast to jobs outside of Silicon Valley, where PhDs and Masters were identified for 33% and 55% of job postings, respectively.
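The percentages above come down to classifying each posting by the highest degree it mentions and then computing shares within a segment. A rough Python sketch of that bookkeeping follows. The degree keywords and the `highest_degree` and `degree_shares` names are made up for illustration, and this is not Adam's data or method.

```python
from collections import Counter

# Degrees ordered from highest to lowest. The keyword lists are
# illustrative assumptions, matched as lowercase substrings.
DEGREE_KEYWORDS = [
    ("PhD", ("phd", "ph.d", "doctorate")),
    ("Masters", ("master",)),
    ("Bachelors", ("bachelor",)),
]


def highest_degree(posting):
    """Return the highest degree mentioned in a posting, or None."""
    text = posting.lower()
    for degree, keywords in DEGREE_KEYWORDS:
        if any(keyword in text for keyword in keywords):
            return degree
    return None


def degree_shares(postings):
    """Percentage of postings per highest degree, among those naming one."""
    counts = Counter(
        degree
        for degree in map(highest_degree, postings)
        if degree is not None
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {degree: round(100 * n / total) for degree, n in counts.items()}
```

To compare regions, one would call `degree_shares` separately on the postings from each segment, for example those inside and outside Silicon Valley.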
It’s been said that SAS has more PhDs on staff than does any single university in the United States. And if you’re using open source code, for example, perhaps you do need more PhDs on staff to make sure that algorithms are behaving correctly. I know that here at SAS we build that expertise right into the software.
I asked Adam what software he used to do this analysis. He initially used Base SAS® and found that once he’d written the code to tag terms, he was able to find them in the postings. However, he soon moved to SAS® Contextual Analysis. The difference? SAS Contextual Analysis highlighted the words tagged by each category, so he was able to search for the specific terms and see what else people were talking about. He found that the text analytics software gave him insight into how different postings were saying similar things, in addition to suggesting what other phrases he might want to investigate, concluding that the text analytics approach was “...More enlightenment than discovery”.
Adam did this research before coming to SAS – in his search for a new career. We are thrilled to have him and his data scientist prowess as part of the SAS family.
Regardless of title (Adam is described as a Solutions Architect), the skills attributed to data science have been held by those in the analytic field for some time.
Do you see yourself as a data scientist?