You’ve finally done it. You managed to stay awake through the endless series of MOOC videos, and you’ve mastered the Iris data set. You’ve learned that lm() will build you a pretty nifty model in R, and you can fit a classifier with scikit-learn.
You know your neural net from your nearest neighbour, and whilst you might not (yet) be winning Kaggle contests, you can clone a Git repo from past competitions and pretty much know what’s going on. Congratulations, you are now a data scientist! But are you ready for data science in the wild?
Unfortunately, you are unlikely to encounter real-life problems presented in rows and columns. If you’re lucky enough to get some clean data, it is unlikely to be in a single CSV file. And if it is, it probably won’t all fit into your laptop’s memory. You might even have to source your own data.
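To make the memory point concrete, here is a minimal sketch of one common workaround: streaming a CSV row by row and keeping only running totals, rather than loading the whole file at once. It uses only Python's standard library. The function name `mean_by_group`, the column names and the tiny in-memory sample are all assumptions made up for illustration, not part of any real data set or library.

```python
import csv
import io
from collections import defaultdict


def mean_by_group(lines, group_col, value_col):
    """Compute the mean of value_col per group, reading one row at a time.

    `lines` is any iterable of text lines, such as an open file handle.
    Only the running sums and counts are held in memory, never the full file.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(lines):
        key = row[group_col]
        totals[key] += float(row[value_col])
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Hypothetical stand-in for a file too large to load in one go.
# With a real file you would pass open("some_file.csv", newline="") instead.
sample = io.StringIO(
    "species,petal_length\n"
    "setosa,1.4\n"
    "setosa,1.6\n"
    "virginica,5.5\n"
)

result = mean_by_group(sample, "species", "petal_length")
print(result)
```

The same idea scales to files of any size because the memory used depends on the number of groups, not the number of rows. It only works for summaries that can be updated incrementally, though; anything that needs all the rows at once calls for a different approach.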
The harsh reality is that data science is a problem-solving game, not just a coding exercise. But that’s also what makes it so exciting. Data science is the effervescent intersection of statistics, mathematics, computer science and pretty much anything analytical that helps derive actionable insight from data. It is the wobbly bridge between the business and IT worlds, bringing together a wide array of skills in order to understand underlying patterns in data and tackle real-world problems.
Whilst this may seem like a very long checklist, we can break these skills down into two fundamental groups: the creative and the analytical. The left and right sides of your brain, if you like.
Clearly, there is a balance to be found between the soft and hard skills, and everyone will have their own natural inclination. Some data scientists will lean more towards being business-facing, whilst others may prefer to program quietly with their headphones on.
Often, many of these desired skills can, or should, be handled by specialists in data engineering, DevOps or business analysis. It may help if you understand what HDFS is, but the role of the data scientist should lean more towards solving the business problem than towards platform management.
Likewise, within data science, it is important to have a foundational understanding of most models. But no one data scientist can know everything. For example, the skills needed for building a good time series forecast are vastly different from those needed for an exercise in computer vision.
You may be wondering how businesses can manage to do all the data science they want with everyone specialising. The answer is by building the right data science team with complementary skills and expertise.
Data science teams
Different industries and organisations practise data science differently. The goals of every data science project may be broadly comparable, but the structure and profile of teams are often very different.
Whilst some organisations can achieve phenomenal success with PhD data scientists, I’ve also seen real frustration in industry where small teams may not get the data access they need or visibility with the business stakeholders.
The best success I’ve seen, especially in the public sector, has been where organisations don’t pursue the unicorn approach to filling data science positions and instead initially focus on building a data lab: a team of individuals from mixed disciplines who collectively form a data science capability. A business problem gets clearly defined, and a team structure is organised around the problem.
Because the data lab is built around a business problem, it has clear visibility in the organisation. It gets clear buy-in from the business, as well as from IT in terms of data access. This also makes it simpler to measure success and thus easier to establish return on investment (ROI) and total cost of ownership (TCO).
An approach to success
In my experience, this approach of building a capability and then drawing on teams to solve particular business problems gives the best chance of success with data science. It gives data scientists a structured environment to work in, and access to the tools and data that they need. It also gives the business access to relevant skills and experience. The whole may not necessarily be greater than the sum of its parts, but it certainly works better as a whole than as individual elements.