Data science and machine learning are riding a wave of popularity. There is plenty of buzz on social media, crowds at meetups and conferences, and rising interest in postgraduate study in the field. There is clearly a growing awareness of the power of advanced data analysis methods and the benefits that can result from their application. Organizations have started to treat analytics as a priority and as a way to gain market advantage (see How companies are using big data and analytics). Tools and methods that have been around for years are finally reaching a wider market.
This is good news, and we will all benefit from it.
Well, as long as we take care to use data science and analytics in the right way and at the right time…
Unfortunately, the terms ‘data science’ and ‘machine learning’ are linked to myths that can limit their effective implementation in organizations. I have spent some time gathering these horror stories, which seem designed to scare off would-be users, and can confirm that the myths are alive and well. I hope this blog will help to dispel them and cast some light on the subject.
Myth 1: Technology rules
“Machine learning is a programming problem. Want to be a real analyst? You need to understand how to manage the code and version it, and you must have a GitHub account.”
Machine learning is NOT a specific technology or a specific mode of working with it. It is the ability to formulate an analytical problem in line with the business reality, then carry out data collection, analysis and implementation. It does not matter whether you use code or ready-made tools, commercial or open source technology, or indeed anything else. It's your choice - any path is good if it leads to specific business benefits.
Falsely equating analytics with a specific technology, and the role of an analyst with a specific IT profile (such as an advanced programmer), is one of the cardinal errors. An analytical organization should focus on achieving business objectives rather than on using specific technologies. If you are interested in this subject, I recommend a report from EY on Becoming an analytics-driven organization to create value.
Myth 2: Analysts are only interested in algorithms
There is a view out there that analysts are only interested in algorithms and programming, and do not want to understand the business or its problems.
And yes, it is true that rapid changes in technology mean that analysts do have to spend time immersed in technological or algorithmic issues, and have less time to spend on understanding the business. They do, after all, need to fully understand their tools and their trade.
But they also need to understand the business problem they are trying to solve, how the analysis will be used, and any constraints. Without this, they will be unable to construct a valid target variable, determine the appropriate set of predictors, or find the right success criterion for the model. Most importantly, they will not create a product that will bring measurable benefits to the business.
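To make the point about success criteria concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the churn scenario, the two cost figures, and the toy predictions are hypothetical and do not come from any real project. It simply shows that a model which looks better on a generic metric (accuracy) can look worse once the criterion reflects what the business actually cares about.

```python
# Hypothetical costs agreed with the business side (illustrative assumptions).
COST_MISSED_CHURNER = 100.0  # margin lost when a churning customer is not contacted
COST_WASTED_OFFER = 10.0     # discount given to a customer who would have stayed


def accuracy(y_true, y_pred):
    """Share of customers classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def business_cost(y_true, y_pred):
    """Total cost of the model's mistakes, using the costs defined above."""
    cost = 0.0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 0:    # churner we failed to flag
            cost += COST_MISSED_CHURNER
        elif t == 0 and p == 1:  # loyal customer we flagged unnecessarily
            cost += COST_WASTED_OFFER
    return cost


# Toy data: 1 = customer churned, 0 = customer stayed.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
model_a = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # flags nobody
model_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # flags more customers, catches both churners

for name, pred in [("A", model_a), ("B", model_b)]:
    print(name, accuracy(y_true, pred), business_cost(y_true, pred))
```

In this toy setup, model A wins on accuracy while model B is far cheaper for the business. Which of the two is "better" can only be decided once the analyst knows how the predictions will be used.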
Lack of close cooperation between analysts and the business side leads to the emergence of “corporate laboratories”, focused on research and science rather than on supporting decision-making processes. There is an interesting article from KDnuggets about this and other organizational challenges in implementing data science.
Myth 3: Intuition trumps analytical results, so why bother?
We have all, from time to time, heard someone say “We tested method A, but the result did not match our intuition, so we tested methods B, C, D ... K until finally we succeeded.” It is true that analytics has to feel ‘right’, but organizations also need to be open to its results, and accept that sometimes intuition is wrong.
There is no sense in setting up an analytical project to provide answers and then ignoring its results. Asking for the analysis to be repeated over and over again with slightly different methods is also costly. If the results do not match ‘gut feeling’, perhaps the best option would be to discuss the difference with staff and others to see if there is any reason for this mismatch. ‘Collective intelligence’ can work wonders in interpretation.
Machine learning is not a panacea. This may be heretical, but sometimes there just is not enough data, time, or resources, or it is simply faster to ask an expert. But the possible use of machine learning should not be discarded because of unsubstantiated myths. What do you think? Please share your experiences and let’s discuss!