It is no secret that there is a world of difference between theoretical and applied econometrics. Every practising econometrician experiences a moment in their professional career, usually at its very beginning, when the theory acquired at university clashes with practice. They realise that real-life data are not as neatly arranged or as ready for analysis as the datasets used in teaching, that data quality is often poor, and that data volumes are measured in terabytes rather than megabytes. Furthermore, data are not analysed for the sake of analysis; what really matters are the results, their interpretability and the business benefits that come from them. How do we move from theory to analysing real data and finding answers to real questions?
I was inspired to take on this topic by Peter E. Kennedy's article "Sinning in the Basement: What Are the Rules? The Ten Commandments of Applied Econometrics". The author discusses the differences between theoretical and applied econometrics by setting out ten rules, the Ten Commandments of Applied Econometrics, that are key to analysing data correctly in practice.
In this post, I would like to introduce the first commandment and share my experience and thoughts on it.
- Use econometric theory and common sense.
Do not rely on rules and models mindlessly. Stop for a while and consider whether there is a better approach before you start estimating regression parameters or training other predictive models. Data analysis requires tools that are best suited to the given business problem and the structure of the data. According to the No Free Lunch theorem, no single method works best for every dataset. Thus, it is important to know the theory on which the statistical tools are based. Armed with this knowledge, we can choose methods that are more powerful and may lead to better results. However, the temptation to use complicated and theoretically sophisticated methods should be resisted. The Pareto principle also holds in business analytics: the lion's share of business benefits is attributable to simple methods that do not generate any significant cost or risk, such as linear or logistic regression. The added value of methods that are more advanced but also more time-consuming and burdened with higher design risk, such as nonparametric methods or random forests, can be small compared with classic methods, and the higher cost of implementation may not be justifiable.
Selecting an appropriate method and functional form for the model is critical for obtaining reliable and interpretable results. For example, when you are building a model to forecast the probability of customers leaving a company (aka "churn") based on customer characteristics, you should select methods whose results fall within the range from 0% to 100%. If this rule is not observed, the forecast probability of churn could be, for example, -13%. Such a value would be uninterpretable, and the result could not be operationalised directly without post-processing. If the result obtained is obviously erroneous, the majority of analysts would probably try to figure out the reasons, find the mistake and apply a different method. But the situation is not always so obvious.
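The churn example can be illustrated with a minimal sketch. The data below are made up for illustration (a single hypothetical feature, tenure in months), and both models are fitted by hand in plain Python so that the snippet is self-contained; in practice you would use a statistical library. A straight line fitted to the 0/1 churn flag (a linear probability model) can return "probabilities" below 0% or above 100%, while logistic regression cannot.

```python
import math

# Hypothetical toy data: customer tenure in months and a churn flag (1 = left).
tenure = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
churned = [1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0]


def sigmoid(z):
    """Numerically stable logistic function; always returns a value in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_linear(x, y):
    """Ordinary least squares for one feature: y ~ b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def fit_logistic(x, y, iterations=25):
    """Logistic regression for one feature, fitted with Newton's method."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Entries of the 2x2 matrix X'WX with weights p * (1 - p).
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton update: beta += (X'WX)^-1 * gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


a0, a1 = fit_linear(tenure, churned)
c0, c1 = fit_logistic(tenure, churned)

# Score two hypothetical customers: a brand-new one and a long-standing one.
new_customers = [1, 24]
linear_preds = [a0 + a1 * t for t in new_customers]
logistic_preds = [sigmoid(c0 + c1 * t) for t in new_customers]

for t, lin, logit in zip(new_customers, linear_preds, logistic_preds):
    print(f"tenure={t:>2} months  linear={lin:+.3f}  logistic={logit:.4f}")
```

On this toy data the linear model gives a churn "probability" above 100% for the one-month customer and a negative one for the 24-month customer, whereas the logistic predictions stay strictly between 0 and 1 by construction.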
To sum up the first commandment for econometricians, I encourage readers not to rely on magic formulas, automatic solutions or predictive models that promise to do the entire analytic job for us. Such universal formulas, automatic solutions or models simply do not exist. Our common sense and a good understanding of the basics of statistical methods should underlie our intuition when assessing how useful a particular method can be. In the words of Ludwig Boltzmann, an Austrian physicist: "there is nothing more practical than a good theory". The widespread use of the Mean Absolute Percentage Error (MAPE) is one example of a poor understanding of the statistical properties of forecast accuracy measures. In the context of inventory optimisation, the use of MAPE may lead to wrong stock replenishment recommendations. Mateusz Zawisza explains the reasons for that in his post entitled "Measures of forecast accuracy – what to choose".
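One commonly cited property of MAPE can be shown with a small sketch. This is my own illustration on made-up demand figures, not necessarily the argument made in the referenced post. It searches for the constant forecast that minimises MAPE and compares it with the average demand.

```python
# Hypothetical demand history (units per period) with two demand spikes.
demand = [10, 12, 8, 50, 11, 9, 10, 40]


def mape(actuals, forecast):
    """Mean Absolute Percentage Error of a constant forecast, in percent."""
    return 100.0 * sum(abs(a - forecast) / a for a in actuals) / len(actuals)


# Try every integer forecast from 1 to 60 and keep the one with the lowest MAPE.
candidates = range(1, 61)
best_by_mape = min(candidates, key=lambda f: mape(demand, f))
mean_demand = sum(demand) / len(demand)

print(f"forecast minimising MAPE: {best_by_mape}")
print(f"average demand:           {mean_demand}")
```

Here the MAPE-optimal forecast lies well below the average demand. The reason is that the percentage error of an under-forecast can never exceed 100%, while the percentage error of an over-forecast is unbounded, so choosing forecasts by MAPE tends to favour low ones. A replenishment policy built on such forecasts may systematically order too little.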
It is also important to place the statistical tools in the business context of the problem. This topic will be discussed in the next post of the 10 Commandments of Applied Econometrics series. Those interested are encouraged to read the original article by Peter E. Kennedy.