Machine learning changes the way we forecast in retail and CPG


Machine learning is taking a significant role in many big data initiatives today. Large retailers and consumer packaged goods (CPG) companies are using machine learning combined with predictive analytics to help them enhance consumer engagement and create more accurate demand forecasts as they expand into new sales channels like the omni-channel. With machine learning, supercomputers learn from mining masses of big data without human intervention to provide unprecedented consumer demand insights.

Predictive analytics and advanced algorithms, such as neural networks, have emerged as the hottest (and sometimes controversial) topic among senior management teams. Neural network algorithms are self-correcting and powerful, but are difficult to replicate and explain using traditional multiple regression models.

For years, neural network models have been discarded due to the lack of storage and processing capabilities required to implement them. Now, with cloud computing on supercomputers, neural network algorithms, along with ARIMAX, dynamic regression and unobserved components models, are becoming the catalyst for "machine learning-based forecasting."

According to an article in Consumer Goods Technology magazine, through pattern recognition there will be a shift from active engagement to automated engagement. As part of this shift, technology (machine learning) takes over tasks from information gathering to actual execution. Compared to traditional demand forecasting methods, machine learning-based forecasting helps companies understand and forecast consumer demand that, in many cases, would be otherwise impossible. Here are several reasons why:

Incomplete versus complete information and data. Traditional demand forecasts are based on time-series forecasting methods (Exponential Smoothing, ARIMA, and others) that can only use a handful of demand factors (e.g., trend, seasonality, and cycle). On the other hand, machine learning-based forecasting combines learning algorithms (ARIMAX, dynamic regression, neural networks and others) with big data and cloud computing to analyze thousands – even millions – of products using unlimited amounts of causal factors simultaneously up and down a company’s business hierarchy.

Traditional demand forecasting and planning systems are restricted to only the demand history, while machine learning-based forecasting can take advantage of limitless data, determining what’s significant, then prioritizing available consumer insights (demand sensing) to influence future demand using “what if” analysis (demand shaping). Compared to traditional time-series forecasting systems, machine learning-based forecasting solutions identify the underlying demand drivers that influence demand, uncovering insights not possible with traditional time-series methods. Additionally, the self-learning algorithms get smarter as they consume new data and adapt the algorithms to consumer demand.
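
The contrast between a history-only forecast and one that uses a causal factor can be sketched in a few lines of Python. This is a minimal illustration, not the method used by any particular forecasting product: the sales figures, the promotion flags, and the function names `ses_forecast` and `fit_promo_model` are all hypothetical, and a single 0/1 promotion flag stands in for the many causal factors described above.

```python
from statistics import mean


def ses_forecast(history, alpha=0.3):
    """Simple exponential smoothing: next-period forecast from history only."""
    level = history[0]
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


def fit_promo_model(history, promo_flags):
    """Least-squares fit of demand = base + lift * promo for a 0/1 promo flag.

    With a single binary regressor, ordinary least squares reduces to
    comparing the two group means.
    """
    base = mean(y for y, p in zip(history, promo_flags) if p == 0)
    promoted = mean(y for y, p in zip(history, promo_flags) if p == 1)
    return base, promoted - base


# Hypothetical weekly unit sales; the third and seventh weeks ran a promotion.
sales = [100, 102, 150, 98, 101, 99, 152, 100]
promo = [0, 0, 1, 0, 0, 0, 1, 0]

history_only = ses_forecast(sales)      # cannot tell promoted weeks apart
base, lift = fit_promo_model(sales, promo)
what_if_no_promo = base                 # "what if" we do not promote next week
what_if_promo = base + lift             # "what if" we do promote next week
```

The smoothing forecast blends promoted and non-promoted weeks into one level, while the causal model separates the baseline from the promotional lift, which is what makes a "what if" comparison possible.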

Holistic models using multiple dimensions versus single dimension algorithms. Traditional forecasting systems are characterized by a number of single-dimension algorithms, each designed to analyze demand based on certain data-limited constraints. As a result, much manual manipulation goes into cleansing data and separating it into baseline and promoted volumes. This limits which algorithms can be used across the product portfolio.

Machine learning-based forecasting takes a more sophisticated approach. It uses pattern recognition with a single, general-purpose array of algorithms that adapt to all the data. They fit many different types of demand patterns simultaneously across the product portfolio, up and down the company’s business hierarchy, handling multiple data streams (e.g., price, sales promotions, advertising, in-store merchandising and many others) holistically in the same model, without cleansing the data into baseline and promoted volumes.

For example, traditional forecasting systems each have a specific purpose, leading to multiple inconsistent forecasts across the product portfolio. With machine learning-based forecasting, the same algorithm is useful for multiple processes, including pricing, sales promotions, in-store merchandising, advertising, temperature, store inventory, and others, creating one vision of a realistic integrated forecast.

Partial versus complete use of item history. When creating demand forecasts, traditional demand forecasting and planning systems analyze the demand history for a particular product/SKU, category, channel and demographic market area. Machine learning-based forecasts leverage history for all items, including sales promotions, to forecast demand for every item at every node in the business hierarchy simultaneously.
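
To make the idea of forecasting consistently at every node of a hierarchy concrete, here is a small Python sketch. It uses classical top-down proportional allocation, which is a deliberately simple stand-in and not the machine learning approach the article describes. The SKU names, the naive mean forecast, and the function `allocate_top_down` are assumptions made for illustration only.

```python
def allocate_top_down(group_forecast, sku_history):
    """Split a group-level forecast across SKUs by their historical share."""
    totals = {sku: sum(units) for sku, units in sku_history.items()}
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No sales history at all: fall back to an even split.
        share = 1 / len(totals)
        return {sku: group_forecast * share for sku in totals}
    return {sku: group_forecast * t / grand_total for sku, t in totals.items()}


# Hypothetical weekly unit sales for three SKUs in one product group.
history = {
    "jacket-M": [40, 38, 42],
    "jacket-L": [30, 29, 31],
    "jacket-XXXXL": [0, 1, 0],  # slow mover with almost no signal of its own
}

# Group-level history is the week-by-week sum over all SKUs.
group_history = [sum(week) for week in zip(*history.values())]

# Naive group forecast (mean of history), used here only as a placeholder model.
group_forecast = sum(group_history) / len(group_history)

sku_forecasts = allocate_top_down(group_forecast, history)
```

Because the SKU forecasts are shares of the group forecast, they always add back up to the group number, so every level of the hierarchy tells the same story.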

Many feel the next generation of machine learning will also include cognitive computing where the supply chain becomes self-healing. This would improve upon machine learning by going beyond predictions to making decisions to automatically correct for anomalies in the supply chain.

Do you see machine learning-based forecasting supporting the next-generation demand management?  Will it eventually lead to cognitive learning creating an autonomic self-healing supply chain; or are you still relying on cognitive dissonance to justify and maintain judgmental harmony within your current demand forecasting and planning process?


About Author

Charlie Chase

Executive Industry Consultant/Trusted Advisor, SAS Retail/CPG Global Practice

Charles Chase is the executive industry consultant and trusted advisor for the SAS Retail/CPG global practice. He is the author of Next Generation Demand Management: People, Process, Analytics and Technology, author of Demand-Driven Forecasting: A Structured Approach to Forecasting, and co-author of Bricks Matter: The Role of Supply Chains in Building Market-Driven Differentiation, as well as over 50 articles in several business journals on demand forecasting and planning, supply chain management, and market response modeling. His latest book is Consumption-Based Forecasting and Planning: Predicting Changing Demand Patterns in the New Digital Economy. To learn more, please see his Author page.


  1. This is an interesting article which does much to clarify what modern machine learning algorithms aim to achieve in relation to demand forecasting. There are two things which were running through my mind while reading it:

    One is that looking for patterns in a large amount of data, particularly if several types of data are involved, can, in some cases, result in patterns being found which are purely the result of coincidence. These can be weeded out with appropriate statistical testing. However, if a lot of patterns are found then a very high level of statistical significance is required in order to determine which of the patterns are unlikely to be the result of coincidence. Also, the appropriateness of the assumed statistical distributions can be very important.

    The other point is that machine learning of demand patterns in relation to slow moving items tends to be highly ineffective unless the demand pattern for the item concerned is looked at in relation to similar items or other relevant data. Even retailers tend to stock a lot of slow moving items. For instance, if they stock clothing, they might sell large quantities of, say, padded jackets. However, they probably sell very few of size XXXXL of a particular style and colour. Consequently, when looking for demand patterns, it is likely to be useful to group all styles, colours and sizes together and look for demand patterns for the whole group as well as for individual SKUs. The data in relation to the group can be much more useful than the data in relation to any particular slow moving item. Human involvement in such grouping is likely to be necessary. I chose to talk about clothing because I think it illustrates my point well. I have no experience in relation to inventory management of clothing. The point which I am making in relation to grouping items appears to be alluded to in parts of the article.

    I would be interested in any comments in relation to what I have said.

    • Charlie Chase

      Hi Don,

      Thank you for the comment.

      I don’t have a lot of experience using machine learning [ML] in the apparel industry. All my experience is in the consumer packaged goods (CPG) industry. Also, I’ve mainly used ML for forecasting demand. I do know in the retail apparel industry the data is much larger than data for, say, soft drinks and food products. That said, we found some coincident patterns using neural networks when I worked in the soft drinks industry that didn’t make sense based on our domain knowledge and experience. Occasionally, using traditional multiple linear regression models we found coincident patterns as well, but most were related to poor data, or errors in the data. For example, we found a correlation between 2-liter bottles of a soft drink and end cap displays. We knew this was either a coincidence, or an error in data because we never promoted 2-liter bottles on an end cap display. We investigated the situation and found that someone made an error when auditing the store. Apparently, a consumer decided not to buy the 2-liter bottle and placed it on a sold-out end cap display. When the auditor saw it, they thought we ran a 2-liter bottle end cap display. What we found is that applying any statistical model requires not only a strong background in statistics, but also domain knowledge and experience, which is just as important.

      Regarding your second question, we found ML models do not work very well for products that have sparse data (intermittency), slow moving products, or with non-seasonal products with short history. As you know, ML models are data hogs. They require a lot of data. So, we segment product demand history by long seasonal history, short history non-seasonal, and intermittent data. Then, we apply ML to the products with longer seasonal history, traditional time series models for short history non-seasonal data, and intermittent demand models for products with sparse history. In other words, we segment product data and then target those segments with the appropriate model set. ML models are not the end-all for forecasting. They are just another category of models in our tool kit that we can use. You may want to read about the M4 model competition run by Spyros Makridakis, 2017 and 2019.
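
The segment-then-route approach described in the reply above could look roughly like the following Python sketch. The thresholds (`season_length`, `max_zero_share`), the function name `segment_series`, and the choice to use history length as a proxy for "long seasonal history" are all assumptions for illustration. The reply does not say how the segments were actually defined, and this sketch does not test for seasonality itself.

```python
def segment_series(history, season_length=52, max_zero_share=0.5):
    """Route a demand history to a model family.

    Thresholds are illustrative assumptions, not values from the discussion.
    """
    if not history:
        return "no-history"
    # Share of periods with zero demand is a simple proxy for intermittency.
    zero_share = sum(1 for y in history if y == 0) / len(history)
    if zero_share >= max_zero_share:
        return "intermittent-demand model"
    # Require at least two full seasonal cycles before treating history as long.
    if len(history) >= 2 * season_length:
        return "machine learning"
    return "traditional time series"


# Hypothetical weekly histories for three products.
sparse_item = [0, 0, 3, 0, 0, 1]
established_item = [10] * 104
recent_item = [10] * 20

routes = {
    "sparse_item": segment_series(sparse_item),
    "established_item": segment_series(established_item),
    "recent_item": segment_series(recent_item),
}
```

Checking intermittency first matters: a long history that is mostly zeros is still a poor fit for a data-hungry model.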

  2. Charlie Chase

    Hi Don,

    Thank you for your input and comments. We value your insights.

    Your first comment is a valid one if you were to boil the ocean with data that may not necessarily be related to the market, channel, brands, product groups, products, SKUs, demand points, and customers. For example, during my graduate work we were asked to forecast the production of pig iron in the United States, and for fun, we found the history of pigs slaughtered. We added the history of pigs slaughtered into the model and found that for a six month period pigs slaughtered was significantly correlated to the production of pig iron. So, if you knew the number of pigs slaughtered you could predict the production of pig iron for a six month period. However, why would anyone add pigs-slaughtered data to predict pig iron production?

    On the other hand, there have been anomalies with data in some cases using these advanced predictive algorithms, but this is why we need data analysts (data scientists) to monitor machine learning-based forecasting. Let’s expand on this a little further with a recent real life example. About 6 months ago a plane with 250 passengers was flying from Hawaii to San Francisco. Prior to taking off all the weather conditions including head winds were fed into the auto pilot computer. Based on the result, the appropriate amount of fuel was loaded onto the plane. The plane took off and the pilot put the plane on auto pilot. Halfway to San Francisco the pilot and co-pilot realized that the head winds were stronger than anticipated. They reran the analysis and realized that they would run out of fuel and crash into the ocean before they arrived in San Francisco. So, they called back to the control tower in Hawaii and turned the plane around, refueled based on the new information, and flew safely to San Francisco saving 250 lives. If this were machine learning the algorithm would have picked up the higher than normal head winds, recalculated and would have recommended to return to Hawaii to refuel. Even with that, I would still “not” fly on that plane without an experienced pilot and co-pilot. Human oversight will always be needed on an “exception basis”. That doesn’t mean that machine learning can’t be used to automate the majority of work (heavy lifting) allowing demand analysts to focus on the exceptions versus touching every data series every forecasting cycle, which by the way is pretty difficult if a company has 18,000 SKUs, that sell in 100 markets, across multiple channels with multiple demand points, and over 1,000 customers. Not to mention thousands of stores.

  3. Charlie Chase

    Don, in response to your second point, we can certainly use machine learning to predict new products and short lifecycle products like fashion clothing. Based on product profiles we can find surrogate (as like) products, as well as include size/color optimization to predict the demand for those fashion clothing products. We have done it very successfully with companies like Hanes Brands, and others. We also use machine learning to optimize those patterns up/down a product hierarchy for hundreds of thousands of products utilizing patterns at all levels of the hierarchy including groups, sizes, colors and more. We also found that by segmenting a company’s products into slow moving, fast moving, and steady state we can further improve the accuracy by applying algorithms that are more specific to those demand patterns, like intermittency (sparse demand).

    The goal of machine learning-based forecasting is to eliminate the unnecessary work of managing information/data preparation and adding more robust analytics to do the majority of the work thus improving forecast accuracy for the majority of the product portfolio. Today, 80% of a demand planner’s job is managing information and data, not to mention judgmentally manipulating that information/data virtually adding “no” value to the forecast.

    I hope this answers your two points.

    • Thank you for your comments. You have done a thorough job of answering my two points.
      I like your comment about judgmental manipulation of information/data virtually adding "no" value to the forecast. Unfortunately, human ability to interpret statistical data is seriously limited. One of my interests has been to make good use of human judgment when and only when that judgment has a sound basis (e.g. knowledge of the market).

  4. Hi Charlie,

    Very insightful piece, as always from you.

    I do see machine learning is coming and coming fast to solve time series problems. It's going to be particularly useful when it comes to short time series, new product, short life cycle series... this is typically challenging as the pattern may be deeply buried so that the normal demand drivers cannot easily explain the results.

    For other longer time series, I think the machine learning approach needs to address the 'black box' challenge. In the traditional time series approach, the methodology is transparent and results are also transparent. Users can extract a lot of insights which can be used for many other analytical processes such as market mix, optimisation, etc.

    With machine learning in its current form, users may lose such insights. Hopefully future machine learning can not only generate better end results but also provide the decomposed insights that can be reused for other business problems.

    Also there is a challenge to get business buy-in with machine learning, due to again 'black box' approach. So an 'open box' machine learning could address these challenges.
    A different topic for another day...

    What's your thoughts on this?

    Davis Wu

    • Charlie Chase

      Hi Davis,

      You bring up some great questions regarding machine learning.

      First and foremost, machine learning is technically another name for artificial intelligence (AI—first generation).

      Machine learning is actually a branch of computer science where advanced algorithms learn from data through pattern recognition. Traditionally those algorithms only included neural networks, but today we can also include ARIMAX, dynamic regression, Unobserved Component Models (UCM) and other algorithms. The variety of different algorithms provides a range of options for solving problems, and each algorithm will have different requirements and tradeoffs in terms of data input requirements, speed of performance, and accuracy of results. These tradeoffs along with the accuracy of the final predictions are weighed to decide which algorithm will work best for that particular situation.

      While many machine learning algorithms have been around for a long time, the ability to automatically apply complex statistical models to big data conducting multiple iterations at faster speeds is fairly recent. As a result of grid processing, parallel processing, in memory processing, and cloud computing we can now run multiple iterations using all the advanced algorithms simultaneously against big data.

      Given all these advancements machine learning can now solve the traditional time series challenges. It's not only going to be useful for short time series like new products, and short life cycle products, but also for mid- and long-range series uncovering demand drivers that cannot easily be detected using traditional time series methods--that can only uncover trend and seasonality. In fact, these machine learning capabilities are available today. We are now entering into the “Second Machine Age”.

      Traditional “black box” forecasting systems are based on those traditional time series methods. Now we have the capabilities of transparency with the second generation machine learning as it is not completely relying on neural networks alone. You can extract lots of insights for the market mix, optimize them, and use them for demand sensing and shaping. For example, we can use those insights to decompose the data into the market mix elements, as well as use them to shape future demand for hundreds of thousands of products up/down the business hierarchy automatically.

      The main reason we have this “black box” mentality is a result of years of not investing in statistical skills. Demand planners who are essentially managers of information and data will soon become obsolete as machine learning takes over those mundane work activities. The future will require highly skilled “data scientists” (demand analysts) who will oversee machine learning-based forecasting, monitoring and adjusting on an exception basis and letting machine learning-based forecasting do the heavy lifting.

      Today, due to traditional time series based systems, demand planners must touch over 80% of the forecasts using judgement. In the future, as machine learning-based forecasting continues to evolve into the 2nd and 3rd generation, data scientists will only need to touch 10% or less of the data series, tweaking the models generated by machine learning-based forecasting solutions. This will require only a handful of data scientists versus many demand planners.
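
Working "on an exception basis", as the reply above puts it, can be sketched as a filter that sends only poorly performing series to an analyst. This Python example is an illustration under stated assumptions: the 20% error threshold, the series names, the data, and the functions `mape` and `flag_exceptions` are hypothetical and do not come from the discussion.

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error, skipping periods with zero actuals."""
    pairs = [(a, f) for a, f in zip(actuals, forecasts) if a != 0]
    if not pairs:
        return None  # error is undefined when every actual is zero
    return sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


def flag_exceptions(series, threshold=0.2):
    """Return the names of series an analyst should review, sorted by name."""
    flagged = []
    for name, (actuals, forecasts) in series.items():
        error = mape(actuals, forecasts)
        # Review anything with high error or with no measurable error at all.
        if error is None or error > threshold:
            flagged.append(name)
    return sorted(flagged)


# Hypothetical (actuals, forecasts) pairs for three series.
series = {
    "cola-2L": ([100, 110, 105], [98, 112, 104]),
    "cola-can": ([50, 60, 55], [80, 30, 70]),
    "new-item": ([0, 0, 0], [5, 5, 5]),
}

review_queue = flag_exceptions(series)
```

Here only two of the three series reach the review queue; the well-forecast series is left to the automated process.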

  5. Thanks Charlie, we can't wait to embrace the second machine age.

    A side question, what's your thoughts on the roles and calibre for future demand planners (to be separate from demand analysts)?

    • Charlie Chase

      Hi Davis,

      The second machine age is here now. Machine Learning-Based Forecasting is available today.

      Demand planners, as mentioned earlier, will become obsolete as machine learning takes over the mundane data and information management aspects of their roles, as well as the bulk of statistical modeling. Today, 80% of a demand planner's role is managing information and data. Machine learning-based forecasting can correct for outliers automatically, use more advanced algorithms to model the effects of sales promotions and other related causal factors, as well as integrate POS/syndicated scanner data (true demand) with sales orders/shipments automatically up/down business hierarchies for hundreds of thousands of data series, learning as new data and information are uncovered. As such, the demand planner role will need to transition into a demand analyst (data scientist) role, monitoring and tweaking those advanced algorithms on an exception basis where needed. They will also need to be embedded in the commercial side of the business closer to the consumer/customer. They will provide needed analytics knowledge to the commercial business to better "anticipate", rather than just forecast, when consumers will buy, what they will buy, and how they will buy (IoT-devices/brick & mortar). Then, they will determine how they can influence those consumers to purchase their products, rather than their competitors' products.

      Four key trends are changing this landscape in the CPG industry--1) automated consumer engagement, 2) IoT-connected devices, 3) predictive analytics, and 4) faster, more powerful software driven by supercomputers and cloud computing using in memory processing. We can no longer manipulate information and data to meet our objectives hoping that somehow we will make and/or exceed those projected plans. Demand planners will need to transition to demand analysts with stronger statistical skills, gain more domain knowledge regarding the consumer, and be embedded in the commercial side of the business to help generate demand, rather than reacting to demand. Machine learning-based forecasting will take over the bulk of the information and data management activities, as well as the bulk of the statistical forecasting. As demand analysts they will be only touching forecasts on an exception basis, and in most cases tweaking the models generated by machine learning as new information is uncovered that wasn't originally entered into their Demand Signal Repository (DSR). This is a radical change for demand forecasting and planning being driven by the four trends outlined above. It will take a strong "change management leader" to transition to this new demand analyst role within companies.

      Those companies who complete the transition will see not only reductions in inventory costs and working capital, but increases in market share, revenue and profit.

  6. Hi Charlie, Thanks for sharing your thoughts on using Machine learning in forecasting. After going through your opinions and SAS documentation of products, I have a question for you.

    What do you mean by machine learning? Are you referring to automation as machine learning? Because in your words "Today, due to traditional time series based systems demand planners must touch over 80% of the forecasts using judgement. In the future, as machine learning-based forecasting continues to evolve into the 2nd and 3rd generation data scientists will only need to touch 10% or less of the data series tweaking the models generated by machine learning-based forecasting solutions This will require only a handful of data scientists versus many demand planners", it seems you are mixing automation and machine learning. What I think is "Reducing touch over 80% of forecasts to 10% of forecasts or going through many algorithms is automation not machine learning". Machine learning is to improve the accuracy of each series by finding non-linear/non-parametric relationships using machine learning algorithms like neural networks. Automatically going through many series or many algorithms is automation. Each series with a machine learning algorithm involved is using machine learning.

    My opinion might be inaccurate, but please provide clarification on automation vs machine learning in your words above. Thank you in advance.

    • Charlie Chase

      Hi Narendra,

      Thank you for responding to the Blog article.

      As I mentioned, we are in the second generation of Artificial Intelligence/Machine Learning, approaching the third generation. Your definition of machine learning is much narrower, and true to its strictest definition. Today, we can blend non-linear predictive algorithms (e.g., neural networks) with traditional linear predictive algorithms (e.g., ARIMAX, dynamic regression, UCM, and others which can also be non-linear) for machine learning. All those classes of predictive analytics can be used (as you mentioned) to improve the accuracy of each series by finding non-linear/non-parametric and linear relationships automatically up/down a business hierarchy for hundreds of thousands of data series. So, my definition of machine learning is more of a hybrid including all those predictive (linear/non-linear) algorithms.

      Today, due to traditional time series based systems (restricted to moving average, Exponential Smoothing, and seasonal/non-seasonal ARIMA models), demand planners must touch over 80% of the forecasts using judgement, as those algorithms are limited to detecting only linear patterns associated with trend, seasonality, and cycle, for the most part. Advanced predictive models like ARIMAX, dynamic regression, and UCM models use pattern recognition to uncover other patterns associated with price, random shifts, sales promotions, and many more internal/external factors. Also, as you add more causal data and history don’t they self-adjust, finding additional patterns? They do.

      Not in the future, but now (today), as machine learning-based forecasting continues to evolve into the 2nd and 3rd generation, combining all advanced predictive algorithms including neural networks, data scientists will only need to touch 10% or less of the data series, tweaking the models generated by machine learning-based forecasting solutions. This will require only a handful of data scientists versus many demand planners, thus reducing touch points (manual manipulation and judgment) from over 80% to 10% of forecasts by using many advanced predictive algorithms, including neural networks, and allowing automation to take over 80% of the forecasting.

      Isn’t machine learning automation? Is it artificial intelligence? Or both? Don’t linear/non-linear ARIMAX, dynamic regression, and UCM algorithms fall in the same classification of “advanced predictive” modeling as neural networks?

      Let me know your thoughts.
