Local Search Optimization for Hyperparameter Tuning

When shopping for a new TV, with many sets side by side across a store wall, it is easy to compare picture quality and brightness. What is not immediately evident is how different the set will look in your home compared with how it looked in the store. HDTV pictures are calibrated by default for large, bright showrooms, because that is where the purchase decision is made. In most cases, the backlight setting for LED HDTVs is set at the factory to its maximum for bright display in stores. Many other adjustable settings also affect picture quality: brightness, contrast, sharpness, color, tint, color temperature, picture mode, and more advanced picture controls like motion mode and noise reduction. Most people simply connect the TV and use the out-of-the-box settings, never having been told to adjust them, but modern HDTVs need to be calibrated to the room size and typical lighting, which vary from home to home. Simply reducing the backlight can make a huge difference (in my case, I reduced this setting from the peak of 20 down to 6!). Adjusting all the options manually and independently can be tricky. Luckily there are online recommendations for most TV models for the average room. These are a good start, but again, each room is different. Calibration discs and professional calibration technicians can help tweak the advanced settings and truly find an optimal setup for your environment. Wouldn't it be nice if a TV could calibrate itself to its environment? Perhaps this is not far off, but for now calibration is a manual process.

Once a TV is calibrated, it is ready to enjoy. The visual data, the broadcast information, can be observed, processed, and understood in real time. When it comes to data analytics, however, with raw data in the form of numbers, text, images, and more, gathered from sensors and online transactions, 'seeing' the information contained within, as the source grows rapidly, is not so easy. Machine learning is a form of self-calibration of predictive models given training data. These modeling algorithms are commonly used to find hidden value in big data. Facilitating effective decision making requires the transformation of relevant data to high-quality descriptive and predictive models. That transformation presents several challenges, however. As an example, take a neural network (Figure 1). A set of outputs is predicted by transforming a set of inputs through a series of hidden layers defined by activation functions linked with weights. How do we choose the activation functions and the weights that produce the best model configuration? This is a complex optimization problem.


Figure 1: neural network

The goal in this model training optimization problem is to find the weights that minimize the error in model predictions given the training data, validation data, a specified model configuration (number of hidden layers, number of neurons in each hidden layer), and regularization levels designed to reduce overfitting to the training data. One recently popular approach to solving for the weights in this optimization problem is the stochastic gradient descent (SGD) algorithm. The performance of this algorithm, as with all optimization algorithms, depends on a number of control parameters, and no set of default values is best for all problems. SGD parameters include, among others, a learning rate controlling the step size for selecting new weights, a momentum parameter to avoid slow oscillations, a mini-batch size for sampling a subset of observations in a distributed environment, and adaptive decay and annealing rates to adjust the learning rate for each weight and over time. See the related blog post 'Optimization for machine learning and monster trucks' for more on the benefits and challenges of SGD for machine learning.
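
To make the roles of these controls concrete, here is a minimal SGD-with-momentum sketch in Python. It is purely illustrative (the toy data and gradient function are mine, not any particular SAS implementation), but it shows where the learning rate, momentum, mini-batch size, and annealing rate enter the update.

```python
import numpy as np

def sgd_momentum(grad_fn, w, data, lr=0.01, momentum=0.9, batch_size=32, n_epochs=10, decay=0.99):
    """Minimal SGD with momentum. lr, momentum, batch_size, n_epochs, and decay
    are the kinds of hyperparameters discussed above."""
    velocity = np.zeros_like(w)
    for epoch in range(n_epochs):
        np.random.shuffle(data)                      # visit observations in a new order each epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]   # mini-batch of observations
            g = grad_fn(w, batch)                    # gradient of the loss on this batch
            velocity = momentum * velocity - lr * g  # momentum damps slow back-and-forth oscillation
            w = w + velocity                         # step to new weights
        lr *= decay                                  # annealing: shrink the step size over time
    return w

# Toy usage: fit y = 2x + 1 with a squared-error loss
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2 * x + 1 + 0.1 * rng.normal(size=200)
data = np.column_stack([x, y])

def grad_fn(w, batch):
    xb, yb = batch[:, 0], batch[:, 1]
    resid = w[0] * xb + w[1] - yb
    return np.array([np.mean(resid * xb), np.mean(resid)])

print(sgd_momentum(grad_fn, np.zeros(2), data, n_epochs=50))  # approaches [2.0, 1.0]
```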

Figure 2: momentum parameter

The best values of the control parameters must be chosen very carefully. For example, the momentum parameter dictates whether the algorithm oscillates slowly across the ravines where solutions lie or dives into them quickly; but if momentum is too high, it can jump past the solution (Figure 2). The best values for these parameters also vary for different data sets, just as the ideal adjustments for an HDTV depend on the characteristics of its environment. These options, which must be chosen before model training begins, dictate not only the performance of the training process but, more importantly, the quality of the resulting model, again like the tuning parameters of a modern HDTV controlling picture quality. Because these parameters are external to the training process, and are not the model parameters (the weights in the neural network) being optimized during training, they are often called 'hyperparameters'. Settings for these hyperparameters can significantly influence the accuracy of the resulting predictive models, and there are no clear defaults that work well across different data sets.

In addition to the optimization options already discussed for the SGD algorithm, the machine learning algorithms themselves have many hyperparameters. Following the neural net example, the number of hidden layers, the number of neurons in each hidden layer, the distribution used for the initial weights, etc., are all hyperparameters specified up front for model training that govern the quality of the resulting model.

The traditional approach to finding the ideal values for hyperparameters, to tuning a model to a given data set, has been a manual effort. However, even with expertise in machine learning algorithms and their parameters, the best settings change with different data and are difficult to predict from previous experience. To explore alternative configurations, a grid search or parameter sweep is typically performed. But a grid search is often too coarse: because its expense grows exponentially with the number of hyperparameters and the number of discrete levels of each, a grid search often fails to identify an improved model configuration. More recently, random search has been recommended. For the same number of samples, a random search covers the space better, but it can still miss good hyperparameter values and combinations, depending on the size and uniformity of the sample. A better approach is a random Latin hypercube sample. In this case, samples are exactly uniform across each hyperparameter but random in combinations. This approach is more likely to find good values of each hyperparameter, which can then be used to identify good combinations (Figure 3).
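
To see the difference between the three sampling strategies, here is a small Python sketch that generates a grid, a random sample, and a random Latin hypercube for two hyperparameters scaled to [0, 1]. The helper function is my own illustration, not a library routine.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 16  # number of configurations we can afford to evaluate

# Grid search: 16 evaluations buy only 4 distinct values per hyperparameter
levels = np.linspace(0.0, 1.0, 4)
grid = np.array([(a, b) for a in levels for b in levels])

# Random search: n points, but coverage of each individual axis is uneven
random_sample = rng.uniform(size=(n, 2))

# Random Latin hypercube: exactly one point in each 1/n slice of every axis,
# with the pairing of slices randomized across hyperparameters
def latin_hypercube(n, d, rng):
    cells = (np.arange(n) + rng.uniform(size=(d, n))) / n  # one point per stratum, per dimension
    for row in cells:
        rng.shuffle(row)                                   # randomize the combinations
    return cells.T

lhs = latin_hypercube(n, 2, rng)
```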

Figure 3: hyperparameter search

True hyperparameter optimization, however, should allow searching between these discrete samples, because a discrete sample is unlikely to land on even a local accuracy peak or error valley in the hyperparameter space. But to the tuning algorithm, machine learning training and scoring is a complex black box, and that makes hyperparameter tuning a challenging class of optimization problems:

  • Machine learning algorithms typically include not only continuous, but also categorical and integer variables. These variables can lead to very discrete changes in the objective.
  • In some cases, the space is discontinuous where the objective blows up.
  • The space can also be very noisy and non-deterministic. This can happen when distributed data is moved around due to unexpected rebalancing.
  • Objective evaluations can fail due to grid node failure, which can derail a search strategy.
  • Often the space contains many flat regions – many configurations give very similar models.

An additional challenge is the unpredictable computation expense of training and validating predictive models with changing hyperparameter values. Adding hidden layers and neurons to a neural network can significantly increase the training and validation time, resulting in a wide range of potential objective expense. A very flexible and efficient search strategy is needed.

SAS Local Search Optimization, part of the SAS/OR® offering, is a hybrid derivative-free optimization strategy that operates in a parallel/distributed environment to overcome the challenges and expense of hyperparameter optimization. It comprises an extendable suite of search methods driven by a hybrid solver manager that controls concurrent execution of the search methods. Objective evaluations (different model configurations in this case) are distributed across multiple evaluation worker nodes in a grid implementation and coordinated in a feedback loop that supplies data from all concurrently running search methods. The strengths of this approach include handling of continuous, integer, and categorical variables, handling of nonsmooth, discontinuous spaces, and ease of parallelizing the search strategy. Multilevel parallelism is critical for hyperparameter tuning. For very large data sets, distributed training is necessary. Even with distributed training, the expense of training severely restricts the number of configurations that can be evaluated when tuning sequentially. For small data sets, cross validation is typically recommended for model validation, a process that also increases the tuning expense. Parallel training (distributed data and/or parallel cross validation folds) and parallel tuning can be managed, very carefully, in a parallel/threaded/distributed environment. This is rarely discussed in the literature or implemented in practice; typically either 'data parallel' (distributed training) or 'model parallel' (parallel tuning) is exercised, not both.
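
I can't show the SAS implementation itself here, but the general idea of distributing objective evaluations and tolerating failed ones can be sketched in a few lines of Python. The objective below is a toy stand-in for training and validating a model; the occasional raised exception stands in for a grid node failure.

```python
import math
import random
from concurrent.futures import ProcessPoolExecutor, as_completed

def evaluate(config):
    """Stand-in for the expensive black-box objective: 'train and validate' a model
    for one hyperparameter configuration and return its validation error."""
    lr, momentum = config
    if random.random() < 0.05:                       # simulate an occasional evaluation/node failure
        raise RuntimeError("evaluation failed")
    return (lr - 0.1) ** 2 + (momentum - 0.9) ** 2   # toy error surface with a known optimum

def tune(configs, max_workers=4):
    best_config, best_error = None, math.inf
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(evaluate, c): c for c in configs}
        for fut in as_completed(futures):            # feedback loop as evaluations finish
            config = futures[fut]
            try:
                err = fut.result()
            except Exception:
                continue                             # a failed evaluation should not derail the search
            if err < best_error:
                best_config, best_error = config, err
    return best_config, best_error

if __name__ == "__main__":
    candidates = [(lr, m) for lr in (0.001, 0.01, 0.1) for m in (0.5, 0.9, 0.99)]
    print(tune(candidates))
```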

Optimization for hyperparameter tuning can typically lead quite quickly to a reduction of several percent in model error compared with default settings of these parameters. More advanced and extensive optimization, facilitated through parallel tuning to explore more configurations, can further refine the parameter values and improve the model. The neural net example discussed here is not the only machine learning algorithm that can benefit from tuning: the depth and number of bins of a decision tree, the number of trees and number of variables to split on in a random forest or gradient boosted trees, the kernel parameters and regularization in an SVM, and many more can all benefit from tuning. The more parameters that are tuned, the higher the dimension of the hyperparameter space, the more difficult a manual tuning process becomes, and the coarser a grid search becomes. An automated, parallelized search strategy can also benefit novice machine learning users.

Machine learning hyperparameter optimization is the topic of a talk to be presented by Funda Günes and myself at The Machine Learning Conference (MLconf) in Atlanta on September 23. The talk, titled "Local Search Optimization for Hyperparameter Tuning," includes more details on the approach, parallel training and tuning, and tuning results.


image credit: photo by kelly // attribution by creative commons


Machine learning fun at KDD

KDD buttons

Who says machine learning can't be fun? A crew of us from SAS went to San Francisco for the recent KDD conference, which bills itself as "a premier interdisciplinary conference, [which] brings together researchers and practitioners from data science, data mining, knowledge discovery, large-scale data analytics, and big data." We brought these buttons with us, and they were a huge hit!

Polly and Simran setting up the booth

But we weren't at KDD just to have fun, of course. We came to learn and share, in our booth and in many other ways. Simran Bagga came to talk about all things text analytics, and she was nice enough to pitch in and help me set up the booth. Naturally, her favorite button was "I'm Feeling Unstructured Today." She gave two extended demos in the booth: "Combining Structured and Unstructured Data for Predictive Modeling Using SAS® Text Miner" and "Topic Identification and Document Categorizing Using SAS® Contextual Analysis."

Wayne Thompson served as a senior editor on the Review Board, which means he oversaw a group of volunteers who had the hard task of reviewing and making selections from the many excellent papers submitted for the Applied Data Science track. He was also a panelist in a "Special Session on Standards in Predictive Analytics In the Era of Big and Fast Data." His favorite button was "Talk Data to Me," which he did during his panel, "Internet of Things, Industrial Internet, and Instrumented Environments: the Furious Need for Standards." He also gave an extended demo in the SAS booth on "Machine Learning on the Go."

Udo is third from left on this panel

"Can Tools Effectively Unleash the Power of Big Data?" Udo Sglavo thinks so, and he said as much in this panel he was part of in the Applied Data Science Invited Talks track. As someone who has been involved in data mining for many years, Udo's favorite button was "I Support Vector Machines." This button was popular, because it was also Wei Xiao's favorite. He was busy attending many sessions, but he did give his own extended demo in the booth on "A Probabilistic Machine Learning Approach to Detect Industrial Plant Fault."

Patrick in the booth

Susan Haller, who leads teams responsible for data mining and machine learning at SAS, had a different favorite button: "How Random are Your Forests?" Ray Wright, on Susan's team, favored "You Can Engineer My Features." Ray is interested in automation, too, which was the subject of his extended demo: "Modeling Automation With SAS® Enterprise Miner™ and SAS® Factory Miner." But Ray also focused on basketball, presenting a poster in the Large Scale Sports Analytics Workshop on "Shot Recommender System for NBA Coaches," which he co-authored with Ilknur Kaynar Kabul and Jorge Silva. Jorge didn't attend the conference, but Ilknur did, and her favorite button was "I'm Having a Cold Start Today." However, Ilknur was not having a cold start when she presented her extended demo: "Auto-Tuning Your Decision Tree, Random Forest and Neural Networks Models." Another member of Susan's team, Patrick Hall, spent a lot of time in the SAS booth, where he was great at answering all kinds of questions. However, he couldn't decide on his favorite button, because it was a tie between "I'm Feeling Unstructured Today" and "I Support Vector Machines." Patrick answered a lot of questions on options for integrating open source software with SAS, and this was the topic of his extended demo: "Options for Open Source Integration in SAS® Enterprise Miner™." Also on Susan's team, Taiping He liked "I'm Feeling Unstructured Today," which may be a surprise, because his extended demo was "Distributed Support Vector Machines in SAS® Viya™ System." Guess who develops our SVM procedure in SAS Enterprise Miner?

Scott with a smile on his face, as usual

KDD has a nice balance of practitioners and academics in attendance, so we were glad to interact with both groups. We met many students and professors in the booth, and Scott MacConnell was on hand from our Academic Outreach and Collaborations group to talk about all the great free resources SAS has to offer academics. Scott's favorite button was "I Am Feeling Unstructured Today."

Mural from the wall of the Stinking Rose restaurant

We made time for fun, too, and one night many of us ate dinner together at a restaurant called The Stinking Rose, which calls itself "A garlic restaurant." They had fun murals on the wall showing garlic in all kinds of ways you never even dreamed of! I had the Forty Clove Garlic Chicken, and even though I didn't eat anywhere near that many cloves, I do hope my choice didn't depress traffic in the booth. The food was delicious! And my favorite button? "My Networks Run Deep."





The Internet of medical things and of intern things

The internet of medical things, spurred by the advent of wearable sensors, has dramatic consequences for industry, healthcare, and analytics, just as the internet of things and analytics have consequences for education. When I began my internship at SAS in May, I knew little about the internet of things, the wearable sensors that make up the internet of medical things, or analytics, but I knew I wanted to use data for good, and I knew how to program.

This past summer I used data from cell phones attached at the waist to predict the activity of the owner, which is an exciting application of the internet of medical things. There are a number of immediate applications of this research: contextualizing electrocardiogram signals, improving exercise analysis, and assisting in care for the elderly. As an intern, my first assignment was simply to replicate the results from an existing activity recognition paper, using SAS/IML® to extract features from a time series and SAS® Enterprise Miner™ to produce an accurate model. As I mentioned earlier, I started my summer knowing how to program in a few languages, including SAS, but I didn't know what a time series of data was, or how to program in IML, and I knew absolutely nothing about how to use a neural network model.

My first obstacle for my summer project on the internet of medical things was figuring out how I learn best. With SAS Enterprise Miner, I first spent time going through the documentation to get a feel for the different nodes and their respective settings and options. This was helpful to a point, but what I discovered was that I learned best when I tested different options and examined the results. I found this to be true in other parts of my research; when I spent time plotting the time series data, using different graph types, styles, and filters, I was able to understand my data at a deeper level. When extracting features from a time series, it is important to extract intuitive and meaningful features that capture a characteristic of the series that would be evident if you looked at it in its entirety. This is almost impossible without spending some time examining the data. I think this is a common trend in our new age of data science and analytics: it's not about what you think the data should say, but about what the data are actually saying.
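
I did my feature extraction in SAS/IML, but the general idea is easy to sketch in a few lines of Python. The window length and the particular features below are illustrative choices of mine, not necessarily the ones used in the paper I was replicating or in my own project.

```python
import numpy as np

def window_features(signal, fs=50, window_sec=2.56, overlap=0.5):
    """Split a 1-D accelerometer signal into overlapping windows and compute
    a few simple summary features for each window (illustrative choices only)."""
    win = int(window_sec * fs)
    step = int(win * (1 - overlap))
    feats = []
    for start in range(0, len(signal) - win + 1, step):
        w = signal[start:start + win]
        feats.append([
            w.mean(),                                      # average level of the window
            w.std(),                                       # overall variability
            np.abs(np.diff(w)).mean(),                     # average sample-to-sample change ("jerkiness")
            np.percentile(w, 75) - np.percentile(w, 25),   # interquartile range
        ])
    return np.array(feats)

# Example: a toy 'walking-like' signal sampled at 50 Hz
t = np.arange(0, 10, 1 / 50)
accel = np.sin(2 * np.pi * 2 * t) + 0.1 * np.random.randn(len(t))
X = window_features(accel)   # one row of features per window, ready for a classifier
```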

Throughout this project I observed some characteristics of the internet of medical things, but I also learned what I call the Internet of Intern Things.

1. Read, read, read, and read some more.
On my second day at SAS, my manager and another team member met me for lunch and we discussed some possible projects for my summer experience. I admitted that I didn’t have a particular research interest, but I was willing to try anything. My team member suggested a project and later sent me a folder of recent publications related to the topic. I dutifully saved the folder to my computer and printed the PDFs that seemed the most intuitive and easiest to understand. I read the documents, highlighted what I thought was important, and launched into the project. That was my first mistake. For example, I began searching for a filtering mechanism, because that’s what the project required, but I didn’t intuitively understand my data and its form enough to be able to explain why I needed a filtering mechanism. Days later my mentor asked me some questions about the data, just to be sure we were on the same page, and I realized I wasn’t sure about my answers. Not only was I embarrassed, but I was worried that the time I had spent on my project was wasted. Of course as an intern, no time spent failing is wasted, because failures are learning experiences, but I was nonetheless disappointed. Fortunately for me, my mentor was very understanding. From this experience I learned that taking time to contextualize your data at the beginning of a project is not only helpful, but necessary. Moreover, prioritizing reading papers for research, and papers or articles suggested by colleagues is helpful. Several times throughout the summer, my mentors, with much more experience than me, recommended well-known papers or recent articles that were relevant to my field of study and interests. I learned that it is valuable to take time each week, if not each day, to read a short paper or several articles to stay up to date and informed. As an intern, reading helped me to understand the “buzz words” of my field, like “data science” and “machine learning,” and gave me talking points when I met with my colleagues for lunch. I know, that sounds a little over the top. Like really, “talking points” for lunch? But as an intern it is important to set yourself apart, and being well-read helps.

2. Collaboration is not only helpful, but imperative.
If I were to summarize my summer experience at SAS into one word, it would be “collaboration.” Collaboration was crucial to my summer project, and to navigating such a large organization as an intern. After giving my first presentation of my preliminary work on my summer project, several other interns contacted me and shared their projects, and we found overlaps. While I was working on modeling human activity using feature engineering with a goal of classifying healthy or unhealthy heartbeats, others were working on motif discovery and motif comparison.



These projects logically overlap in our ultimate motivation: classification of health signals. My project focused on extracting information from a time series, while others were reviewing the actual pattern of a time series in a pictorial sense. After realizing this overlap, we began to compare notes and share helpful resources for visualization. In my final intern presentation, I actually used a visualization application shared by a fellow intern. Our collaboration not only benefited our summer projects, but it was also in the spirit of modern tech companies, which value teamwork and shared effort. Moreover, it points to the central theme of the internet of things: everything is connected in some way, and thus should be used in tandem for the most efficient and accurate results.

3. Prepare and ask questions.
You know those professors who on the first day of class say “You can never ask enough questions! There are no dumb questions.”? I won’t use this time to reiterate the very important action of asking questions, but instead will add my own flavor. Don’t just ask questions, prepare questions. What do I mean? Exhaust your own resources before you ask for help, but don’t take too long. Continually ask yourself what is confusing before asking for help. Read. Did I say that again? I can’t stress it enough. Don’t get me wrong, I spent countless hours in my teammates’ offices this summer asking some dumb questions, and also asking some questions that took us both a week to answer. But, I believe when you come prepared with specific questions and sources of confusion that teammates are more willing to help and answer questions. During the summer I also had the opportunity to email individuals who published the data I used for my project. In writing that email, it was very important for me to be sure of what I knew before I asked questions. It goes back to the internet of things: What do I know? What do I want to know? What resources of information can I use to learn what I want to know?

My intern experience this summer has impacted my research focus, education plans, and career path. Another amazing opportunity that grew out of my summer experience is presenting a student e-poster at the 2016 SAS Analytics Experience conference in Las Vegas. Besides being able to present my research, I am also very excited about this opportunity because I will be able to hear a talk given by Jake Porway, founder and executive director of DataKind, an organization committed to using “data for good”, along with many other interesting talks, sessions, and demos.

Having an experience at SAS (my own personal Internet of Intern Things) in the middle of my college career was perfect timing. I realized that knowing mathematics, statistics, and computer science are very important, but recognizing the overlaps and interconnectedness of these disciplines is crucial, just as in the internet of things, and as I have found, in the internet of medical things.


Time series machine learning techniques in healthcare

Time series machine learning techniques show great promise for the analysis of health care wearable data. As our busy lifestyles make continuous monitoring more and more essential, the need to analyze these data streams and find correlations between them becomes even more important, because they can provide important cues to people. These cues could be as simple as reminding a person to take a walk or move around, which many wearables available today, such as Fitbit, Garmin, and Nike devices, already do. However, while these popular devices monitor the current state of an individual, they cannot perform the complex predictions that correlate the captured information at a higher level or uncover causal relationships in the data. My research aims to develop advanced algorithms for analyzing time series data to estimate and predict physiological parameters (such as heart rate or respiration rate) using kinematic and physiological data. My current work applies time series machine learning techniques for greater insight.

I am currently a graduate student intern in machine learning at SAS and also a research assistant at the Center for Advanced Self-Powered Systems of Integrated Sensors and Technologies (ASSIST) at North Carolina State University. The ASSIST Center is a National Science Foundation-sponsored Nanosystems Engineering Research Center (NERC), which means it develops and employs nanotechnology-enabled energy harvesting and storage, ultra-low power electronics, and sensors to create innovative, body-powered, and wearable health monitoring systems. SAS is one of the industry partners of the ASSIST Center, and the insights on real time data analysis from SAS have proven to be really helpful for our research. Our motivation behind this research can be explained through a simple example: suppose an individual has a pre-existing condition like asthma, where the surroundings and their activities could trigger an attack. In such cases, predicting respiration rate in advance could be beneficial. For example, if someone is biking, a predicted respiration rate could help them decide whether to bike for another 20 minutes or cut the ride short to stay within healthy levels. The goal is to be able to notify people about these parameters by identifying the right activities, which then become an index to predict the physiological parameters. In my research, I address the problem of identifying activities by creating hierarchical models to learn robust parameters, which is one application of time series machine learning techniques. In the near future we will be able to use these models to then predict respiration rate and heart rate.

There have been numerous studies that make use of supervised learning for activity recognition, using motion capture data and inertial measurements obtained from inertial measurement units (IMUs). An IMU is a device that measures and reports linear and angular motion of the body, and one widely available example is a smartphone. Most of these studies make use of techniques such as feature extraction, clustering, and machine learning approaches for classification. Feature extraction techniques range from statistical moments of the data (e.g., mean, variance, kurtosis) to bag-of-words representations of poses and their temporal differences. Machine learning methods used include support vector machines (SVMs), neural networks, and probabilistic graphical models (e.g., hidden Markov models and conditional random fields). There are also approaches using semi-supervised techniques, and even unsupervised techniques that rely on clustering with user-defined similarity metrics to identify single activities. However, most of these approaches only work at a fixed scale. That is, they do not capture hierarchies in the activities, which are required to explain complex dependencies between activities. For example, a person's arm swinging can be part of a simple activity, such as walking, or a complex activity, such as dancing. A two-level hierarchy has been captured through the computation of so-called motifs that compose activities. Higher-level hierarchies may also be essential but have not been carefully studied. The aim of this research is to capture these dependencies using a computationally efficient framework that provides a robust characterization of the existing hierarchical structures.

Topological tools for high-dimensional data analysis have gained popularity in recent years. These techniques often focus on tracking the homology of a space, a group structure that carries information about its connectivity and number of holes. Techniques such as persistent homology have been used for the analysis of point cloud data, quantifying the stability of the extracted features in a computationally efficient way via the use of stability theorems. These techniques have been used in a variety of applications, including the study of protein shapes, image analysis, and speech pattern analysis. For this research project we use topological data analysis to find robust parameters and build hierarchical graphical representations to classify activities.

Our approach builds a hierarchical representation of the data streams by comparing segments of data over various window sizes. A graphical model is extracted by first clustering the segments over a fixed window size τ and then connecting clusters with sufficient overlap across τ values. The structure of the hierarchical graphical model depends on a clustering parameter ε. We propose a new methodology for selecting robust graphical structures from this data via the use of an aggregate version of the persistence diagram. We also provide a methodology for selecting parameter values for this representation based on inference performance and power consumption considerations.
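
To give a rough idea of the segmentation-and-clustering step, here is a deliberately simplified Python sketch. It uses ordinary single-linkage clustering with a distance threshold ε and a crude overlap rule to link clusters across two window sizes, and it leaves out the persistence-diagram machinery that our method actually relies on, so treat it only as an illustration of the windowing idea on toy data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def segment(signal, tau):
    """Non-overlapping segments of length tau, one row per segment."""
    n = len(signal) // tau
    return signal[: n * tau].reshape(n, tau)

def cluster_segments(segments, eps):
    """Group segments whose single-linkage distance is below eps."""
    Z = linkage(segments, method="single")
    return fcluster(Z, t=eps, criterion="distance")

def connect_across_scales(labels_fine, labels_coarse, tau_fine, tau_coarse, min_overlap=0.5):
    """Link a coarse-scale cluster to a fine-scale cluster when enough of the
    fine segments inside a coarse segment share the same fine-scale label."""
    edges = set()
    ratio = tau_coarse // tau_fine
    for j, lab_c in enumerate(labels_coarse):
        fine = labels_fine[j * ratio:(j + 1) * ratio]
        if len(fine) == 0:
            continue
        top = np.bincount(fine).argmax()               # dominant fine-scale cluster in this segment
        if np.mean(fine == top) >= min_overlap:
            edges.add((("tau_fine", int(top)), ("tau_coarse", int(lab_c))))
    return edges

# Toy signal: a periodic stretch followed by a noisy stretch, two window sizes
rng = np.random.default_rng(1)
x = np.concatenate([np.sin(np.linspace(0, 20, 400)), rng.normal(size=400)])
s1, s2 = segment(x, 20), segment(x, 40)
l1, l2 = cluster_segments(s1, eps=2.0), cluster_segments(s2, eps=3.0)
graph_edges = connect_across_scales(l1, l2, 20, 40)
```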

From our approach we are able to report the prediction accuracy for each of the activities in our dataset (walking, bicycling, sitting, golfing and waving). We also show how persistence diagrams can help reduce computation time and help choose stable models for our hierarchical representations. Some of the future work will involve testing this method on other datasets and comparing it with other existing algorithms.

I personally am really excited about the advantages that wearable technologies provide! They are changing individuals' lifestyles at a personalized level. Coming from a biomedical background, I always wanted to work closely with wearable devices and understand how they could help us achieve better, healthier living. Being able to apply time series machine learning techniques from my current studies in electrical engineering to health care wearables leverages my biomedical experience in exciting new ways!

I’ll be presenting this work as an e-poster during the SAS Analytics Experience Conference in Vegas September 12-14, 2016, so look for me if you’re there to learn more!

Editor’s note: Namita was one of six winners of the e-poster competition offered at the Conference, which meant she won a free trip to the event, so be sure to check out her work! This past summer Namita was also a SAS Summer Fellow in Machine Learning, which is a highly selective competitive program SAS offers for PhD students each year. 





Is Poker a Skill Game? A Panel Data Analysis

The annual SAS Analytics Conference is upon us again. This year it is known by a different name, Analytics Experience 2016, but the location, Las Vegas, is the same as it has been the previous two years. Just like last year, I will be attending and presenting on analytics for panel data using SAS/ETS® for econometrics and time series.

While preparing for my trip I was reminded of a paper I once read in Chance magazine (Croson, Fishman and Pope 2008) that concluded that poker, like golf, is a game of skill rather than luck.  The paper was published in 2008 during the heyday of televised poker, when it seemed that ESPN aired poker tournaments and little else.  The paper especially struck me because it quoted one of my favorite movies:

"Why do you think the same five guys make it to the final table of the World Series of Poker every year? What are they, the luckiest guys in Las Vegas?" – Mike McDermott (played by Matt Damon in Rounders)

Upon rereading the paper I realized the datasets the authors gathered followed a design for panel data.

Panel data occur when a set of individuals, or panel, are each measured on several occasions. Panel data are ubiquitous in all fields, because they allow each individual to act as their own control group. That allows you to focus on identifying causal relationships between response and regressor, knowing that you can control for all factors specific to the individual, both measured and unmeasured.

In the study by Croson et al. (2008), the individuals were poker players whose results were recorded over multiple poker tournaments. The authors gathered two panel datasets, one for poker players and one for professional golfers. They surmised that if the associations you see for poker mimic those for golf, then you should conclude that poker, like golf, is a game of skill. After all, one would never theorize that Tiger Woods has won 14 major championships based purely on good karma.

Focusing on the data for poker, the authors gathered tournament results on 899 poker players. Because poker tournaments vary in the number of entries, only results in the top 18 were considered, and that number was chosen because it corresponds to the final two tables of 9 players each. The response was the final rank (1 through 18, lower being better) and the regression variables were three measures of previous performance. One such measure was experience, a variable indicating whether the player had a previous top 18 finish.

Among other similar analyses, the authors fit a least-squares regression of rank on experience:

\(Rank_{ij} = \beta_{0} + \beta_{1} Experience_{ij} + \epsilon_{ij}\)

where i represents the player and j the player’s ordered top-18 finish.  From the analysis they found a statistically significant negative association between current rank and previous success. Because lower ranks are better, they concluded that good previous performance was associated with good present performance. Furthermore, the magnitude of the association was analogous to the parallel analysis they performed for golf. They concluded that because you can predict current results based on previous performance – in the same way you can with golf – then poker must be a skill game.

The authors used simple least squares regression, with the only adjustment for the panel design being that they calculated "cluster robust" standard errors that controlled for intra-player correlation. They did not consider directly whether there were any player effects in the regression.

After obtaining the data, I used PROC PANEL in SAS/ETS to explore this issue.  I considered three different estimation strategies applied to the previous regression. PROC PANEL compactly summarized the results as follows:

comparison of model parameter estimates

The OLS Regression column precisely reproduces the analysis of Croson et al. (2008) and shows a significant negative association between current rank and previous experience.  The Within Effects column is from a fixed-effects estimation that utilizes only within-player comparisons. You can interpret that coefficient (0.39) as the effect of experience for a given player. Conversely, the Between Effects column is from a regression using only player-level means, that is, the estimator uses only between-player comparisons. Because the estimator of the within effect for experience is not significant and that for the between effect is strongly significant, you can conclude the data exhibit substantial latent player effects. That is not surprising, because measures of player ability (technical, psychological or mystical) weren’t included in the model.
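
If you want to see exactly what those three columns are computing, here is a small, self-contained Python sketch that reproduces the pooled, within, and between slope estimators by hand. The data are made up (not the tournament data), constructed so that a latent player effect drives the between-player association while experience itself has no direct effect, which mimics the pattern described above.

```python
import numpy as np
import pandas as pd

def slope(x, y):
    """OLS slope of y on x (with an intercept)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.cov(x, y, bias=True)[0, 1] / np.var(x)

def panel_estimates(df, id_col, y_col, x_col):
    """Pooled, within (fixed-effects), and between slope estimates."""
    g = df.groupby(id_col)
    pooled = slope(df[x_col], df[y_col])
    # Within: demean by player, so only variation inside each player's history is used
    xw = df[x_col] - g[x_col].transform("mean")
    yw = df[y_col] - g[y_col].transform("mean")
    within = slope(xw, yw)
    # Between: use only player-level means, i.e. cross-player variation
    means = g[[x_col, y_col]].mean()
    between = slope(means[x_col], means[y_col])
    return {"pooled": pooled, "within": within, "between": between}

# Made-up panel: latent ability drives both rank and the chance of prior top-18 finishes
rng = np.random.default_rng(0)
players = np.repeat(np.arange(200), 6)
ability = np.repeat(rng.normal(size=200), 6)
experience = (rng.uniform(size=players.size) < 1 / (1 + np.exp(-ability))).astype(int)
rank = 9.5 - 3.0 * ability + rng.normal(scale=2.0, size=players.size)
df = pd.DataFrame({"player": players, "experience": experience, "rank": rank})
print(panel_estimates(df, "player", "rank", "experience"))
# Expect: pooled and between clearly negative, within near zero
```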

The augmented analysis does nothing to invalidate the Croson et al. (2008) conclusion that poker involves more skill than luck. However, to believe that premise you must begin with the untested (yet reasonable) assumption that luck is something that, even if it plays a factor in one tournament, cannot be maintained over a career. You must rely on common sense and not the data at hand to rule out luck as a latent (and mystical) player ability. With that question settled, the data go on to indicate that luck is not even a factor for single tournaments, each of which can be thought of as a long-run realization of hundreds of poker hands.

The PROC PANEL output merely furthers the point that some poker players (like their golfing counterparts) are just better at their craft than others.

Then again, maybe they really are the luckiest guys in Vegas.

If you are curious to know more about panel data, what's available in SAS, and how it may be applied, you can catch my theater presentation (that's just a fancy way to say "talk"), "Modeling Panel Data: Choosing the Correct Strategy," at the SAS Analytics Experience conference September 12-14 in Vegas. I'll be speaking on Wednesday, September 14, 1:15 PM - 2:00 PM. You will not catch me at the poker tables, however. My poker game stinks.



Croson, R., P. Fishman, and D. G. Pope. 2008. "Poker Superstars: Skill or Luck? Similarities between golf (thought to be a game of skill) and poker." Chance 21(4): 25-28.

SAS Institute, The PANEL Procedure, SAS/ETS® 14.1 documentation

Spatial econometric modeling using PROC SPATIALREG

In our previous post, Econometric and statistical methods for spatial data analysis, we discussed the importance of spatial data. For most people, understanding that importance is relatively easy because spatial data are often found in our daily lives and we are all accustomed to analyzing them. We can all relate to the first law of geography—“Everything is related to everything else, but near things are more related than distant things”—and we can agree that our interaction with close things around us plays an important role in our decision process. Applications of spatial data in our daily lives are often seamless, and you could argue that we are all spatial statisticians and econometricians without even realizing it. Although most human beings have an innate ability to incorporate spatial information, computer-based analytics need to be given tools to include such information in their analyses. SAS/ETS 14.2 introduces one such tool, the SPATIALREG procedure, which enables you to include spatial information in the analysis and improve the econometric inference and statistical properties of estimators.

In this post, we discuss how you can use the SPATIALREG procedure to analyze 2013 home value data in North Carolina at the county level. The five variables in the data set are county (county name), homeValue (median value of owner-occupied housing units), income (median household income in 2013 in inflation-adjusted dollars), bachelor (percentage of people living in the county who have a bachelor's degree or higher), and crime (rate of Crime Index offenses per 100,000 people). The data for home values, income, and bachelor's degree percentages in each county were obtained from the website of the United States Census Bureau and computed using the 2009-2013 American Community Survey five-year estimates. Data for crime were retrieved from the website of the North Carolina Department of Public Safety. For the purposes of numerical stability and interpretation, all five variables are log-transformed during data cleansing. We use this data set to demonstrate the modeling capabilities of the SPATIALREG procedure and to understand the impact of household income, crime rate, and educational attainment on home values.

As a preliminary data analysis, we first show a map of North Carolina that depicts the county-level home values in Figure 1. It is easy to see that the home values tend to be clustered together. Higher values are found in the coastal, urban, and mountain areas of North Carolina and lower home values can be found in rural areas. Home values of neighboring counties more closely resemble each other than home values of counties that are far apart.

Figure 1: Median value of owner-occupied housing units

From a modeling perspective, findings from Figure 1 suggest that the data might contain a spatial dependence, which needs to be accounted for in the analysis.  In particular, an endogenous interaction effect might exist in the data—home values tend to be spatially correlated with each other. PROC SPATIALREG enables you to analyze the data by using a variety of spatial econometric models.

Table 1: parameter estimates for a linear regression model

To lay the groundwork for discussion, you can start the analysis with a linear regression. For this model, the value of Akaike’s information criterion (AIC) is –106.12. The results of parameter estimation from a linear regression model, shown in Table 1, suggest that three predictors—income, crime, and bachelor—are all significant at the 0.01 level. Moreover, crime exerts a negative impact on home values, indicating that high crime rates reduce home values. On the other hand, both income and bachelor have positive impacts on home values.

Figure 2 provides the plot of predicted homeValue from the linear regression model. Although the comparison of Figure 1 and Figure 2 might suggest that predicted homeValue from the linear regression model captures the general pattern in the observed data, you need to be careful about some underlying assumptions for linear regression. Among those assumptions, a critical one is that the values of the dependent variable are independent of each other, which is not likely for the data at hand. As a matter of fact, both Moran’s I test and Geary’s C test suggest that there is a spatial autocorrelation in homeValue at the 0.01 significance level. Consequently, if you ignore the spatial dependence in the data by fitting a linear regression model to the data, you run the risk of false inference.
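
For reference, Moran's I for a variable \(z\) with spatial weights \(w_{ij}\) takes the standard form

\[ I \;=\; \frac{n}{\sum_{i}\sum_{j} w_{ij}} \cdot \frac{\sum_{i}\sum_{j} w_{ij}\,(z_i - \bar{z})(z_j - \bar{z})}{\sum_{i} (z_i - \bar{z})^2}, \]

and values well above its expectation under independence indicate positive spatial autocorrelation, which is exactly what we see for homeValue here.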


Figure 2: predicted median value of owner-occupied housing units using a linear regression model

Because of the spatial dependence in homeValue, a good candidate model to consider might be a spatial autoregressive (SAR) model, for its ability to accommodate the endogenous interaction effect. You can use PROC SPATIALREG to fit a SAR model to the data. Before you proceed with model fitting, you need to provide a spatial weights matrix. Generally speaking, a spatial weights matrix summarizes the spatial neighborhood structure; entries in the matrix represent how much influence one unit exerts over another.
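
In its textbook form (stated here generically rather than in PROC SPATIALREG's exact parameterization), the SAR model augments the linear regression with a spatially lagged dependent variable,

\[ y \;=\; \rho W y + X\beta + \varepsilon, \]

where \(W\) is the spatial weights matrix, \(\rho\) measures the strength of the endogenous interaction effect, and \(\varepsilon\) is the error term.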

Table 2: parameter estimates for a SAR model

The specification of the spatial weights matrix is of vital importance in spatial econometric modeling. There are many different ways of specifying such a matrix, and results can be sensitive to the choice. Without delving into the nitty-gritty of that choice, you can simply define two counties to be neighbors of each other if they share a common border. After creating the spatial weights matrix, you can feed it into PROC SPATIALREG and run a SAR model. Table 2 presents the results of parameter estimation from a SAR model.
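
Before looking at those results, here is a purely illustrative Python sketch of what such a first-order contiguity matrix looks like, built from a handful of example border pairs and then row-standardized; for PROC SPATIALREG you would supply the equivalent matrix as a data set.

```python
import numpy as np

# Illustrative list of counties and pairs that share a border
counties = ["Wake", "Durham", "Orange", "Chatham"]
borders = [("Wake", "Durham"), ("Durham", "Orange"), ("Orange", "Chatham"), ("Wake", "Chatham")]

idx = {c: i for i, c in enumerate(counties)}
W = np.zeros((len(counties), len(counties)))
for a, b in borders:
    W[idx[a], idx[b]] = W[idx[b], idx[a]] = 1.0   # contiguity: 1 if the two counties touch

# Row-standardize so each county's neighbor weights sum to 1
W = W / W.sum(axis=1, keepdims=True)
print(W)
```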

For this model, the value of AIC is –110.79. The regression coefficients that correspond to income, crime, and bachelor are all significantly different from 0 at the 0.01 level of significance. Both income and bachelor exhibit a significantly positive short-run direct impact on home values. In contrast, crime rate shows a significantly negative short-run direct impact on home values. In addition, the spatial autoregressive coefficient ρ is significantly different from zero at the 0.01 level, suggesting that there is a significantly positive spatial dependence in home values.

Figure 3 shows the predicted values for homeValue from the SAR model. Comparing Figures 1 and 3 suggests that the fitted home values capture the trend in the data reasonably well.

Figure 3: predicted median value of owner-occupied housing units using a SAR model

In this post, we introduced the SPATIALREG procedure, fit a SAR model, and compared predicted values from the SAR model to those from linear regression. Even though the SAR model presented an improvement over the linear model in terms of AIC, many other models are available in the SPATIALREG procedure that might provide even more desirable results and more accurate predictions. These models include the spatial Durbin model (SDM), spatial error model (SEM), spatial Durbin error model (SDEM), spatial autoregressive confused (SAC) model, spatial autoregressive moving average (SARMA) model, spatial moving average (SMA) model, and so on. In the next post, we will discuss their features and show you how to select the most suitable model for the home value data set. We will also be giving a talk, "Location, Location, Location! SAS/ETS® Software for Spatial Econometric Modeling," at the SAS Analytics Experience conference September 12-14, 2016 in Las Vegas, so stop by and let's talk spatial!

This post was co-written with Jan Chvosta.


The benefits of artificial intelligence

Photo courtesy of U.S. Luggage, Briggs & Riley

Asking about the benefits of artificial intelligence and machine learning reminds me a little of the transition to suitcases with wheels. Do you remember lugging around those old suitcases? If not, good for you - this original advertisement from US Luggage will take you back! Thank Bernard Sadow for persistence with his idea to add wheels, because when he pitched his idea people thought he was crazy. Surely no one would want to pull their own suitcase? His patent application stated, “Whereas formerly, luggage would be handled by porters and be loaded or unloaded at points convenient to the street, the large terminals of today, particularly air terminals, have increased the difficulty of baggage-handling….Baggage-handling has become perhaps the biggest single difficulty encountered by an air passenger.”

We can wheel our own suitcases these days, but baggage handling is still a challenge for airlines. One of the benefits of artificial intelligence and machine learning is the improvements that companies like Amadeus are applying to baggage handling in airports to reduce the risk of lost bags. And to improve the overall customer experience of moving through Frankfurt Airport, Fraport uses predictive modeling from SAS, part of the extensive set of machine learning capabilities from SAS.

I hear plenty of verbal and online chatter predicting that artificial intelligence and machine learning will eliminate jobs. But a review of history shows that many such past predictions have not come true. Remember the introduction of ATMs? The expectation was that bank tellers would become an anachronism, but in fact demand for tellers has grown faster than average. Automation reduced the number of tellers needed per bank, but the savings allowed banks to open new branches, which stimulated demand for tellers.

The same pattern repeated with the introduction of grocery store scanners (and cashiers) and electronic document discovery (and paralegals). Today your friendly bellhop still greets you at the hotel as you roll your suitcase to the entrance, because the US Bureau of Labor Statistics in fact predicts average growth in demand for baggage porters and bellhops. I believe that the benefits of artificial intelligence and machine learning include increased productivity that will lead to job creation. Plenty of enthusiastic electronic ink has been spilled about the benefits of artificial intelligence and machine learning for business, so I'm going to focus on another reason why I'm excited about this field: the public benefit in areas like our health, economic development, the environment, child welfare, and public services.

Machine learning and artificial intelligence help use data for good

In a blog post on LinkedIn, Microsoft CEO Satya Nadella envisions a future where computers and humans work together to address some of society's biggest challenges. Instead of believing computers will displace humans, he argues that at Microsoft "we want to build intelligence that augments human abilities and experiences." He understands the trepidation some have about jobs and even the supposed Singularity (the idea that machines will run amok and take over), writing "…we also have to build trust directly into our technology," to address privacy, transparency and security. He cites an example of the social benefits of machine learning and artificial intelligence in the form of a young Microsoft engineer who lost his sight at an early age but who works with his colleagues to build what is essentially a mini-computer worn like glasses to give him information in an audible form he can consume.

Nadella's example of his young colleague is one of many where machine learning and artificial intelligence are making fantastic advances in helping people with disabilities, in the form of various health care wearables and prosthetics. Health care is replete with examples, as deep learning and other techniques show rapid gains on humans for diagnosis. For example, the deep learning startup Enlitic makes software that in trials is 50% more accurate than humans in classifying malignant tumors, with no false negatives (i.e., saying that scans show no cancer when in fact there is malignancy) when tested against three expert human radiologists (who produced false negatives 7% of the time). In the field of population health management, AICure makes a mobile phone app that increases medication adherence among high-risk populations using facial recognition and motion detection. Their technology makes sure that the right person is taking the right medication at the right time.

There are nonprofits that have been drawn to the benefits of artificial intelligence and machine learning, such as DataKind, which "harnesses the power of data science in the service of humanity." In a project with the nonprofit GiveDirectly, DataKind volunteers worked on an algorithm to classify satellite images to identify the poorest households in rural villages in Kenya and Uganda. A team from SAS is working with DataKind and the Boston Public Schools to improve transportation for their students, using optimization. Thorn: Digital Defenders of Children uses technology and innovation to fight child sexual exploitation. Much of the trafficking is done online, so analysis of chatter, images, and other data can aid in identifying children and the predators.

Trafficking in elephant ivory leads to an estimated 96 elephant deaths every day, but a machine learning app is helping wildlife patrols predict the best routes to track poachers. The app drew on 14 years of poaching activity data, produces routes that are randomized so poachers can be foiled, and learns from new data entered. So far its routes have outperformed those of previous ranger patrols. Protection Assistant for Wildlife Security (PAWS) was developed by Milind Tambe, a professor at the University of Southern California, based on security game theory. Tambe has also built these kinds of algorithms for federal agencies like Homeland Security, the Transportation Security Administration, and the Coast Guard to optimize the placement of staff and surveillance to combat smuggling and terrorism.

Machine learning and artificial intelligence in the public sector

Other public sector organizations also realize the benefits of artificial intelligence and machine learning. The New York Police Department has developed the Domain Awareness System, which uses sensors, databases, devices, and more, along with operations research and machine learning, to put updated information in the hands of cops on the beat and at the precincts. Delivering this information even faster than the dispatchers means cops are better prepared when they arrive on the scene. Teams from the University of Michigan's Flint and Ann Arbor campuses are working together with the City of Flint to use machine learning and predictive algorithms to predict where lead levels are highest and to build an app that helps both residents and city officials better identify issues and prioritize responses. It took a lot of work to gather all the disparate information together, but interestingly their initial findings indicate that the troubles are not in the lines themselves but in individual homes, although the distribution of the problems doesn't cluster like you'd expect.

These are just a few of the many examples of the social benefits of artificial intelligence and machine learning, but they illustrate why I’m excited about their potential to improve our society. Automation fueled by artificial intelligence is likely to result in what economists call "structural unemployment," when there is a mismatch between the skills some workers have and those the economy demands, typically a result of technological change. This disruption is undoubtedly devastating for those who lose their jobs, and I believe as a society we have an obligation to provide workforce development programs and training to help those impacted shift to new skills. But I am hopeful that machine learning will be able to offer help to those disrupted by these changes.

And it may even offer job opportunities. SAS is working with our local Wake Technical Community College, which has launched the nation's first Associate's Degree in Business Analytics, fueled in part by a grant from the US Trade Adjustment Assistance Community College and Career Training initiative. They will also offer a certificate program aimed at displaced or underemployed workers, who will be required to earn 12 credit hours to gain a certificate of training. While these graduates will not likely start off doing machine learning, they may move in that direction, and at a minimum they can contribute to teams that do use these methods.

And LinkedIn uses machine learning extensively, for recommendations, image analysis, and more, but through their Economic Graph and LinkedIn for Good initiatives the company aims to connect talent to opportunities by filling in gaps in skills. In partnership with the Markle Foundation their new LinkedIn Cities program offers training for middle skill workers, those with a high school diploma and some college but no degree, and is piloting in Phoenix and Denver. The combination of online and offline tools with connections to educators and employers will help these individuals improve their opportunities.

SAS will highlight the data for good movement at our upcoming Analytics Experience conference in Las Vegas September 12-14. Jake Porway, the Founder and Executive Director of DataKind, will be one of the keynote speakers. My colleague Jinxin Yi will be giving a super demo on the SAS/DataKind project I mentioned that aims to improve transportation for the Boston Public Schools. His session is one of several that have been tagged in the program as Data for Good sessions. We'll have a booth where you can learn more and get engaged with #data4good. Stop by and say hi to me if you're there!

Suitcase image credit: photo courtesy of U.S. Luggage, Briggs & Riley
Bank teller image credit: photo by AMISOM Public Information // Creative Commons attribution
Xray image credit: photo by Yale Rosen // Creative Commons attribution
Elephants image credit: photo by Michele Ursino // Creative Commons attribution
NYPD image credit: photo by Justin Norton // Creative Commons attribution
Bus image credit: photo by ThoseGuys119 // Creative Commons attribution

Machine learning applications for NBA coaches and players

Machine learning applications for NBA coaches and players might seem like an odd choice for me to write about. Let us get something out of the way: I don’t know much about basketball. Or baseball. Or even soccer, much to the chagrin of my friends back home in Europe. However, one of the perks of working in data science and machine learning is that I can still say somewhat insightful things about sports, as long as I have data. In other words, instant expertise! So with that expertise I’ll weigh in to offer some machine learning applications for basketball.

During a conversation with my good colleague Ray Wright, who does know quite a bit about basketball and had been looking at historical data from NBA games, we suddenly realized something about player shooting. There are dozens of shot types, ranges and zones… and no player ever tries them all. What if we could automatically suggest new shot combinations appropriate for each individual player? Who knew there could be machine learning applications for the NBA?

Such a system that suggests actions or items to users is called a recommender system. Large companies in retail and media regularly use recommender systems to suggest movies, songs, and other items based on a user's history as well as the behavior of similar users, so you've likely used such a system from Amazon, Netflix, and the like. In basketball terms, the users are the players and the items are shot types. As in those other domains, the available data does not come close to covering all possible combinations, in this case of players and shots. When the available data matches this scenario it is called sparse. And fortunately, SAS has a new offering, SAS® Viya™ Data Mining and Machine Learning, that includes a new method specifically designed for sparse predictive modeling: PROC FACTMAC, for factorization machines.

Let me quickly introduce you to factorization machines. Originally proposed by Steffen Rendle (2010), they are a generalization of matrix factorization that allows multiple features with high cardinality (lots of unique values) and sparse observations. The parameters of this flexible model can be estimated quickly, even in the presence of massive amounts of data, through stochastic gradient descent, the same type of optimization solver behind the recent successes of deep learning.
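For the mathematically curious, the second-order factorization machine in Rendle's paper predicts a target from a feature vector x using a global bias, per-feature biases, and pairwise interactions whose weights are inner products of k-dimensional latent factor vectors:

```latex
\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i
  + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j ,
  \qquad \mathbf{v}_i \in \mathbb{R}^k
```

Because every feature gets its own latent vector, the interaction weight for a pair of features that never co-occurs in the training data can still be estimated, which is exactly what makes the model attractive for sparse data.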

Factorization Machines return bias parameters and latent factors, which in this case can be used to characterize players and shot combinations. You can think of a player’s bias as the overall propensity to successfully score, whereas the latent factors are more fine-grained characteristics that can be related to play style, demographics and other information.
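To make that concrete, here is a small Python sketch of how bias and latent-factor estimates could be turned into per-player shot recommendations. The player names, action names, and parameter values are invented for illustration; this shows the scoring idea behind the model, not the PROC FACTMAC interface.

```python
import numpy as np

# Score (player, shot type) pairs with factorization-machine parameters and
# rank shot types for one player. All names and values here are made up; a
# fitted model would supply the biases and factors.
rng = np.random.default_rng(0)
k = 25                                    # number of latent factors (our model used 25)

players = ["Player A", "Player B"]
actions = ["Driving Layup", "Pullup Jump Shot", "Alley Oop Dunk"]

w0 = -0.1                                 # global intercept
bias = {name: rng.normal(0.0, 0.2) for name in players + actions}
factors = {name: rng.normal(0.0, 0.1, size=k) for name in players + actions}

def predicted_log_odds(player: str, action: str) -> float:
    """Second-order factorization machine score for a (player, action) pair."""
    return w0 + bias[player] + bias[action] + factors[player] @ factors[action]

# Recommend shot types for a player by ranking the predicted log-odds.
player = "Player A"
for action in sorted(actions, key=lambda a: predicted_log_odds(player, a), reverse=True):
    print(f"{action:>18s}  log-odds = {predicted_log_odds(player, action):+.3f}")
```

Ranking action types by this score for each player is essentially the shot recommender described in the rest of this post.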

Armed with this thinking, our trusty machine learning software from SAS, and some data science tricks up our sleeves, we decided to try our hand at machine learning applications in the form of automated basketball coaching (sideline yelling optional!). Before going into our findings, let’s take a look at the data. We have information about shots taken during the 2015-2016 NBA basketball season through March 2016. A total of 174,190 shots were recorded during this period. Information recorded for each shot includes the player, shot range, zone, and style (“action type”), and whether the shot was successful. After preprocessing we retained 359 players, 4 ranges, 5 zones, and 36 action types.

And here is what we found after fitting a factorization machine model. First, let’s examine some established wisdom: does height matter much for shot success? As the box-and-whisker plot below shows, the answer is yes, somewhat, but not quite as much as one would think. The figure depicts the distribution of bias values for players, grouped by their height. There is a bump for the 81-82 inch group, but it is not overwhelming, and it decays slightly for the 82-87 inch group.

NBA box and whiskers plot

Now look at the following figure, which shows made shots (red) vs missed (blue), by location in the court and by action type. There is definitely a very significant dependency! Now if only someone explained to me again what a “driving layup” is…

NBA shots

Let us investigate the biases again, now by action type. The following figure shows the bias values in a horizontal bar plot. It is clear that all actions involving “dunk” lead to larger bars, corresponding to greater probability of success.

NBA bias values

What about other actions? What should we recommend to the typical player? That is what the following two tables show.

NBA most recommended shots

Most recommended shots

Least recommended shots

Based on the predicted log-odds, the typical player should strive for dunk shots and avoid highly acrobatic and complicated actions, or highly contested ones such as jump shots. Of course, not all players are "typical." The following figure shows a 2D embedding of the fitted factors for players (red) and actions (blue). There is significant affinity between Manu Ginobili and driving floating layups. Players Ricky Rubio and Derrick Rose exhibit similar characteristics based on their shot profiles, as do Russell Westbrook and Kobe Bryant, among others. Also, dunk shot action types form a grouping of their own!

NBA fitted factors
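One straightforward way to produce this kind of 2D picture (not necessarily how this particular figure was made) is to project the fitted latent factors onto their first two principal components. Here is a hypothetical sketch with random stand-in factors:

```python
import numpy as np

# Hypothetical latent factors: one 25-dimensional vector per player or action
# type, stacked into a matrix. A fitted factorization machine would supply these.
rng = np.random.default_rng(1)
labels = ["Player 1", "Player 2", "Dunk", "Driving Floating Layup"]
V = rng.normal(size=(len(labels), 25))

# PCA via SVD of the centered factor matrix; keep the first two components.
Vc = V - V.mean(axis=0)
U, s, _ = np.linalg.svd(Vc, full_matrices=False)
embedding = U[:, :2] * s[:2]              # 2D coordinates for plotting

for label, (x, y) in zip(labels, embedding):
    print(f"{label:>24s}: ({x:+.2f}, {y:+.2f})")
```

Points that land close together in this projection correspond to players and shot types with similar latent characteristics, which is how the affinities above show up visually.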

Overall, our 25-factor factorization machine model predicts the log-odds of shot success with an RMSE of 0.929, outperforming alternative models such as support vector machines. Recommendations can be tailored to specific players, and many different insights can be extracted. So if any NBA coaches or players want to call about our applications of machine learning for basketball, we are available for consultation!

We are delighted that this analysis has been accepted for presentation at the 2016 KDD Large-Scale Sports Analytics workshop this Sunday, August 14, where Ray will be representing our work with this paper: "Shot Recommender System for NBA Coaches." And my other colleague (and basketball fan), Brett Wujek, will be giving a demo theater presentation on “Improving NBA Shot Selection: A Matrix Factorization Approach” at the SAS Analytics Experience Conference September 12-14, 2016 in Las Vegas.

Surely, many basketball experts will be able to give us good tips to augment our applications of machine learning for the NBA. One thing is certain, though: when in doubt, always dunk!


Rendle, S. (2010). Factorization Machines. Proceedings of the 10th IEEE International Conference on Data Mining (ICDM).


Multi-echelon inventory optimization at a major durable goods company

Multi-echelon inventory optimization is ever more a requirement in this era of globalization, which is both a boon and a bane for manufacturing companies; optimizing pricing is just as important. Global reach allows these companies to expand to new territories, but it also increases the competition on their home turf. Consider General Motors (GM), which faces new competition at home from Asian manufacturers such as Hyundai and Kia but has also benefited from global expansion. In 2015, GM sold 63% of its vehicles outside North America; in fact, GM and its joint ventures sold as many vehicles in China as in North America. The supply chains of most manufacturing companies have also expanded globally, because raw materials are sourced from different parts of the world. That added complexity is why a holistic approach to optimizing the entire supply chain is critical. Customers still expect the same great service despite the complexity, which is why multi-echelon inventory optimization can provide real benefits to global companies, particularly if they also optimize their pricing.

A major durable goods company in this situation partnered with SAS to leverage our advanced analytics capabilities to improve their profitability and right-size their inventory. The company built a new pricing platform that helps them design data-driven pricing and promotion strategies. SAS® Demand-Driven Planning and Optimization provided this company with a structured and efficient process to right-size the inventory in their complex, multi-echelon supply chain network. This new adaptive platform uses SAS/OR, SAS Visual Analytics and SAS Office Analytics to provide a scalable solution for multi-echelon inventory optimization.

The durable goods company sells their products to end consumers through retailers, who are the company's direct customers. The transactions between the company and a retail customer are called invoice data, and the transactions between retail customers and end consumers are called retail sales data; the retail sales data is ten times larger than the invoice data. The ability to visualize these transaction data and identify patterns and areas of improvement from them is critical to designing a successful pricing strategy.

Process flow

At its heart, the new platform consists of a powerful server that can easily be scaled to the growing needs of the durable goods company. SAS/ACCESS engines enable easy connections to a number of different data sources, such as .NET databases or ODBC connections. The results from this computing environment can be surfaced on the web using SAS Visual Analytics or through desktop applications such as Microsoft Excel, Word, or PowerPoint using SAS Office Analytics.

The new pricing process consists of three simple steps:

Varun chart 2.jpg

First step: identify areas of improvement

Two primary areas of improvement for this company were related to profitability and market share. Using SAS Visual Analytics, it is possible to visualize the data with heat maps, waterfall charts, and time series plots to quickly identify the customers and products that need improvement. Below is an example of how easy it is to identify areas of improvement using a SAS Visual Analytics report in the form of a Pareto chart. The X axis shows groupings of customers, the bars show net sales by customer, and the green line indicates the profitability of these customers for the manufacturer. The customers are shown in descending order by net revenue, with the largest customers on the left side of the Pareto chart and the smallest on the right. Due to economies of scale and their buying power, the largest customers are expected to provide the lowest profit margin percentage to the manufacturer, so in an ideal scenario the profitability line should increase monotonically from left to right. In the chart below, Cust3, Cust5, and Cust9 are clear outliers whose profitability can be improved.

Varun chart 3
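As a rough illustration of the logic behind that chart, the sketch below orders hypothetical customers by net sales and flags any customer whose margin falls below that of a larger customer, breaking the expected left-to-right increase. The numbers are invented; the real report is built in SAS Visual Analytics from the invoice data.

```python
import pandas as pd

# Hypothetical customer-level figures, for illustration only.
df = pd.DataFrame({
    "customer":   ["Cust1", "Cust2", "Cust3", "Cust4", "Cust5"],
    "net_sales":  [120e6, 95e6, 80e6, 40e6, 25e6],
    "margin_pct": [6.0,   7.5,  5.0,  9.0,  6.5],
})

# Order customers by net sales, largest first, as in the Pareto chart.
df = df.sort_values("net_sales", ascending=False).reset_index(drop=True)

# Ideally the margin should rise as customers get smaller. Flag customers whose
# margin falls below the best margin seen among all larger customers.
prior_best = df["margin_pct"].cummax().shift(fill_value=-1.0)
df["outlier"] = df["margin_pct"] < prior_best

print(df)
```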

Second step: design strategies

The second step involves designing and simulating different pricing strategies to overcome the challenges identified in step 1. After evaluating the different strategies, we select the best one for implementation. Consider Cust2 from the chart above as we break down its cost components in detail using a waterfall chart. The first bar indicates the gross price. The second bucket indicates the gross-to-net (G2N) discounts that are offered to customers to increase sales. An expert can immediately spot that 13% is very high, because G2N discounts typically run between 4% and 10%. So in this case the way to improve profitability is to reduce the G2N discounts.

Varun chart 4

Lowering the G2N discounts will certainly lower sales volume. Win rate curves help quantify the volume changes that result from price changes. Using SAS/OR, you can derive a piecewise-linear equation that describes the cumulative historical win rate at different price points, and then use this analytical model to calculate the volume changes due to price changes. For example, consider a product with a selling price of $300 per item and an average cost of $200 per item. Current annual sales volume is 376,000 units (win rate = 95%), and annual earnings before interest and taxes (EBIT) are $5.6 million.

Varun chart 5

If we were to increase the price to $370, the win rate drops to 82%. The new expected annual sales volume is 325,000 units, and annual EBIT is $27.5 million. Sales volume could drop significantly if you increase the price too much: if you were to raise the price to $580, the win rate would fall to only 18%, which might be unacceptable because most companies want to maintain a minimum market share. SAS/OR can be used to solve this optimization problem and identify the price that maximizes profitability while satisfying constraints such as market share requirements.
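The sketch below illustrates the shape of this trade-off with a toy piecewise-linear win-rate curve and a simple grid search under a market-share constraint. The fixed cost and the market-share floor are assumptions made purely for illustration, so the output will not reproduce the EBIT figures quoted above; the real problem is solved with SAS/OR.

```python
import numpy as np

# Toy price decision: interpolate a win-rate curve from the quoted price
# points, assume a fixed cost, and search for the most profitable price that
# still meets a minimum win-rate (market-share) requirement.
unit_cost    = 200.0        # variable cost per item ($)
base_volume  = 396_000      # annual demand if every quote were won
fixed_cost   = 32e6         # assumed fixed cost ($), for illustration only
min_win_rate = 0.30         # assumed market-share floor

price_pts   = np.array([300.0, 370.0, 580.0])   # historical price points
winrate_pts = np.array([0.95,  0.82,  0.18])    # observed win rates

def evaluate(price):
    """Return (win rate, EBIT) at a given price using linear interpolation."""
    win_rate = np.interp(price, price_pts, winrate_pts)
    volume = base_volume * win_rate
    ebit = volume * (price - unit_cost) - fixed_cost
    return win_rate, ebit

# Grid search over candidate prices that satisfy the market-share constraint.
candidates = [p for p in np.arange(300.0, 581.0, 5.0) if evaluate(p)[0] >= min_win_rate]
best = max(candidates, key=lambda p: evaluate(p)[1])
win_rate, ebit = evaluate(best)
print(f"best price ~ ${best:.0f}, win rate {win_rate:.0%}, EBIT ${ebit / 1e6:.1f}M")
```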

Varun chart 6

Third step: execute and monitor

In the third step, we implement the strategies identified in step 2 and, importantly, monitor their performance using SAS Visual Analytics reports. Real-time monitoring lets us refine a strategy after it goes live to make sure we achieve the goals outlined in step 2.

Varun chart 7

Inventory Optimization

The durable goods company implemented SAS® Inventory Optimization Workbench to right-size the inventory and achieve multi-echelon inventory optimization across their complex supply chain network. The SAS inventory optimization process consists of four steps:

Varun chart 8.jpg

In the first step, we identify the probability distribution that best fits the forecasted demand. The SAS Inventory Optimization Workbench (IOW) supports a number of continuous distributions, such as normal and lognormal, as well as discrete distributions such as Poisson and binomial. One of the key differentiators of SAS IOW is its ability to handle intermittent demand. In most supply chains, the top 20% of items account for 80% of total demand, leaving a lot of items with low demand and high variance. SAS IOW uses a specialized technique called Croston's method to handle these intermittent-demand items: the nonzero demand size is estimated through exponential smoothing, and the interval between demands is estimated separately. In addition, we use historical performance to estimate the right amount of safety stock to satisfy the service-level requirements.
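For readers curious about the mechanics, here is a bare-bones Python sketch of Croston's idea: smooth the nonzero demand sizes and the intervals between them separately, and take their ratio as the demand rate. This is only the textbook version with an invented demand history, not the SAS IOW implementation.

```python
def croston(demand, alpha=0.1):
    """Return the forecast demand rate per period for an intermittent series."""
    z = None             # smoothed nonzero demand size
    p = None             # smoothed interval between nonzero demands
    periods_since = 0
    for d in demand:
        periods_since += 1
        if d > 0:
            if z is None:                 # initialize on the first nonzero demand
                z, p = d, periods_since
            else:
                z = z + alpha * (d - z)
                p = p + alpha * (periods_since - p)
            periods_since = 0
    if z is None:
        return 0.0
    return z / p                          # expected demand per period

history = [0, 0, 5, 0, 0, 0, 3, 0, 4, 0, 0, 6]
print(f"forecast demand per period: {croston(history):.2f}")
```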

The second step is to calculate the inventory target, or order-up-to level. The inventory target consists of three components: pipeline stock, cycle stock, and safety stock. Pipeline stock is the inventory required to cover demand during the lead time, cycle stock is the inventory required to cover the period between replenishments, and safety stock is the inventory required to cover the uncertainty in demand. The variance of the demand forecast and the desired service level for a given product-location pair are the main drivers of the safety stock calculation.
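A simplified version of that calculation, assuming normally distributed forecast errors and a periodic-review policy, looks like the sketch below. All parameter values are illustrative.

```python
from math import sqrt
from scipy.stats import norm

# Simplified order-up-to level: pipeline + cycle + safety stock.
mean_demand   = 100.0   # forecast demand per period (units)
sigma_demand  = 30.0    # standard deviation of forecast error per period
lead_time     = 2       # periods
review_period = 1       # periods between replenishments
service_level = 0.95    # target probability of no stockout

pipeline_stock = mean_demand * lead_time
cycle_stock    = mean_demand * review_period
z              = norm.ppf(service_level)
safety_stock   = z * sigma_demand * sqrt(lead_time + review_period)

order_up_to = pipeline_stock + cycle_stock + safety_stock
print(f"pipeline={pipeline_stock:.0f}, cycle={cycle_stock:.0f}, "
      f"safety={safety_stock:.0f}, target={order_up_to:.0f}")
```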

In the third step, we use Monte Carlo simulation to generate 200 scenarios of forecasted demand over the planning horizon and calculate KPIs such as on-hand inventory, service level, backlog, and replenishment orders. In the final step, we update the Visual Analytics report with the output from the SAS Inventory Optimization Workbench. Users can easily visualize the inventory target, forecasted demand, and projected orders over time and use these reports to spot any outliers.

Varun chart 9.jpg
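The following toy simulation shows the spirit of the simulation step described above: replay many demand scenarios against an order-up-to policy and measure the resulting KPIs. The parameters match the simplified example earlier and are illustrative only; they are not the SAS IOW simulation.

```python
import numpy as np
from collections import deque

# Toy Monte Carlo check of a periodic-review, order-up-to policy.
rng = np.random.default_rng(42)
n_scenarios, horizon = 200, 26
mean_demand, sigma = 100.0, 30.0
lead_time = 2
order_up_to = 385.0                       # target from the sketch above

shortage_periods = 0
backlog_sum = 0.0
for _ in range(n_scenarios):
    on_hand, backlog = order_up_to, 0.0
    pipeline = deque([0.0] * lead_time)   # orders placed but not yet received
    for _ in range(horizon):
        on_hand += pipeline.popleft()     # receive the order due this period
        demand = max(rng.normal(mean_demand, sigma), 0.0)
        shipped = min(on_hand, demand + backlog)
        backlog += demand - shipped
        on_hand -= shipped
        if backlog > 0:
            shortage_periods += 1
        backlog_sum += backlog
        # order enough to raise the inventory position back to the target
        position = on_hand + sum(pipeline) - backlog
        pipeline.append(max(order_up_to - position, 0.0))

periods = n_scenarios * horizon
print(f"periods with backlog ~ {shortage_periods / periods:.1%}")
print(f"average backlog per period ~ {backlog_sum / periods:.1f} units")
```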

Inventory is a significant investment for every company, so it is critical for the supply chain team to estimate the benefits of inventory optimization and set the right expectations with their management. In addition, there are many parameters that need to be fine-tuned periodically to optimize system performance. The SAS team developed a new simulation-optimization approach called Tuning and Validation to overcome these challenges. In the first step, called tuning, we use historical data to automatically calibrate the parameters of the inventory optimization. In the second step, called validation, we simulate the optimized policy and compare it to historical performance to quantify the improvements in KPIs such as inventory cost, backlog, and service level.

Varun chart 10.jpg

Conclusion

For this durable goods manufacturer, the service level improved from 65% to 92% and backorders dropped from 8% to 2% within 10 months of implementing multi-echelon inventory optimization. In addition, with the new analytical pricing platform, their analysts can design pricing and promotion strategies with the click of a button and monitor performance in real time with visualizations. With SAS Demand-Driven Planning and Optimization they have an enhanced, integrated platform for demand planning and multi-echelon inventory optimization, eliminating their reliance on multiple Excel-based processes. To learn more about this project you may wish to read this SAS Global Forum 2016 paper, "Leveraging Advanced Analytics in Pricing and Inventory Decisions at a Major Durable Goods Company."


Econometric and statistical methods for spatial data analysis

We live in a complex world that overflows with information. As human beings, we are very good at navigating this maze, where different types of input hit us from every possible direction. Without really thinking about it, we take in the inputs, evaluate the new information, combine it with our experience and previous knowledge, and then make decisions (hopefully good, informed decisions). If you think about the process and the types of information (data) we use, you quickly realize that most of the information we are exposed to contains a spatial component (a geographical location), and our decisions often include neighborhood effects. Are you shopping for a new house? In the process of choosing the right one, you will certainly consider its location, neighboring locations, schools, road infrastructure, distance from work, store accessibility, and many other inputs (Figure 1). Going on a vacation abroad? Visiting a small country with a low population will probably be very different from visiting a popular destination surrounded by densely populated larger countries. These examples illustrate the value of econometric and statistical methods for spatial data analysis.

Econometric and statistical methods for spatial data analysis

Figure 1: Median Listing Price of Housing Units in U.S. Counties (Source: http://www.trulia.com/home_prices/; retrieved on June 6th 2016)

We are all exposed to spatial data, which we use in our daily lives almost without thinking about it. Not until recently have spatial data become popular in formal econometric and statistical analysis. Geographical information systems (GIS) have been around since the early 1960s, but they were expensive and not readily available until recently. Today every smart phone has a GPS, cars have tracking devices showing their locations, and positioning devices are used in many areas including aviation, transportation, and science. Great progress has also been made in surveying, mapping, and recording geographical information in recent years. Do you want to know the latitude and longitude of your house? Today that information might not be much further away than typing your address into a search engine.

Thanks to technological advancement, spatial data are now only a mouse-click away. Though their variety and volume might vary, data of interest for econometric and statistical methods for spatial data analysis can be divided into three categories: spatial point-referenced data, spatial point-pattern data, and spatial areal data. The widespread use of spatial data has put spatial methodology and analysis front and center. Currently, SAS enables you to analyze spatial point-referenced data with the KRIGE2D and VARIOGRAM procedures and spatial point-pattern data with the SPP procedure, all of which are in SAS/STAT. The next release of SAS/ETS (version 14.2) will include a new SPATIALREG procedure that has been developed for analyzing spatial areal data. This type of data is the focus of spatial econometrics.

Spatial econometrics was developed in the 1970s in response to the need for a new methodological foundation for regional and urban econometric models. At the core of this methodology are principles that deal with two main spatial aspects of the data: spatial dependence and spatial heterogeneity. Simply put, spatial econometrics concentrates on accounting for spatial dependence and heterogeneity in the data in a regression setting. This is important because ignoring spatial dependence and heterogeneity can lead to biased or inefficient parameter estimates and flawed inference. Unlike standard econometric models, spatial econometric models do not assume that observations are independent. In addition, the quantification of spatial dependence and heterogeneity is often characterized by the proximity of two regions, which is represented by a spatial weights matrix. The idea behind such quantification resonates with the first law of geography: “Everything is related to everything else, but near things are more related than distant things.”
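To make the spatial weights matrix concrete, here is a tiny Python sketch that builds a row-standardized contiguity matrix for four regions laid out on a 2-by-2 grid (regions are neighbors if they share an edge) and computes the spatial lag Wy. The layout and values are invented; real analyses derive W from actual geography.

```python
import numpy as np

# Adjacency for four regions arranged as a 2x2 grid.
neighbors = {
    0: [1, 2],
    1: [0, 3],
    2: [0, 3],
    3: [1, 2],
}

n = len(neighbors)
W = np.zeros((n, n))
for i, js in neighbors.items():
    for j in js:
        W[i, j] = 1.0
W = W / W.sum(axis=1, keepdims=True)      # row-standardize

# The spatial lag Wy averages each region's neighbors, which is how
# "near things are more related" enters the regression.
y = np.array([10.0, 12.0, 9.0, 15.0])
print("W =\n", W)
print("spatial lag Wy =", W @ y)
```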

In spatial econometric modeling, the key challenge often centers on choosing a model that describes the data at hand well. As a general guideline, model specification starts with understanding where spatial dependence and heterogeneity come from, which is problem-specific. Some examples of such problems are pricing policies in marketing research, land use in agricultural economics, and housing prices in real estate economics. For instance, car sales at one auto dealership might depend on sales at a nearby dealership, either because the two dealerships compete for the same customers or because of some form of unobserved heterogeneity common to both. Based on this understanding, you proceed with a particular model that is capable of addressing the spatial dependence and heterogeneity the data exhibit, and then revise the model until you identify one that is preferred by criteria such as Akaike’s information criterion (AIC) or the Schwarz Bayesian criterion (SBC). Three types of interaction contribute to spatial dependence and heterogeneity: exogenous interaction, endogenous interaction, and interaction among the error terms. Among the wide range of spatial econometric models, some are well suited to one type of interaction effect, whereas others are designed for the alternatives. If you don’t choose your model properly, your analysis can provide false assurance and flawed inference about the underlying data.
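To connect those three interaction types to concrete models, the standard forms in the spatial econometrics literature are the spatial autoregressive (SAR) model for endogenous interaction, the spatial Durbin model (SDM), which adds exogenous interaction through WX, and the spatial error model (SEM) for interaction among the error terms:

```latex
\begin{aligned}
\text{SAR: } & y = \rho W y + X\beta + \varepsilon \\
\text{SDM: } & y = \rho W y + X\beta + W X \theta + \varepsilon \\
\text{SEM: } & y = X\beta + u, \qquad u = \lambda W u + \varepsilon
\end{aligned}
```

Here W is the spatial weights matrix described above, and the parameters rho and lambda measure the strength of spatial dependence.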

In the next blog post, we’ll talk more about econometric and statistical methods for spatial data analysis by discussing spatial econometric analysis that uses the SPATIALREG procedure. In particular, we’ll discuss some useful features in the SPATIALREG procedure (such as parameter estimation, hypothesis testing, and model selection), and we’ll demonstrate these features by analyzing a real-world example. In the meantime, you can also read more in our 2016 SAS Global Forum paper, How Do My Neighbors Affect Me? SAS/ETS® Methods for Spatial Econometric Modeling.