A note from Udo Sglavo: At SAS, what we deliver to our customers is a product of creative minds thinking differently, challenging the norm, taking risks, and learning from trial and error (The greatest teacher, failure is). For the return of World Creativity & Innovation Week, we want to share some of our recent innovations and collaborations from Advanced Analytics R&D. I'm thrilled to continue my discussions with Michelle Opp, Principal Operations Research Specialist; Arash Banadaki, Sr. Research Statistician Developer; Gunce Walton, Manager of Econometric Modeling; and Javier Delgado, Sr. Research Statistician Developer.
The force awakens: using optimization for configuration test planning
Udo: Tell me a little bit about the background on this SAS Viya project?
Michelle: SAS has an extensive testing matrix that consists of different combinations of cloud providers, Kubernetes versions, SAS cadences, and so on. To declare that we support a particular configuration, we need to test it. But testing each configuration takes time and money, and neither is an unlimited resource. We face an almost insurmountable task, in terms of time and budget, in supporting the matrix required by product management. The SAS Analytics Center of Excellence recently began working with the SAS DevOps Division to optimize their testing configurations and make the best use of their resources.
Initially, this was presented as a possible Design of Experiments problem: which subset of testing configurations should they choose from the entire pool of possible configurations? But when we began our initial conversations, we realized that they wanted to test all of the supported configurations. However, they needed the tests staggered over time, since they didn’t have the budget or time to test everything every month. We then realized this was a natural fit for an optimization problem. The optimization model will help determine which configurations should be tested each month over a specified future horizon. For example, we might produce an optimal testing schedule for the next six months for planning purposes, but the model can also be rerun regularly to adapt to changing information and priorities, allowing the team to adjust their testing schedule as needed.
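To give a sense of the shape such a model can take, here is a minimal sketch written as an integer program in Python with the open source PuLP library. This is not the SAS implementation, and the configuration names, values, costs, and budgets are made-up placeholders; the point is only to show the decision "which configuration gets tested in which month" as binary variables under a monthly budget constraint.

```python
# Minimal sketch of configuration test planning as an integer program.
# All data below are hypothetical placeholders; this is not the SAS model.
import pulp

configs = ["azure-k8s-1.24", "aws-k8s-1.24", "azure-k8s-1.25"]  # hypothetical configurations
months = [1, 2, 3, 4, 5, 6]                                     # six-month planning horizon
value = {(c, m): 10 - m for c in configs for m in months}       # value of testing c in month m
cost = {c: 4 for c in configs}                                  # cost of one test run
budget = {m: 8 for m in months}                                 # monthly testing budget

model = pulp.LpProblem("test_planning", pulp.LpMaximize)

# x[c][m] = 1 if configuration c is tested in month m
x = pulp.LpVariable.dicts("test", (configs, months), cat="Binary")

# Objective: maximize the total value of the scheduled tests
model += pulp.lpSum(value[c, m] * x[c][m] for c in configs for m in months)

# Each month's tests must fit within that month's budget
for m in months:
    model += pulp.lpSum(cost[c] * x[c][m] for c in configs) <= budget[m]

# Every supported configuration must be tested at least once over the horizon
for c in configs:
    model += pulp.lpSum(x[c][m] for m in months) >= 1

model.solve()
for c in configs:
    for m in months:
        if x[c][m].value() > 0.5:
            print(f"Test {c} in month {m}")
```

Rerunning the same model each month with refreshed values and budgets gives the rolling, adjustable schedule described above.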
Udo: Have you run into any challenges to get this project off the ground?
Michelle: Perhaps the most complicated part of this project is not the optimization model itself but the creation of the input data to estimate the value of testing a particular configuration for a specific month. For example, what is the relative value of testing Microsoft Azure versus Amazon Web Services? This will require collaboration with several teams to draw upon their expertise. Together we hope to come up with meaningful input data that can be used to drive the optimization model in a way that produces actionable results. The optimization model can also include constraints to accommodate any business rules that must be met. An example would be partner agreements requiring us to support a particular configuration by a specific date.
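Continuing the hypothetical sketch above, a business rule like that partner agreement becomes one more constraint on the same variables, for example requiring a particular configuration to be tested by month 3:

```python
# Hypothetical partner agreement: "azure-k8s-1.25" must be tested in month 3 or earlier
model += pulp.lpSum(x["azure-k8s-1.25"][m] for m in months if m <= 3) >= 1
```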
Udo: What kind of innovative techniques are you planning on using?
Michelle: From a purely technical standpoint, we are still exploring which techniques we will need to use. But from a process innovation standpoint, we’re very excited to work on this project because it’s always rewarding to use our own tools to solve our own problems at SAS. We’re often so focused on solving our customers’ optimization problems that it’s easy to forget that SAS has plenty of internal processes and business decisions that could also benefit greatly from optimization.
The rise of GPU compute
Udo: Why is enabling GPU computation important for SAS IML users?
Arash: Customers use SAS IML software to write custom programs in a high-level interactive matrix language. The language offers hundreds of functions, including many linear algebra operations, and some of these operations are computationally expensive for large matrices. Graphics processing units (GPUs) can perform linear algebra operations incredibly quickly. We identified several functions in SAS IML that benefit from GPU computing and enabled programmers to use GPUs for faster computations with minimal changes to their existing programs. The GPU enhancements are available in PROC IML and the iml action in SAS Viya.
Udo: I was told that the computational results are impressive – how so?
Arash: That's right, the results are very impressive. GPUs are very good at number crunching, especially for large matrices. For example, a typical matrix computation is solving a large system of linear equations. For huge matrices, this computation might take several minutes with a single-threaded algorithm on a CPU, but the same computation on a GPU might run 100 times faster.
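As a rough illustration of the kind of operation that benefits (this is not SAS IML code, and the actual speedup depends on the matrix size, the GPU, and the CPU library), the Python sketch below times the same dense linear solve with NumPy on the CPU and with the open source CuPy library on a CUDA GPU:

```python
# Time the same dense linear solve on CPU (NumPy) and GPU (CuPy); illustrative only.
import time
import numpy as np
import cupy as cp  # requires a CUDA-capable GPU

n = 10_000
rng = np.random.default_rng(1)
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

# CPU solve
t0 = time.perf_counter()
x_cpu = np.linalg.solve(A, b)
cpu_time = time.perf_counter() - t0

# GPU solve: copy the data to the device, solve, then wait for the kernels to finish
A_gpu, b_gpu = cp.asarray(A), cp.asarray(b)
t0 = time.perf_counter()
x_gpu = cp.linalg.solve(A_gpu, b_gpu)
cp.cuda.Device().synchronize()
gpu_time = time.perf_counter() - t0

print(f"CPU: {cpu_time:.2f}s  GPU: {gpu_time:.2f}s  speedup: {cpu_time / gpu_time:.0f}x")
```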
Udo: What different teams collaborated on this project?
Arash: We collaborated with the SAS Analytics Foundation group. That group is responsible for developing a library in SAS that supports multithreaded and GPU-accelerated operations in linear algebra. The SAS IML team uses the library to accelerate linear algebra operations. In addition, in coordination with our testing team, the SAS Servers and Services team was instrumental in deploying GPU support to compute servers in Kubernetes environments. These collaborations will benefit any SAS IML program that runs in SAS Viya.
Udo: Why do you consider your project innovative?
Arash: GPUs have become cheaper over the last few years and are available on most cloud computing services. Accordingly, SAS wants to take advantage of parallel processing on GPUs. The deep learning action and a handful of other SAS Viya actions already use GPUs, but SAS IML is the first traditional SAS product to bring the power of GPUs to SAS programmers who are developing custom algorithms.
A new hope: deep learning in causal inference
Udo: On a very high level, what issue does PROC DEEPCAUSAL help solve?
Gunce: Whenever you ask questions about why or what-if, you seek answers about cause and effect. PROC DEEPCAUSAL in SAS Econometrics helps estimate the effect of a treatment or a policy variable on an outcome variable by estimating 11 types of causal effects, or treatment effect parameters. Beyond estimating these parameters, PROC DEEPCAUSAL also performs policy evaluation and policy comparison for choosing the optimal policy.
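To make "treatment effect parameters" concrete, two of the most familiar examples from the causal inference literature are the average treatment effect (ATE) and the average treatment effect on the treated (ATT), written in terms of the potential outcomes \(Y(1)\) and \(Y(0)\) and the treatment indicator \(T\):

\[
\mathrm{ATE} = E\bigl[\,Y(1) - Y(0)\,\bigr], \qquad
\mathrm{ATT} = E\bigl[\,Y(1) - Y(0) \mid T = 1\,\bigr].
\]

These two are shown only to fix ideas; the full list of 11 parameters is documented with the procedure.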
Udo: How did the SAS Deep Learning, SAS Econometrics, and SAS/STAT teams work together for this?
Gunce: The STAT team has a wide range of tools and expertise in causal analysis. We shared quite a bit of knowledge with them while developing PROC DEEPCAUSAL, and this collaboration continues. The technique behind PROC DEEPCAUSAL has two stages. In the first stage, appropriate deep neural networks (DNNs) are constructed to estimate the parameters of interest for causal inference. In the second stage, the estimates from the first stage are used and new DNNs are constructed for policy optimization. We also collaborated with the Deep Learning team to construct a proof of concept for this project, which calls their deepLearn action set. Thanks to this collaboration, we were able to release PROC DEEPCAUSAL in a relatively short time.
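To give a feel for that two-stage structure, here is a heavily simplified Python sketch that stands in for it with scikit-learn's small neural networks: stage one fits nuisance models for the outcome and the treatment, and stage two learns a treatment policy from the scores those models produce. Every name, model choice, and simplification here is an illustrative assumption, not the PROC DEEPCAUSAL algorithm.

```python
# Simplified two-stage sketch: (1) fit nuisance models with small neural networks,
# (2) learn a treatment policy from doubly robust scores.  Illustrative only;
# this is NOT the PROC DEEPCAUSAL implementation.
import numpy as np
from sklearn.neural_network import MLPClassifier, MLPRegressor

rng = np.random.default_rng(0)
n = 5_000
X = rng.normal(size=(n, 5))                                # covariates
T = rng.binomial(1, 0.5, size=n)                           # treatment assignment (synthetic)
Y = X[:, 0] + T * (1.0 + X[:, 1]) + rng.normal(size=n)     # outcome with heterogeneous effect

# --- Stage 1: nuisance models (outcome regressions and propensity score) ---
mu1 = MLPRegressor(hidden_layer_sizes=(32,), max_iter=1000).fit(X[T == 1], Y[T == 1])
mu0 = MLPRegressor(hidden_layer_sizes=(32,), max_iter=1000).fit(X[T == 0], Y[T == 0])
prop = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000).fit(X, T)
e = np.clip(prop.predict_proba(X)[:, 1], 0.05, 0.95)

# Doubly robust score for each unit's treatment effect
dr = (mu1.predict(X) - mu0.predict(X)
      + T * (Y - mu1.predict(X)) / e
      - (1 - T) * (Y - mu0.predict(X)) / (1 - e))
print("Estimated average treatment effect:", dr.mean())

# --- Stage 2: learn a simple policy (treat whoever has a positive estimated benefit) ---
policy = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000).fit(X, (dr > 0).astype(int))
print("Share of units the learned policy would treat:", policy.predict(X).mean())
```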
Udo: How is this a resilient and scalable solution?
Gunce: PROC DEEPCAUSAL uses DNNs for the estimation behind its policy analysis. DNNs overcome several technical difficulties of the big data era, including the massive number of potential covariates, discrete or continuous, and the unknown nonlinear relationships among the covariates, the outcome, and the treatment assignment. One problem in applying DNNs to causal inference is interpretability. To solve this problem, PROC DEEPCAUSAL applies the DNNs within a two-step semiparametric framework and gives inferential results for the parameters of interest through the corresponding influence functions.
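As general background on what an influence function buys you (a standard textbook example, not a statement of PROC DEEPCAUSAL's internals), the efficient influence function for the average treatment effect \(\tau\) is

\[
\psi(Y, T, X) = \mu_1(X) - \mu_0(X)
+ \frac{T\,\bigl(Y - \mu_1(X)\bigr)}{e(X)}
- \frac{(1 - T)\,\bigl(Y - \mu_0(X)\bigr)}{1 - e(X)}
- \tau ,
\]

where \(\mu_1\) and \(\mu_0\) are the outcome regressions under treatment and control and \(e\) is the propensity score, the kinds of nuisance functions a first stage can estimate with DNNs. Averaging the terms before \(\tau\) gives the point estimate, and the sample variance of \(\psi\) yields standard errors and confidence intervals even though the nuisance functions themselves come from black-box networks.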
Best of both worlds: a forecasting story
Udo: I hear you worked on an open source feature for external languages with the SAS Visual Forecasting team. Can you tell me more about it?
Javier: Sure, this feature enables you to integrate your existing Python or R code into your SAS Visual Forecasting workflows. After we presented the features and scalability of the SAS Visual Forecasting framework, customers and audience members would often ask whether they could leverage that framework to run their own models written in Python and R. We’re happy to say that the EXTLANG package of SAS Visual Forecasting makes this possible. It also allows you to compare open source forecasting models to our own built-in models.
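For a sense of what "your own models written in Python" looks like in practice, the sketch below shows the kind of open source forecasting code a team might want to bring along, here using Holt-Winters exponential smoothing from statsmodels. The function name and interface are hypothetical placeholders; the actual way EXTLANG passes series in and out is defined by SAS Visual Forecasting, not by this sketch.

```python
# The kind of open source forecasting model one might plug into a workflow:
# additive Holt-Winters exponential smoothing from statsmodels.
# Hypothetical wrapper; the real EXTLANG variable interface is defined by SAS.
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

def forecast_series(y: pd.Series, horizon: int = 12) -> pd.Series:
    """Fit an additive Holt-Winters model to one monthly series and forecast ahead."""
    model = ExponentialSmoothing(y, trend="add", seasonal="add", seasonal_periods=12)
    return model.fit().forecast(horizon)
```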
Udo: I understand that there was some cross-department collaboration – how did this come about?
Javier: One concern with running general-purpose programming languages like Python and R is that they allow malicious users to run programs that they would be unable to run in a tightly controlled language like SAS. As a result, we worked early on with our security specialists and with the excellent developers on our SAS CAS team to ensure that you can run Python and R as safely as possible. This includes making the feature off by default and allowing CAS administrators to control who can run EXTLANG programs, which programs can be run, and more.
Udo: What is important about this innovation?
Javier: A common reason for running Python and R programs through SAS Visual Forecasting is that programs executed this way are automatically split into jobs according to the “BY” variables that you specify. The BY variables delineate the time series, which then run independently on all available processors configured in CAS. The fact that there is no inter-process communication, combined with an optimized data-shuffling strategy, allows us to achieve near-linear scalability with workloads consisting of tens or even hundreds of thousands of time series.
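Because each BY group is an independent job, the pattern is embarrassingly parallel, which is what makes the near-linear scaling possible. A rough Python analogy (hypothetical column names, and a local process pool rather than CAS) is to split the panel by its BY variables and forecast every group separately:

```python
# Rough analogy to BY-group parallelism: split a panel by its BY variable and
# forecast each time series in its own job.  Hypothetical column names; illustrative only.
from concurrent.futures import ProcessPoolExecutor

import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

def forecast_group(group: pd.DataFrame) -> pd.DataFrame:
    """Forecast one BY group (one time series) 12 periods ahead."""
    y = group.sort_values("date").set_index("date")["sales"]
    preds = ExponentialSmoothing(y, trend="add").fit().forecast(12)
    return preds.to_frame("forecast").assign(store=group["store"].iloc[0])

def forecast_all(panel: pd.DataFrame) -> pd.DataFrame:
    # One job per BY group ("store" here).  The jobs never communicate,
    # so the work spreads across the worker pool almost linearly.
    groups = [g for _, g in panel.groupby("store")]
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(forecast_group, groups))
    return pd.concat(results)
```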