Machine Learning (ML) models are now widely used to describe the key metrics of processes across many industries. There is also a growing need to integrate these models with other Advanced Analytics methodologies, such as Optimization. In the manufacturing industry specifically, SAS explored this integration by embedding ML models (which explain quality and yield metrics using manufacturing settings as regressors) within an Optimization model. The Optimization model then chooses the set points that maximize yield while satisfying the quality requirements.

To illustrate with a simplified example, I will describe a couple of relevant metrics and settings in automotive airbag production. The process owners need to decide how much sodium azide and oxidizer to use as propellant (among many other manufacturing settings) so that the airbag produces the required amount of gas at a given rate, ensuring proper inflation. Quality metrics (such as airbag inflation) typically have an associated tolerance, which defines an upper and a lower bound. The goal is to find the combination of manufacturing settings (sodium azide and oxidizer) that minimizes cost (or maximizes yield) while keeping the key quality metric (gas production rate) within the required bounds.

To address this problem, we need to understand how the settings affect the key metric. Traditionally this relationship has been explained with linear regression models, partly because they fit naturally into linear optimization formulations. The industry is now exploring more sophisticated models, such as neural networks or gradient boosting, in an attempt to increase model accuracy. These models in turn require pushing some boundaries in optimization formulations and solution methodologies.

## Mathematical challenges

When non-closed-form, nonlinear models (such as neural networks or gradient boosting) are incorporated in an optimization model, traditional well-established algorithms such as simplex or branch-and-bound cannot be applied directly. Fortunately, SAS offers the capability to solve this kind of nonlinear optimization model with the black-box solver, while still using OPTMODEL, the modeling language familiar to OR practitioners.

## SAS technology used

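In this post, I will walk you through the coding syntax to formulate and solve a nonlinear optimization problem in which the constraint and objective function equations are non-closed-form Machine Learning models. The following SAS functionalities will be used: PROC GRADBOOST to train the models, PROC ASTORE to download the resulting analytic stores, PROC FCMP to wrap them in user-defined functions, and PROC OPTMODEL with the black-box solver to formulate and solve the optimization problem.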

Please note that I will not discuss optimization convergence or ML model accuracy in this post. Instead, I will keep the focus on the code syntax, to help SAS users explore the incorporation of ML models as constraints or objectives in an optimization formulation.

## Mathematical formulation

To illustrate the syntax, I have set up an oversimplified example with two products, four manufacturing settings (one of them binary, which makes this a Mixed Integer Nonlinear Optimization problem), one quality metric (kpi) that has a lower bound, and an overall yield that needs to be maximized. Both the kpi and the yield are explained by gradient boosting models that use the manufacturing settings as regressors.

Decision Variables:

$\textrm{Setting}_j$: Value of the manufacturing setting $\mathit{j}$, where $j\in\{1,\dots,4\}$

Constraints:

$f_i(\textrm{Setting}_1, ... ,\textrm{Setting}_4)\geq 100$ for $i \in \{1,2\}$

Constraint for each product $i$ that sets a lower bound of $100$ for $f_i()$, where $f_i()$ is the non-closed-form ML model for the kpi of product $i$, depending on the value of $\textrm{Setting}_1,\dots,\textrm{Setting}_4$

Objective Function:

maximize $\sum_{i \in \{1,2\}} g_i(\textrm{Setting}_1,\dots,\textrm{Setting}_4)$

Maximizes the total yield over both products, where $g_i()$ is the non-closed-form ML model for the yield of product $i$, depending on the value of $\textrm{Setting}_1,\dots,\textrm{Setting}_4$
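Putting the pieces together, the full problem can be sketched as a single formulation. The variable domains below are taken from the OPTMODEL code later in this post (a binary $\textrm{Setting}_1$ and nonnegative $\textrm{Setting}_2,\dots,\textrm{Setting}_4$). The notation $g_i$ for the yield model evaluated for product $i$ is an assumption made here so that the objective matches the code, which sums the predicted yield over both products.

$$
\begin{aligned}
\max \quad & \sum_{i \in \{1,2\}} g_i(\textrm{Setting}_1,\dots,\textrm{Setting}_4) \\
\text{s.t.} \quad & f_i(\textrm{Setting}_1,\dots,\textrm{Setting}_4) \geq 100, && i \in \{1,2\} \\
& \textrm{Setting}_1 \in \{0,1\} \\
& \textrm{Setting}_j \geq 0, && j \in \{2,3,4\}
\end{aligned}
$$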

### Code syntax:

Set up a CAS session:

    proc options option=(CASHOST CASPORT);
    run;

    cas mysess;
    libname mycas cas sessref=mysess;

Generate mock data for demonstration purposes:

    data mycas.prod_history;
       input item yield setting1-setting4 kpi;
       datalines;
    101 100 1 4 3.5 6 140
    101 180 0 2.3 6 10 120
    102 69 1 1.3 5 12 163
    102 79 1 1.6 10 23 203
    ;

Train two gradient boosting models to predict yield and KPI based on the four settings. Please note that setting1 is a nominal variable, as is item.

    proc gradboost data=mycas.prod_history;
       input setting2 setting3 setting4 / level=interval;
       input setting1 item / level=nominal;
       target yield / level=interval;
       savestate rstore=mycas.stored_gb_yield;
    run;

    proc gradboost data=mycas.prod_history;
       input setting2 setting3 setting4 / level=interval;
       input setting1 item / level=nominal;
       target kpi / level=interval;
       savestate rstore=mycas.stored_gb_kpi;
    run;

Save the astores (analytic stores for yield and KPI) locally:

    proc astore;
       download rstore=mycas.stored_gb_yield
                store="/r/sanyo.unx.sas.com/vol/vol920/u92/navikt/casuser/gp/stored_gb_yield";
       download rstore=mycas.stored_gb_kpi
                store="/r/sanyo.unx.sas.com/vol/vol920/u92/navikt/casuser/gp/stored_gb_kpi";
    quit;
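Before wrapping the stores in functions, it can help to check which input and output variables each store expects. The following is a minimal sketch that uses the DESCRIBE statement of PROC ASTORE on the two CAS tables created above. I am assuming here that the predicted columns are reported as P_yield and P_kpi, which are the names the user-defined functions in the next step return.

```sas
/* Sketch: list the input and output variables of each analytic store */
proc astore;
   describe rstore=mycas.stored_gb_yield;
   describe rstore=mycas.stored_gb_kpi;
quit;
```

If the output variable names differ from P_yield and P_kpi, the RETURN statements in the functions below would need to be adjusted accordingly.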

Create user-defined functions, calling the analytical store defined above:

    proc fcmp outlib=work.score.funcs;
       function astore_yield(item, setting1, setting2, setting3, setting4);
          declare object myscore(astore);
          call myscore.score("/r/sanyo.unx.sas.com/vol/vol920/u92/navikt/casuser/gp/stored_gb_yield");
          return(P_yield);
       endsub;

       function astore_kpi(item, setting1, setting2, setting3, setting4);
          declare object myscore(astore);
          call myscore.score("/r/sanyo.unx.sas.com/vol/vol920/u92/navikt/casuser/gp/stored_gb_kpi");
          return(P_kpi);
       endsub;
    run;
    quit;

Point to the previously stored compiled functions:

options cmplib=work.score;
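As an optional sanity check before building the full model, the two functions can be evaluated at a known point. The sketch below scores the first row of the mock data (item 101 with settings 1, 4, 3.5 and 6) in a small, separate PROC OPTMODEL step. The parameter names yield_check and kpi_check are hypothetical and used only for this illustration.

```sas
/* Sketch: evaluate the user-defined functions at the first mock-data row */
proc optmodel;
   num yield_check = astore_yield(101, 1, 4, 3.5, 6);
   num kpi_check   = astore_kpi(101, 1, 4, 3.5, 6);
   print yield_check kpi_check;
quit;
```

If both values print without errors, the functions are visible through the CMPLIB option and the analytic stores can be read from the paths given in PROC FCMP.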

Define the decision variables and the implicit variables in OPTMODEL. The implicit variables Kpi and Yield, which typically are defined with closed-form equations, will now call the user-defined functions that include the analytical stores for the gradboost models.

    proc optmodel;
       set ITEMS = {101,102};
       var Setting1 binary;
       var Setting2 >= 0;
       var Setting3 >= 0;
       var Setting4 >= 0;

       impvar Kpi{i in ITEMS} = astore_kpi(i, Setting1, Setting2, Setting3, Setting4);
       impvar Yield{i in ITEMS} = astore_yield(i, Setting1, Setting2, Setting3, Setting4);

Define the constraints:

       /* Lower bound on the kpi of each product */
       con KPI_LB{i in ITEMS}: Kpi[i] >= 100;

Define the objective:

       max TotalYield = sum{i in ITEMS} Yield[i];

Call the black-box solver:

       solve with blackbox;
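Because the black-box solver does not come with the optimality guarantees of the traditional algorithms, it is worth inspecting the result before exporting it. The following lines are a sketch meant to be placed inside the same PROC OPTMODEL step, after the SOLVE statement and before QUIT. They rely on `_solution_status_`, which to my knowledge is a predeclared OPTMODEL parameter, and on the variables and implicit variables declared above.

```sas
   /* Sketch: report how the solver finished, then show the solution */
   put 'Solution status: ' _solution_status_;
   print Setting1 Setting2 Setting3 Setting4;
   print Kpi Yield;
```

Printing Kpi next to Yield makes it easy to confirm that each product's kpi is at least 100 at the reported solution.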

Create output data:

       create data mycas.prod_out from setting1 setting2 setting3 setting4 TotalYield;
       create data mycas.kpi_out from [ITEMS] Kpi;
    quit;

Using this syntax, we are able to obtain optimum values for our four settings maximizing yield:

| Setting1 | Setting2 | Setting3 | Setting4 | TotalYield |
|---|---|---|---|---|
| 0 | 92.43697479 | 88.235294118 | 82.352941176 | 228.3719873 |

while satisfying the requirement to keep the quality kpi above 100 for each product:

| ITEMS | Kpi |
|---|---|
| 101 | 156.35430437 |
| 102 | 156.35430437 |

## Final remarks about Machine Learning models

There is an increasing need to incorporate non-closed-form models within optimization formulations. With the syntax described above, SAS provides an intuitive way to combine Machine Learning models with black-box optimization solvers. Enjoy!

For additional information regarding Operations Research, be sure to visit our SAS Community and other Operations Research blog posts.
