The post Fun with Flags: Optimizing Arrangements of Stars with SAS appeared first on Operations Research with SAS.

Suppose we want to arrange \(n\) stars in a rectangular region. Let \(x_i\) and \(y_i\) denote the coordinates of star \(i\), for \(i=1,\dots,n\). We need some way to measure the quality of a particular arrangement. What stands out to me in the current flag is that the stars are spread out in the sense that no two stars are too close to each other. One way to capture this idea with optimization is by using a maximin model that maximizes the minimum Euclidean distance between stars. Explicitly, we want to maximize

\(\min\limits_{i,j} \sqrt{(x_i-x_j)^2+(y_i-y_j)^2}\)

subject to lower and upper bounds on \(x_i\) and \(y_i\).
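To make the objective concrete, here is a small Python sketch (illustration only, not part of the SAS workflow; the function name `min_pairwise_distance` is ours) that evaluates the maximin criterion for a candidate arrangement:

```python
import math
from itertools import combinations

def min_pairwise_distance(points):
    """Return the minimum Euclidean distance over all pairs of points."""
    return min(math.dist(p, q) for p, q in combinations(points, 2))

# A 2x2 grid of stars in the unit square: the closest pairs are the sides,
# which are 1 apart, so the maximin objective of this arrangement is 1.
grid = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(min_pairwise_distance(grid))  # 1.0
```

The optimization model searches over all placements for the one that makes this quantity as large as possible.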

The following SAS macro variables record the number of stars and the width and height of the rectangular region as proportions of the overall flag height, according to the United States code:

```sas
%let n = 50;
%let width = 0.76;
%let height = 0.5385;
```

The following PROC OPTMODEL statements capture the mathematical optimization problem just formulated:

```sas
proc optmodel;
   /* declare parameters */
   num n = &n;
   set POINTS = 1..n;
   num width = &width;
   num height = &height;

   /* declare variables, objective, and constraints */
   var X {POINTS} >= 0 <= width;
   var Y {POINTS} >= 0 <= height;
   var Maxmin >= 0;
   max Objective = Maxmin;
   con MaxminCon {i in POINTS, j in POINTS: i < j}:
      Maxmin <= (X[i]-X[j])^2 + (Y[i]-Y[j])^2;

   /* call nonlinear programming solver with multistart option */
   solve with nlp / multistart;

   /* save optimal solution to SAS data set for use with PROC SGPLOT */
   create data solution from [i] X Y;
quit;
```

Note that we use squared distance instead of distance in order to avoid the non-differentiability of the square root function at 0. Also, the MULTISTART option of the nonlinear programming solver increases the likelihood of finding a globally optimal solution for this nonconvex optimization problem, which has many locally optimal solutions.
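To see why multiple starts matter, here is a deliberately crude Python sketch of the multistart idea; as a simplifying assumption each "start" is merely evaluated, whereas the NLP solver's MULTISTART option refines each start with a local solve:

```python
import random
from itertools import combinations

def min_sq_dist(pts):
    """Minimum squared distance over all pairs, matching the model's objective."""
    return min((p[0]-q[0])**2 + (p[1]-q[1])**2 for p, q in combinations(pts, 2))

def multistart(n, width, height, starts=200, seed=1):
    """Crude stand-in for MULTISTART: evaluate many random starting
    arrangements and keep the best one found."""
    rng = random.Random(seed)
    best = None
    for _ in range(starts):
        pts = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]
        obj = min_sq_dist(pts)
        if best is None or obj > best:
            best = obj
    return best

print(multistart(5, 0.76, 0.5385))
```

Each random start can land in a different basin of attraction, which is exactly what makes multistart effective on a problem with many local optima.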

The following PROC SGPLOT statements plot the resulting solution:

```sas
proc sgplot data=solution aspect=%sysevalf(10/19);
   styleattrs wallcolor='#3C3B6E';
   scatter x=X y=Y / markerattrs=(symbol='StarFilled' color=white size=20);
   xaxis values=(0 &width) display=none;
   yaxis values=(0 &height) display=none;
run;
```

Here is the resulting plot, which is a transpose of the existing design:

The resulting objective value is 0.0116, which corresponds to the squared distance between vertically adjacent pairs of stars.

For comparison, here is the plot of the existing design (objective value 0.0103), where instead the diagonally adjacent stars are the closest:

For each new state admitted to the union, a star is added and the new flag is introduced the next July 4th. Recent proposals to add either Puerto Rico or Washington, D.C. raise the prospect of a 51-star flag. Several researchers have looked for such arrangements among a handful of prespecified patterns. See, for example, this 2012 article in *The American Mathematical Monthly*. Instead, we have a general optimization model for \(n\) stars, so we can change the value of \(n\) to 51 and rerun the code. Here is the resulting plot:

How do you like it in comparison with this popular design that has alternating rows of nine and eight stars?

The U.S. flag adopted on June 14, 1777 had 13 stars. Here's the optimized version:

The Star Spangled Banner flag that inspired Francis Scott Key to write the U.S. national anthem had 15 stars. Here's the optimized version:

Although the problem discussed here was motivated by flag design, the same problem applies to facility location. Suppose you want to build a number of facilities within some prescribed region. If two facilities are too close to each other, they will cannibalize each other's business. So it makes sense to maximize the minimum distance between facilities.

For those in the United States, enjoy the holiday tomorrow!


The post Creating Synthetic Data with SAS/OR appeared first on Operations Research with SAS.

Researchers could use synthetic data to, for example, understand the format of the real data, develop an understanding of its statistical properties, build preliminary models, or tune algorithm parameters. Analytical procedures and code developed using the synthetic data can then be passed to the data owner if the collaborators are not permitted to access the original data. Even if the collaborators are eventually granted access, synthetic data allow the work to be conducted in parallel with tasks required to access the real data, allowing an efficient path to the final analysis.

One method for producing synthetic data with SAS was presented at SAS Global Forum 2017 with this corresponding paper (Bogle and Erickson 2017), which implemented an algorithm originally proposed in another paper (Bogle and Mehrotra 2016, subscription required). The implementation uses the OPTMODEL procedure and solvers in SAS/OR to create synthetic data with statistical raw moments similar to the real data.

Most SAS users are familiar with mean, variance, and covariance. The mean is the first-order raw moment, \(\mathbf{E}[X]\). The variance is the second-order marginal central moment, \(\mathbf{E}[(X-\mathbf{E}[X])^2]\), and covariance is the second-order mixed central moment, \(\mathbf{E}[(X-\mathbf{E}[X])(Y-\mathbf{E}[Y])]\). Skewness and kurtosis are the third- and fourth-order normalized moments.
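As a quick sanity check of these definitions, the following Python snippet (illustrative only) recovers the variance from raw moments via the identity \(\mathrm{Var}(X)=\mathbf{E}[X^2]-\mathbf{E}[X]^2\):

```python
def raw_moment(xs, k):
    """k-th raw moment E[X^k] of a sample."""
    return sum(x**k for x in xs) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0]
mean = raw_moment(xs, 1)
# Population variance from raw moments: Var(X) = E[X^2] - E[X]^2
var = raw_moment(xs, 2) - mean**2
print(mean, var)  # 2.5 1.25
```

This relationship is why matching enough raw moments also constrains the familiar central moments such as variance and covariance.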

This method considers only raw moments, which keeps the optimization problem linear. The more of these moments that are matched, the more closely the synthetic data resemble the original data. Because it is difficult to match the moments exactly, the algorithm instead tries to keep them within desired bounds. For example, one of the fourth-order moments (meaning that the sum of the exponents is four) could be \(\mathbf{E}[X^2YZ]\), which might have a value of 0.5 in the real data set. The method would then try to "match" this moment by constraining the corresponding moment of the new data set to be between 0.48 and 0.52. If \(N\) observations are desired, this ideally means constraining \(\displaystyle{0.48 \le \frac{1}{N} \sum_{i=1}^N x_i^2y_iz_i \le 0.52}\).
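The following Python snippet (an illustration with made-up data, not part of the macro) computes such a sample mixed raw moment and checks it against the hypothetical bounds of 0.48 and 0.52:

```python
def mixed_raw_moment(rows, exponents):
    """Sample mixed raw moment, e.g. E[X^2*Y*Z] for exponents (2, 1, 1)."""
    total = 0.0
    for row in rows:
        term = 1.0
        for value, e in zip(row, exponents):
            term *= value**e
        total += term
    return total / len(rows)

# Hypothetical observations (x, y, z); target: keep E[X^2*Y*Z] in [0.48, 0.52]
data = [(1.0, 0.5, 1.0), (1.0, 0.7, 0.7), (1.2, 0.7, 0.5)]
m = mixed_raw_moment(data, (2, 1, 1))
print(round(m, 3), 0.48 <= m <= 0.52)
```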

There are two problems with solving a system of inequalities like this directly. First, it is neither linear nor convex, so it would be very hard to solve, especially when some variables must take integer values; the macro therefore creates candidate observations and uses mathematical optimization to select which of them to include. Second, such a system might have no solution at all, so the method allows the moment bounds to be violated and minimizes the violations.
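To illustrate the selection idea on a toy scale, this Python sketch brute-forces the choice of observations that minimizes the largest scaled bound violation; the real macro solves a MILP rather than enumerating subsets, and the helper names here are ours:

```python
from itertools import combinations

def max_violation(selected, moment_fns, bounds):
    """Largest scaled violation of the moment bounds for a candidate subset."""
    worst = 0.0
    for fn, (lo, hi) in zip(moment_fns, bounds):
        value = sum(fn(obs) for obs in selected) / len(selected)
        scale = max(abs(lo), abs(hi), 1e-9)
        worst = max(worst, (lo - value) / scale, (value - hi) / scale, 0.0)
    return worst

def pick_observations(candidates, n, moment_fns, bounds):
    """Brute-force stand-in for the IP step: choose the n candidates whose
    moments violate the bounds the least."""
    return min(combinations(candidates, n),
               key=lambda s: max_violation(s, moment_fns, bounds))

# One moment constraint: keep the mean of the chosen pair in [1.9, 2.1]
candidates = [1.0, 2.0, 3.0, 4.0]
best = pick_observations(candidates, 2, [lambda x: x], [(1.9, 2.1)])
print(best)  # (1.0, 3.0)
```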

There are three basic steps to the algorithm. The first step reads in the original data, computes the upper and lower bounds on the moments, and sets up other parameters for the optimization models. The second step uses column generation with a linear program to produce candidate synthetic observations. The third step uses an integer program to select the desired number of synthetic observations from the candidate synthetic observations, while minimizing the largest scaled violation of the moment bounds. The following code is the macro that generates synthetic data by calling a different macro for each of the three steps.

```sas
%macro GENDATA(INPUTDATA, METADATA, OUTPUTDATA=SyntheticData, MOMENTORDER=3,
               NUMOBS=0, MINNUMIPCANDS=0, LPBATCHSIZE=10, LPGAP=1E-3,
               NUMCOFORTHREADS=1, MILPMAXTIME=600, RELOBJGAP=1E-4,
               ALPHA=0.95, RANDSEED=0);
   proc optmodel printlevel=0;
      call streaminit(&RANDSEED);
      %PRELIMINARYSTEP(INPUTDATA=&INPUTDATA, METADATA=&METADATA,
                       MOMENTORDER=&MOMENTORDER, ALPHA=&ALPHA);
      %LPSTEP(MOMENTORDER=&MOMENTORDER, NUMOBS=&NUMOBS,
              MINNUMIPCANDS=&MINNUMIPCANDS, LPBATCHSIZE=&LPBATCHSIZE,
              LPGAP=&LPGAP, NUMCOFORTHREADS=&NUMCOFORTHREADS);
      %IPSTEP(OUTPUTDATA=&OUTPUTDATA, MOMENTORDER=&MOMENTORDER,
              NUMOBS=&NUMOBS, MINNUMIPCANDS=&MINNUMIPCANDS,
              MILPMAXTIME=&MILPMAXTIME, RELOBJGAP=&RELOBJGAP);
   quit;
%mend GENDATA;
```

You can see that there are several parameters, which are described in the paper. Most of them trade off the quality of the synthetic data against the time needed to produce it.

The macro demonstrates some useful features of PROC OPTMODEL. One option for the macro is to use PROC OPTMODEL's COFOR loop, which allows multiple LPs to be solved concurrently and can speed up the LP step. The implementation divides the work of creating candidate observations into NUMCOFORTHREADS groups, and one SOLVE statement from each group can run concurrently. In the example below, using COFOR reduces the total time from five hours to three.
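COFOR has no direct Python equivalent, but the pattern it enables — running independent groups of solves concurrently and collecting results as they finish — can be sketched with a thread pool (the group workload below is a placeholder):

```python
from concurrent.futures import ThreadPoolExecutor

def solve_group(group_id):
    """Placeholder for one group's LP solves; in the macro each group is a
    sequence of PROC OPTMODEL SOLVE statements."""
    return sum(i * i for i in range(group_id * 1000))  # stand-in work

# Like COFOR with NUMCOFORTHREADS=4: independent groups run concurrently,
# and results are collected as each group finishes.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(solve_group, range(1, 5)))
print(len(results))  # 4
```

The key requirement, in COFOR as here, is that the iterations be independent of one another.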

The code also calls SAS functions not specific to PROC OPTMODEL, including RAND, STD, and PROBIT. The RAND function is used to create random potential observations during the column generation process. The STD and PROBIT functions are used to determine the desired moment range based on the input data and a user-specified parameter.

The examples in the paper are based on the *Sashelp.Heart* data set. The data set is modified to fit the requirements that all variables must be numeric and there can be no missing data.

The example randomly divides the data into Test and Training data sets so that a model based on the Training data set can be tested against a model based on a synthetic data set derived from the Training data set.

The following call of the macro generates the synthetic data based on the Training data set with the appropriate METADATA data set. It allows up to four LPs to be solved concurrently, matches moments up to the third order, allows 20 minutes for the MILP solver, and uses a fairly small range for the moment bounds.

```sas
%GENDATA(INPUTDATA=Training, METADATA=Metadata, NUMCOFORTHREADS=4,
         MOMENTORDER=3, MILPMAXTIME=1200, ALPHA=0.2);
```

One way to verify that the synthetic data are similar to the real data is to compare the means, standard deviations, and covariances of the input and synthetic data sets. The paper shows how to do this using PROC MEANS and PROC CORR on a smaller synthetic data set.

In this case we want to compare the training and synthetic data by comparing how well the models they fit score the Test data set. One of the variables indicates whether a person is dead, so we can use logistic regression to predict whether a person is dead or alive. Because BMI is a better health predictor than height or weight, we have added a BMI column to the data sets. The following code calls PROC LOGISTIC for the two data sets.

```sas
proc logistic data=TrainingWithBmi;
   model Dead = Male AgeAtStart Diastolic Systolic Smoking Cholesterol BMI / outroc=troc;
   score data=TestingWithBmi out=valpred outroc=vroc;
run;

proc logistic data=SyntheticDataWithBmi;
   model Dead = Male AgeAtStart Diastolic Systolic Smoking Cholesterol BMI / outroc=troc;
   score data=TestingWithBmi out=valpred outroc=vroc;
run;
```

This produces quite a bit of output. One useful comparison is the area under the curve (AUC) of the ROC curve, which plots a comparison of sensitivity and specificity of the model. The first plot is for the original training data scored with the testing data from the original data set. The second plot is for the synthetic data scored with the testing data from the original data set. You can see that the curves are fairly similar and there is little difference between the AUCs.

This synthetic data set behaves similarly to the original data set and can be safely shared without risk of exposing personally identifiable information.

What uses do you have for synthetic data?


The post Operations Research Talks at SAS Global Forum 2017 appeared first on Operations Research with SAS.

**Monday, April 3**

- 11:00 AM - 11:30 AM, Dolphin Level 3 - Oceanic 1 (Session 0681) "Using SAS/OR® Software to Optimize the Capacity Expansion Plan of a Robust Oil Products Distribution Network," Dr. Shahrzad Azizzadeh, SAS
- 1:00 PM - 1:30 PM, Dolphin Level 3 - Oceanic 1 (Session 1387) "Increasing Revenue in Only Four Months with SAS® Real-Time Decision Manager," Alvaro Velasquez, Telefonica
- 2:30 PM - 3:00 PM, Dolphin Level 1 - The Quad - Lower - Super Demo Station 9 (Super Demo SD921) "Optimization and SAS® Viya™," Manoj Chari, SAS
- 3:30 PM - 4:00 PM, Dolphin Level 5 - The Quad - Upper - Theater 2 (Session 1055) "Using the CLP Procedure to Solve the Agent-District Assignment Problem," Stephen Sloan, Kevin Gillette, Accenture
- 4:00 PM - 4:30 PM, Dolphin Level 3 - Oceanic 1 (Session 0514) "Automated Hyperparameter Tuning for Effective Machine Learning," Patrick Koch, Brett Wujek, Oleg Golovidov, Steven Gardner, SAS

**Tuesday, April 4**

- 11:00 AM - 11:30 AM, Dolphin Level 3 - Asia 4 (Session 0454) "Change Management: Best Practices for Implementing SAS® Prescriptive Analytics," Scott Shuler, SAS
- 3:30 PM - 4:30 PM, Dolphin Level 3 - Oceanic 3 (Session 0302) "Optimize My Stock Portfolio! A Case Study with Three Different Estimates of Risk," Aric LaBarr, NC State University
- 3:30 PM - 4:30 PM, Dolphin Level 3 - Oceanic 1 (Session 0851) "Optimizing Delivery Routes with SAS® Software," Ben Murphy, Zencos, and Bruce Bedford, Oberweis Dairy Inc.
- 4:00 PM - 4:30 PM, Dolphin Level 1 - The Quad - Lower - Super Demo Station 9 (Super Demo SD909) "New Features in SAS® Simulation Studio," Edward P. Hughes, SAS
- 4:00 PM - 4:30 PM, Dolphin Level 5 - Southern Hemisphere IV (Session 2016) "Optimization and SAS® Viya™," Manoj Chari, SAS

**Wednesday, April 5**

- 11:00 AM - 11:30 AM, Dolphin Level 3 - Oceanic 2 (Session 1224) "A Moment-Matching Approach for Generating Synthetic Data in SAS®," Brittany M. Bogle, The University of North Carolina at Chapel Hill, and Jared C. Erickson, SAS
- 11:30 AM - 12:00 PM, Dolphin Level 3 - Oceanic 1 (Session 0593) "Key Components and Finished Products Inventory Optimization for a Multi-Echelon Assembly System," Sherry Xu, Kansun Xia, Ruonan Qiu, SAS
- 1:00 PM - 1:30 PM, Dolphin Level 3 - Asia 1 (Session 1326) "Price Recommendation Engine for Airbnb," Praneeth Guggilla, Singdha Gutha, and Goutam Chakraborty, Oklahoma State University

That's quite a list! It's great to see so much operations research work being done with SAS, and it's even more encouraging to see so many people talking about their successes with these methods.


The post SAS at the 2017 INFORMS Conference on Business Analytics and Operations Research appeared first on Operations Research with SAS.

- Want SAS Skills? Get SAS University Edition, a Powerful and Free Analytical Tool from SAS, James Harroun, 9:00am-10:45am, Trevi
- Data Discovery and Analysis with JMP 13 Pro, Mia Stephens, 11:00am-12:45pm, Verona
- Solving Business Problems with SAS Analytics and OPTMODEL, Rob Pratt and Golbarg Tutunchi, 3:00pm-4:45pm, Turin

- Analyzing Unstructured Text Data with the JMP 13 Text Explorer, Mia Stephens, 9:10am-10:00am, Octavius 23
- Building and Solving Optimization Models with SAS, Rob Pratt, 10:30am-11:20am, Octavius 23
- Strategic Scheduling Intelligence at JPMorgan Chase’s Card Production Center, Jay Carini (JPMorgan Chase), 3:00pm–3:40pm, Octavius Ballroom
- A Deep Learning Tutorial, Mustafa Kabul, 3:50pm-4:40pm, Octavius 5/6

- A Novel SVDD Based Control Chart for Monitoring Multivariate Sensor Data, Anya McGuirk, 1:50pm-2:40pm, Octavius 7/8
- SAS Visual Statistics, Tom Bohannon, 1:50pm-2:40pm, Octavius 21
- Optimizing Rates for Service Agreements by Shaping Loss Functions, Maarten Oosten, 4:40pm-5:30pm, Octavius 19

We look forward to seeing you in Vegas!


The post Solving Kakuro Puzzles with SAS/OR appeared first on Operations Research with SAS.

For example, the 16 in the top left corner indicates that the first two white cells in the top row must add up to 16. Since they must also be distinct, the only two possibilities are 7 + 9 and 9 + 7.
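You can verify this kind of deduction with a short Python sketch (illustrative; the helper `clue_fills` is our name) that enumerates the ordered ways to fill a clue with distinct digits:

```python
from itertools import permutations

def clue_fills(length, total):
    """All ordered ways to fill `length` white cells with distinct digits 1-9
    that sum to `total`."""
    return [p for p in permutations(range(1, 10), length) if sum(p) == total]

print(clue_fills(2, 16))  # [(7, 9), (9, 7)]
```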

The puzzle is of course meant to be solved by hand by making logical deductions. But you can also use mixed integer linear programming (MILP). One natural formulation involves two sets of decision variables:

\(\begin{align*} \mathrm{V}[i,j] &= \text{the digit in cell $(i,j)$}, \\ \mathrm{X}[i,j,k] &= \begin{cases} 1 & \text{if cell $(i,j)$ contains digit $k$},\\ 0 & \text{otherwise} \end{cases} \end{align*}\)

For more details, see Kalvelagen's post.

You can capture the clues with a SAS data set:

```sas
%let n = 8;
data indata;
   input (C1-C&n) ($);
   datalines;
x\x   23\x  30\x  x\x   x\x   27\x  12\x  16\x
x\16  x     x     x\x   17\24 x     x     x
x\17  x     x     15\29 x     x     x     x
x\35  x     x     x     x     x     12\x  x\x
x\x   x\7   x     x     7\8   x     x     7\x
x\x   11\x  10\16 x     x     x     x     x
x\21  x     x     x     x     x\5   x     x
x\6   x     x     x     x\x   x\3   x     x
;
```

The following PROC OPTMODEL code declares sets and parameters, reads the data, declares the variables and constraints, calls the MILP solver, and outputs the solution to a SAS data set:

```sas
proc optmodel;
   set ROWS;
   set COLS = 1..&n;
   str clue {ROWS, COLS};

   /* read two-dimensional data */
   read data indata into ROWS=[_n_];
   read data indata into [i=_n_] {j in COLS} <clue[i,j] = col("c"||j)>;

   /* create output data set for use by PROC SGPLOT */
   create data puzzle from [i j]=(ROWS cross COLS)
      label=(compress(clue[i,j],'x')) color=(clue[i,j]='x');

   /* parse clues */
   num num_clues init 0;
   set CLUES = 1..num_clues;
   set <num,num> CELLS_c {CLUES} init {};
   num clue_length {c in CLUES} = card(CELLS_c[c]);
   num clue_sum {CLUES};
   str prefix, suffix;
   num curr;
   for {i in ROWS, j in COLS: clue[i,j] ne 'x'} do;
      prefix = scan(clue[i,j],1,'\');
      suffix = scan(clue[i,j],2,'\');
      if prefix ne 'x' then do;
         num_clues = num_clues + 1;
         clue_sum[num_clues] = input(prefix, best.);
         curr = i + 1;
         do until (curr not in ROWS or clue[curr,j] ne 'x');
            CELLS_c[num_clues] = CELLS_c[num_clues] union {<curr,j>};
            curr = curr + 1;
         end;
      end;
      if suffix ne 'x' then do;
         num_clues = num_clues + 1;
         clue_sum[num_clues] = input(suffix, best.);
         curr = j + 1;
         do until (curr not in COLS or clue[i,curr] ne 'x');
            CELLS_c[num_clues] = CELLS_c[num_clues] union {<i,curr>};
            curr = curr + 1;
         end;
      end;
   end;
   set CELLS = union {c in CLUES} CELLS_c[c];
   set DIGITS = 1..9;

   /* decision variables */
   var X {CELLS, DIGITS} binary;
   var V {CELLS} integer >= 1 <= 9;

   /* constraints */
   con OneDigitPerCell {<i,j> in CELLS}:
      sum {k in DIGITS} X[i,j,k] = 1;
   con AlldiffPerClue {c in CLUES, k in DIGITS}:
      sum {<i,j> in CELLS_c[c]} X[i,j,k] <= 1;
   con VCon {<i,j> in CELLS}:
      sum {k in DIGITS} k * X[i,j,k] = V[i,j];
   con SumCon {c in CLUES}:
      sum {<i,j> in CELLS_c[c]} V[i,j] = clue_sum[c];

   /* call MILP solver with no objective */
   solve noobj;

   /* create output data set for use by PROC SGPLOT */
   create data solution from [i j]=(ROWS cross COLS) V
      label=(if <i,j> in CELLS then put(V[i,j],1.) else compress(clue[i,j],'x'))
      color=(<i,j> in CELLS);
quit;
```

Notice that the PROC OPTMODEL code is completely data-driven so that you can solve other problem instances just by changing the input data set. In this case, the MILP presolver solves the entire problem instantly:

```
NOTE: Problem generation will use 4 threads.
NOTE: The problem has 360 variables (0 free, 0 fixed).
NOTE: The problem has 324 binary and 36 integer variables.
NOTE: The problem has 312 linear constraints (216 LE, 96 EQ, 0 GE, 0 range).
NOTE: The problem has 1404 linear constraint coefficients.
NOTE: The problem has 0 nonlinear constraints (0 LE, 0 EQ, 0 GE, 0 range).
NOTE: The OPTMODEL presolver is disabled for linear problems.
NOTE: The MILP presolver value AUTOMATIC is applied.
NOTE: The MILP presolver removed all variables and constraints.
NOTE: Optimal.
NOTE: Objective = 0.
```

Finally, a call to PROC SGPLOT generates a nice plot of the solution:

```sas
proc sgplot data=solution noautolegend;
   heatmapparm x=j y=i colorresponse=color / colormodel=(black white) outline;
   text x=j y=i text=label / colorresponse=color colormodel=(white black);
   xaxis display=none;
   yaxis display=none reverse;
run;
```

The resulting plot makes it easy to verify the correctness of the solution:

Here is the corresponding DATA step for the large instance in Kalvelagen's post:

```sas
%let n = 22;
data indata;
   input (C1-C&n) ($);
   datalines;
x\x 12\x 38\x x\x 23\x 21\x x\x x\x 16\x 29\x 30\x x\x 33\x 30\x 3\x x\x x\x 17\x 35\x x\x 21\x 3\x x\17 x x 11\14 x x x\x 17\23 x x x x\16 x x x 15\x x\8 x x 3\3 x x x\16 x x x x x 3\30 x x x x x\13 x x x x 4\22 x x x x x x\x 23\27 x x x x x x 23\11 x x 17\17 x x x\21 x x x x x x 18\x x\13 x x x 30\45 x x x x x x x x x 30\14 x x x x 20\4 x x x\35 x x x x x 44\x 3\11 x x 10\16 x x 42\10 x x 3\x 22\29 x x x x x\17 x x 24\41 x x x x x x x 14\21 x x x x x x 20\23 x x x x\x 23\x 42\17 x x 30\3 x x x\6 x x x 16\16 x x 35\21 x x x x 40\x 3\x x\38 x x x x x x 16\x x\x 17\x 28\34 x x x x x 13\12 x x 26\4 x x x\24 x x x x\24 x x x 17\35 x x x x x 7\29 x x x x x x x x\8 x x 16\x 30\41 x x x x x x x 3\29 x x x x 29\18 x x x x\x x\x x\34 x x x x x 42\11 x x x 12\11 x x x x 15\30 x x x x 19\x x\x 12\24 x x x 14\17 x x 17\16 x x x x x x\14 x x x x\7 x x x x\16 x x 16\35 x x x x x 15\3 x x 3\x 34\x x\x 28\17 x x 23\x 35\16 x x x\16 x x x x x 4\41 x x x x x x x 6\41 x x x x x x x x\x x\4 x x 24\6 x x x 14\4 x x x\10 x x x x 16\x 29\23 x x x x\x x\x x\x 41\28 x x x x x x x 16\x x\x 29\41 x x x x x x x 43\x x\x x\x 16\23 x x x 21\x 24\29 x x x x 7\17 x x 3\23 x x x 29\16 x x 16\x x\42 x x x x x x x x\34 x x x x x x x 44\35 x x x x x x\15 x x 17\x x\8 x x 17\x x\x 29\x 3\3 x x 11\24 x x x x x 17\12 x x x\10 x x x 34\24 x x x 4\16 x x x x x 17\4 x x 30\23 x x x x\x x\x x\29 x x x x 3\10 x x x x 30\12 x x x 16\33 x x x x x 6\x x\x 17\11 x x x 30\11 x x x x 12\42 x x x x x x x x\x 13\4 x x x\29 x x x x x x x 11\35 x x x x x x\23 x x x 11\7 x x x x\16 x x 7\16 x x 14\16 x x x x x 17\x 29\x x\x 16\38 x x x x x x x\x 9\x 22\27 x x x x 15\6 x x 32\23 x x x 23\12 x x 21\4 x x 35\x 6\x x\6 x x x 27\22 x x x x x x 14\42 x x x x x x x 16\3 x x x\23 x x x x 15\x 4\4 x x 30\11 x x 29\9 x x 23\x 3\16 x x x x x x\4 x x 14\11 x x x x x\45 x x x x x x x x x 20\20 x x x x\x 17\21 x x x x x x 16\16 x x x\17 x x 16\24 x x x x x x 4\x x\34 x x x x
x x\29 x x x x x\29 x x x x x\30 x x x x x x\13 x x x\4 x x x\x x\23 x x x x\16 x x x x\x x\4 x x x\8 x x
;
```

And here is the plot of the input:

The same PROC OPTMODEL and PROC SGPLOT statements shown earlier then yield the following output:

```
NOTE: Problem generation will use 4 threads.
NOTE: The problem has 4920 variables (0 free, 0 fixed).
NOTE: The problem has 4428 binary and 492 integer variables.
NOTE: The problem has 3664 linear constraints (2412 LE, 1252 EQ, 0 GE, 0 range).
NOTE: The problem has 19188 linear constraint coefficients.
NOTE: The problem has 0 nonlinear constraints (0 LE, 0 EQ, 0 GE, 0 range).
NOTE: The OPTMODEL presolver is disabled for linear problems.
NOTE: The MILP presolver value AUTOMATIC is applied.
NOTE: The MILP presolver removed 3077 variables and 2087 constraints.
NOTE: The MILP presolver removed 12000 constraint coefficients.
NOTE: The MILP presolver added 22 constraint coefficients.
NOTE: The MILP presolver modified 11 constraint coefficients.
NOTE: The presolved problem has 1843 variables, 1577 constraints, and 7188 constraint coefficients.
NOTE: The MILP solver is called.
NOTE: The parallel Branch and Cut algorithm is used.
NOTE: The Branch and Cut algorithm is using up to 4 threads.
          Node  Active  Sols  BestInteger  BestBound  Gap    Time
             0       1     0            .          0  .         2
NOTE: The MILP presolver is applied again.
             0       0     1            0          0  0.00%     2
NOTE: Optimal.
NOTE: Objective = 0.
```

The requirement that the digits within a single clue must all be different suggests the use of the ALLDIFF constraint supported in the constraint programming solver. You can then omit the binary X[i,j,k] variables and the corresponding constraints. The resulting model is much simpler, with only one set of variables and two sets of constraints:

```sas
/* decision variables */
var V {CELLS} integer >= 1 <= 9;

/* constraints */
con AlldiffCon {c in CLUES}:
   alldiff({<i,j> in CELLS_c[c]} V[i,j]);
con SumCon {c in CLUES}:
   sum {<i,j> in CELLS_c[c]} V[i,j] = clue_sum[c];

/* call constraint programming solver */
solve;
```

For the small instance, the constraint programming solver instantly returns the solution:

```
NOTE: Problem generation will use 4 threads.
NOTE: The problem has 36 variables (0 free, 0 fixed).
NOTE: The problem has 0 binary and 36 integer variables.
NOTE: The problem has 24 linear constraints (0 LE, 24 EQ, 0 GE, 0 range).
NOTE: The problem has 72 linear constraint coefficients.
NOTE: The problem has 0 nonlinear constraints (0 LE, 0 EQ, 0 GE, 0 range).
NOTE: The problem has 24 predicate constraints.
NOTE: The OPTMODEL presolver is disabled for problems with predicates.
NOTE: Required number of solutions found (1).
NOTE: The data set WORK.SOLUTION has 64 observations and 5 variables.
NOTE: PROCEDURE OPTMODEL used (Total process time):
      real time           0.10 seconds
      cpu time            0.06 seconds
```

But the large instance is more challenging and does not solve in a reasonable time, for reasons described in a paper by Helmut Simonis. One way to overcome this difficulty is to first do some precomputation, again with the constraint programming solver, to prune the domains of the decision variables. The idea is to consider each clue one at a time, using the FINDALLSOLNS option to find all solutions. Any digit that does not appear among these solutions can then be removed, by using a linear disequality (or "not equal") constraint, from the domains of the V[i,j] variables that are associated with that clue. Because these problems are independent across clues, you can use a COFOR loop to solve them concurrently. The full code is as follows:

```sas
proc optmodel;
   set ROWS;
   set COLS = 1..&n;
   str clue {ROWS, COLS};

   /* read two-dimensional data */
   read data indata into ROWS=[_n_];
   read data indata into [i=_n_] {j in COLS} <clue[i,j] = col("c"||j)>;

   /* create output data set for use by PROC SGPLOT */
   create data puzzle from [i j]=(ROWS cross COLS)
      label=(compress(clue[i,j],'x')) color=(clue[i,j]='x');

   /* parse clues */
   num num_clues init 0;
   set CLUES = 1..num_clues;
   set <num,num> CELLS_c {CLUES} init {};
   num clue_length {c in CLUES} = card(CELLS_c[c]);
   num clue_sum {CLUES};
   str prefix, suffix;
   num curr;
   for {i in ROWS, j in COLS: clue[i,j] ne 'x'} do;
      prefix = scan(clue[i,j],1,'\');
      suffix = scan(clue[i,j],2,'\');
      if prefix ne 'x' then do;
         num_clues = num_clues + 1;
         clue_sum[num_clues] = input(prefix, best.);
         curr = i + 1;
         do until (curr not in ROWS or clue[curr,j] ne 'x');
            CELLS_c[num_clues] = CELLS_c[num_clues] union {<curr,j>};
            curr = curr + 1;
         end;
      end;
      if suffix ne 'x' then do;
         num_clues = num_clues + 1;
         clue_sum[num_clues] = input(suffix, best.);
         curr = j + 1;
         do until (curr not in COLS or clue[i,curr] ne 'x');
            CELLS_c[num_clues] = CELLS_c[num_clues] union {<i,curr>};
            curr = curr + 1;
         end;
      end;
   end;
   set CELLS = union {c in CLUES} CELLS_c[c];
   set DIGITS = 1..9;

   /* decision variables */
   var V {CELLS} integer >= 1 <= 9;

   /* constraints */
   con AlldiffCon {c in CLUES}:
      alldiff({<i,j> in CELLS_c[c]} V[i,j]);
   con SumCon {c in CLUES}:
      sum {<i,j> in CELLS_c[c]} V[i,j] = clue_sum[c];
   problem Original include V AlldiffCon SumCon;

   /* prune the domains one clue at a time */
   set DOMAIN {CELLS};
   set LENGTHS_SUMS = setof {c in CLUES} <clue_length[c], clue_sum[c]>;
   set DOMAIN_ls {LENGTHS_SUMS};
   num clue_length_this, clue_sum_this;
   var W {1..clue_length_this} integer >= 1 <= 9;
   con IncreasingWCon {k in 1..clue_length_this-1}:
      W[k] < W[k+1];
   con SumWCon {c in CLUES}:
      sum {k in 1..clue_length_this} W[k] = clue_sum_this;
   problem Prune include W IncreasingWCon SumWCon;
   use problem Prune;
   reset options printlevel=0;
   cofor {<l,s> in LENGTHS_SUMS} do;
      put l= s=;
      clue_length_this = l;
      clue_sum_this = s;
      solve with clp / findallsolns;
      DOMAIN_ls[l,s] =
         setof {k in 1..clue_length_this, sol in 1.._NSOL_} W[k].sol[sol];
      put DOMAIN_ls[l,s]=;
   end;
   reset options printlevel;
   for {<i,j> in CELLS} DOMAIN[i,j] = DIGITS;
   for {c in CLUES} do;
      clue_length_this = clue_length[c];
      clue_sum_this = clue_sum[c];
      for {<i,j> in CELLS_c[c]} DOMAIN[i,j] =
         DOMAIN[i,j] inter DOMAIN_ls[clue_length_this, clue_sum_this];
   end;
   for {<i,j> in CELLS} put DOMAIN[i,j]=;

   /* solve original problem with pruned domains */
   use problem Original;
   con PruneDomain {<i,j> in CELLS, k in DIGITS diff DOMAIN[i,j]}:
      V[i,j] ne k;
   solve;

   /* create output data set for use by PROC SGPLOT */
   create data solution from [i j]=(ROWS cross COLS) V
      label=(if <i,j> in CELLS then put(V[i,j],1.) else compress(clue[i,j],'x'))
      color=(<i,j> in CELLS);
quit;
```
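The effect of the pruning step can be mimicked outside SAS with a short Python sketch (illustrative; `pruned_domain` is our name) that computes, for a clue of a given length and sum, the set of digits appearing in at least one solution:

```python
from itertools import combinations

def pruned_domain(length, total):
    """Digits that can appear in some clue of the given length and sum,
    mirroring the union of solutions the COFOR loop computes with CLP."""
    domain = set()
    for combo in combinations(range(1, 10), length):
        if sum(combo) == total:
            domain.update(combo)
    return domain

print(sorted(pruned_domain(2, 16)))  # [7, 9]
print(sorted(pruned_domain(3, 7)))   # [1, 2, 4]
```

Any digit outside this set can be excluded from every cell of the clue before the full solve begins.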

The entire PROC OPTMODEL call, including the precomputation step, now takes less than a second for the large instance:

```
NOTE: Problem generation will use 4 threads.
NOTE: The problem has 492 variables (0 free, 0 fixed).
NOTE: The problem has 0 binary and 492 integer variables.
NOTE: The problem has 2878 linear constraints (0 LE, 268 EQ, 0 GE, 0 LT, 2610 NE, 0 GT, 0 range).
NOTE: The problem has 3594 linear constraint coefficients.
NOTE: The problem has 0 nonlinear constraints (0 LE, 0 EQ, 0 GE, 0 range).
NOTE: The problem has 268 predicate constraints.
NOTE: The OPTMODEL presolver is disabled for problems with predicates.
NOTE: Required number of solutions found (1).
NOTE: The data set WORK.SOLUTION has 704 observations and 5 variables.
NOTE: PROCEDURE OPTMODEL used (Total process time):
      real time           0.53 seconds
      cpu time            0.96 seconds
```

It turns out that you can also significantly improve the performance without doing any precomputation, simply by using a different variable selection strategy:

```sas
solve with clp / varselect=fifo;
```

```
NOTE: Problem generation will use 4 threads.
NOTE: The problem has 492 variables (0 free, 0 fixed).
NOTE: The problem has 0 binary and 492 integer variables.
NOTE: The problem has 268 linear constraints (0 LE, 268 EQ, 0 GE, 0 range).
NOTE: The problem has 984 linear constraint coefficients.
NOTE: The problem has 0 nonlinear constraints (0 LE, 0 EQ, 0 GE, 0 range).
NOTE: The problem has 268 predicate constraints.
NOTE: Required number of solutions found (1).
NOTE: The data set WORK.SOLUTION has 704 observations and 5 variables.
NOTE: PROCEDURE OPTMODEL used (Total process time):
      real time           6.55 seconds
      cpu time            6.42 seconds
```

As in similar logic puzzles, the solution is supposed to be unique. To check uniqueness with the MILP formulation, you can add one more constraint to exclude the first solution and call the MILP solver again, as described by Kalvelagen:

```sas
/* check uniqueness */
set <num,num,num> SUPPORT;
SUPPORT = {<i,j> in CELLS, k in DIGITS: X[i,j,k].sol > 0.5};
con ExcludeSolution:
   sum {<i,j,k> in SUPPORT} X[i,j,k] <= card(SUPPORT) - 1;
solve noobj;
if _solution_status_ ne 'INFEASIBLE' then put 'Multiple solutions found!';
```

For the constraint programming formulation, checking uniqueness is even simpler. Just use the FINDALLSOLNS option:

```sas
/* check uniqueness */
solve with clp / findallsolns;
if _NSOL_ > 1 then put 'Multiple solutions found!';
```


The post How good is the MILP solver in SAS/OR? appeared first on Operations Research with SAS.

Based on this test set, Hans Mittelmann compiles a regular comparison of solvers using different subsets of the MIPLIB2010 set, each meant to measure a different aspect of the solution process. At the recent INFORMS Annual Meeting in Nashville we presented detailed results for all of these categories for the first time. Here we reprint those results for future reference, along with some commentary.

We have 96 nodes, each with 128 GB of RAM and two Intel(R) Xeon(R) E5-2630 v3 CPUs with 8 cores each running at 2.40 GHz. We run each benchmark set with the same options that Hans Mittelmann uses. Note that our hardware is quite a bit slower than his.

We are using the latest release of SAS as of November 22, 2016: SAS 9.4M4 with SAS/OR 14.2. The operating system is Red Hat Enterprise Linux 6.5.

The time averages are 10-second shifted geometric means over all instances. Timeouts and failures are counted at the time limit. Deterministic parallel mode was used for the threaded MILP tests.
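For reference, a 10-second shifted geometric mean adds the shift before taking the geometric mean and subtracts it afterward, which damps the influence of very short runtimes. A minimal Python sketch (the function name is ours):

```python
import math

def shifted_geomean(times, shift=10.0):
    """Shifted geometric mean: exp(mean(log(t + shift))) - shift."""
    logs = [math.log(t + shift) for t in times]
    return math.exp(sum(logs) / len(logs)) - shift

# A 1s run and a 100s run average to about 24.8s, well below the
# arithmetic mean of 50.5s.
```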

**Benchmark set**

This set has been the backbone of all MILP benchmarking efforts for the last 6 years. Many of the instances require special techniques to solve them.

**1 thread:**

Solved: **81 out of 87**

Average time: **240s**

**4 threads:**

Solved: **85 out of 87**

Average time: **121s**

**12 threads:**

Solved: **82 out of 87**

Average time: **95s**

Since this is a very small set, it is easy to overtune a solver to perform well on it. To counter this, we ran our solver with 16 other random seeds and took the averages:

**1 thread (average of 17 runs):**

Solved: **82 out of 87**

Average time: **235s**

**4 threads (average of 17 runs):**

Solved: **85 out of 87**

Average time: **113s**

**12 threads (average of 17 runs):**

Solved: **84 out of 87**

Average time: **84s**

This shows that our solver isn't overtuned, since the default run is actually quite a bit unlucky, especially with 12 threads.

**Solvable set**

These instances are solvable by a commercial MILP solver within an hour on a standard machine.

Solved: **170 out of 214**

Average time: **261s**

**Feasibility set**

These instances are run only until the first feasible solution is found; the quality of that solution isn't measured.

Solved: **27 out of 33**

Average time: **114s**

**Infeasibility set**

All of these instances are infeasible. The task is to prove this infeasibility in the shortest time possible.

Solved: **18 out of 19**. Only the huge, LP-infeasible zib02 instance was not solved.

Average time: **47s**

**Pathological set**

These instances are typically very hard, and each has some special feature: numerical instability, sheer size, or very large branch-and-bound trees.

Solved: **24 out of 45**

Average time: **1956s**

You can get the slides of our INFORMS 2016 talk here.

Finally, here are all the data in a table:

| Test set | Solved | Average time (s) |
|---|---|---|
| Benchmark (1 thread) | 81 out of 87 | 240 |
| Benchmark (1 thread, average of 17 runs) | 82 out of 87 | 235 |
| Benchmark (4 threads) | 85 out of 87 | 121 |
| Benchmark (4 threads, average of 17 runs) | 85 out of 87 | 113 |
| Benchmark (12 threads) | 82 out of 87 | 95 |
| Benchmark (12 threads, average of 17 runs) | 84 out of 87 | 84 |
| Solvable (12 threads) | 170 out of 214 | 261 |
| Feasibility | 27 out of 33 | 114 |
| Infeasibility | 18 out of 19 | 47 |
| Pathological | 24 out of 45 | 1956 |

If you have some difficult or interesting MILP instances and you want to shape the future of mixed integer optimization research, then please submit your instances to MIPLIB2017. SAS/OR will be participating in that project along with other leading commercial and open source MILP developers.


The post SAS at the 2016 INFORMS Annual Meeting appeared first on Operations Research with SAS.

]]>[Location Abbreviations: MCC=Music City Center; OHN=Omni Hotel Nashville]

**Technology Workshops on Saturday, November 12:**

- SAS Global Academic Program: "SAS University Edition," 9:00-11:30am, MCC 201A
- JMP: "JMP 13 Pro," 12:00-2:30pm, MCC 202A
- SAS/OR: "Building and Solving Business Problems with SAS Analytics and OPTMODEL," 3:00-5:30pm, MCC 201A

**Software Tutorials on Tuesday, November 15 (individual presentation time indicated):**

- SAS Global Academic Program: "Analysis of a Presidential Debate Using SAS Text Analytics," Andre de Waal (in track TA94, 8:45-9:30am, MCC 5th Avenue Lobby)
- SAS/OR: "Building and Solving Optimization Models with SAS," Ed Hughes, Rob Pratt (in track TB94, 11:00-11:45am, MCC 5th Avenue Lobby)
- JMP: "Data Analysis and Discovery with JMP 13 Pro," Mia Stephens (in track TB94, 11:45am-12:30pm, MCC 5th Avenue Lobby)

*Other SAS-related talks (session time indicated):*

**Sunday, November 13:**

- "Identifying Shifting Production Bottlenecks Using Clearing Functions," Baris Kacar, Lars Moench (University of Hagen), Reha Uzsoy (North Carolina State University) (in track SA86, 8:00-9:30am, OHN Gibson Board Room)
- "The Use of Simulation for Evaluating Forecast Models," Sanjeewa Naranpanawe (in track SC33, 1:30-3:00pm, MCC 203B)
- "Unlocking Your 80%: Unearthing New Insights with Text Analytics," Christina Engelhardt (in track SD49, 4:30-6:00pm, MCC 211)

**Monday, November 14:**

- "Panel Session: IoT-enabled Data Analytics: Opportunities, Challenges and Applications," including Gul Ege (in track MA68, 8:00-9:30am, OHN Mockingbird 4)

**Tuesday, November 15:**

- "SAS/OR Value Beyond the Model," Leo Lopes (session chair) (in track TA19, 8:00-9:30am, MCC 106B)
- "Estimating Clearing Functions for Production Resources Using Simulation Optimization," Reha Uzsoy (North Carolina State University), Baris Kacar (in track TA67, 8:00-9:30am, OHN Mockingbird 3)
- "Strategies for Maintaining Sparse Dual Solutions in Large-scale Nonlinear SVM," Joshua Griffin, Alireza Yektamaram (in track TB06, 11:00am-12:30pm, MCC 102A)
- "A Hessian Free Method with Warm-starts for Deep Learning Problems," Wenwen Zhou, Joshua Griffin (in track TB06, 11:00am-12:30pm, MCC 102A)
- "An Accelerated Power Method for the Best Rank-1 Approximation to a Matrix," Jun Liu, Ruiwen Zhang, Yan Xu (session chair) (in track TB06, 11:00am-12:30pm, MCC 102A)
- "Local Search Optimization for Hyper-parameter Tuning," Yan Xu (in track TB06, 11:00am-12:30pm, MCC 102A)

**Wednesday, November 16:**

- "The SAS MILP Solver: Current Status and Future Developments," Philipp Christophel (in track WB13, 11:00am-12:30pm, MCC 104C)
- "Single-resource Capacity Control in the Presence of Cancellations, No-shows and Overbooking," Jason Chen (in track WB42, 11:00am-12:30pm, MCC 207D)
- "Pricing and Revenue Management of Function Space in Hotels," Altan Gulcu, Xiaodong Yao (session chair) (in track WB42, 11:00am-12:30pm, MCC 207D)
- "Visual Statistics--SAS," Dursun Delen (Oklahoma State University) (in track WD49, 2:45-4:15pm, MCC 211)
- "
*P*-center and*P*-dispersion Problems: A Bi-criteria Analysis," Golbarg Tutunchi, Yahya Fathi (North Carolina State University) (in track WE82, 4:30-6:00pm, OHN Broadway G)

If you will be in Nashville for the conference, we'd love to see you at our workshops, in the exhibit hall, and at our tutorials. Please consider adding some of these great SAS-related talks to your itinerary!


The post Operations Research Talks at Analytics Experience 2016 appeared first on Operations Research with SAS.

]]>Analytics Experience 2016 will be held on Sept. 12-14, 2016 at the Bellagio in Las Vegas, NV. There will be a great number of excellent talks and demonstrations at the conference, covering many aspects of SAS analytics and many practical applications. Several of these sessions deal directly with the use of operations research (OR) methods. In some cases OR is used to analyze and solve problems directly, and in others to assist and improve other forms of analytics.

There are three 45-minute presentations:

- Dr. Chris DeRienzo, Chief Quality Officer, Mission Health: "Using Simulation in Neonatal Intensive Care to Transform Data into Information" (Session 13283, Monday, Sept. 12, 3:30-4:15pm, Monet 3)
- Rob Pratt, Senior Manager, SAS Advanced Analytics R&D: "Using SAS/OR® to Optimize Scheduling and Routing of Service Vehicles" (Session 13424, Monday, Sept. 12, 4:30-5:15pm, Raphael 3)
- Jinxin Yi, Senior Manager, SAS Advanced Analytics R&D: "SAS/OR® for the Optimization of Media Ad Plans" (Session SAS13569, Wednesday, Sept. 14, 11:15am-12:00pm, Raphael 2)

Also on the conference agenda are one 45-minute Demo Theater presentation and two 15-minute Super Demos. These demonstrations will occur at the indicated locations in the Innovation Hub area.

- Dion Gory, SAS Analytics Consultant: "Exploring and Unraveling Social Media Data: An Analytic Framework" (Session 13359, Monday, Sept. 12, 3:30-4:15pm, Demo Theater 3)
- Brett Wujek, SAS Senior Data Scientist: "Autotuning Machine Learning Models in SAS® Viya™" (Session 13454, Tuesday, Sept. 13, 12:30-12:45pm, Super Demo D)
- Jinxin Yi, Senior Manager, SAS Advanced Analytics R&D: "How Optimization Can Improve Transportation for the Boston Public Schools: A SAS and DataKind DataCorps Project" (Session 14120, Tuesday, Sept. 13, 11:00-11:15am, Super Demo A)

If you're planning to attend the conference, be sure to add at least some of these sessions to your personal agenda. You'll be glad you did! If you can't make it to Analytics Experience 2016 in Las Vegas, consider attending the European edition of the conference, to be held on Nov. 7-9, 2016 at the Marriott Park Hotel in Rome, Italy. In a later post we will preview the operations research talks slated for the Rome conference.


The post Dr. Salah Elmaghraby and the Evolution of SAS/OR Resource-Constrained Scheduling appeared first on Operations Research with SAS.

]]>On June 12 of this year, Dr. Elmaghraby passed away. When I was in school, I unquestionably admired Dr. Elmaghraby, not only for his academic accomplishments, but for his friendly demeanor and love of learning. I knew that being able to take a course on activity networks from him was something special. What I have recently come to realize, however, is how relevant his influence still is today, not only on my career, but on SAS in general. To quote Dr. James R. Wilson, my advisor and former department head of the Edward P. Fitts Department of Industrial and Systems Engineering at NCSU: "His contributions to the engineering profession are striking not only for their scope and impact but also for the remarkably long time period over which that impact has been sustained."

As I started asking around, I discovered that numerous people at SAS from a variety of groups knew Dr. Elmaghraby and his work. Most likely his biggest influence on SAS has been on the development of the SAS/OR project management capabilities. The project management procedures were among the first offerings of SAS/OR, dating back to 1982. The CPM procedure was originally developed for project scheduling, but it also has turned out to be well-suited for tackling more general scheduling scenarios because of its rich set of activity and resource-constrained modeling capabilities. Resource-constrained scheduling in particular was an area of interest for Dr. Elmaghraby. Dr. Radhika Kulkarni, the Vice President of Advanced Analytics Research and Development at SAS, reminisces on this topic: "I joined the Operations Research team in 1983 as the second member and was asked to take over a very early version of PROC CPM. Looking around for leading references on the topic of activity networks and resource scheduling, Professor Elmaghraby’s book became my ‘bible’.”

Now, more than thirty years since the first release of SAS/OR, the CPM procedure remains an important cornerstone of SAS’ operations research offerings. In addition to standard features such as tracking, baselines, activity calendars, and multi-project capabilities, a key advantage of the CPM procedure is the extensive resource scheduling algorithms that it includes. These algorithms include support for different resource types such as consumable, replenishable, spanning, and auxiliary (that is, resources required in conjunction with another resource). There is also support for skill pools, substitution of resources, and resource calendars. Dr. Kulkarni continues: “Needless to say, having Dr. Elmaghraby close by at NC State University gave me a wonderful opportunity to interact with him on many occasions. From my very first meeting with him, I was impressed by Professor Elmaghraby’s immense interest in helping anyone who came to him for advice. He always found time in his busy schedule to meet with me or my colleagues whenever I reached out to him.”

The scheduling engine in the CPM procedure is extremely fast (practically linear in time) and as a result, very scalable. To give an idea of the wide variety of problems that have been tackled using CPM, below are summaries of five customer projects, the first two of which use the CPM resource scheduling features influenced by Dr. Elmaghraby:

- The CPM procedure was used by a multinational energy company to schedule the shutdown of 27 platforms and 300 wells of an oil field for maintenance, in addition to coordinating 1000 personnel. The schedule generated by PROC CPM was able to reduce the time needed for maintenance by 22 hours.
- An oil company used the CPM procedure to integrate multiple paving project plans. Previously, the company had hundreds of independent project plans with no ability for them to interact, despite sharing the same resources. The CPM procedure allowed the projects to be integrated to share pooled resources. Activity splitting improved resource utilization, while activity priority allowed regulatory projects to be processed first.
- The CPM procedure was used to reduce the build cycle time of costly launch vehicles for a major aerospace defense contractor. Each of these launch vehicles consists of about 50,000 parts, has a long, complex fabrication cycle, and is very expensive to manufacture. The work involved first extracting the parts’ information (including lead times) as well as the assembly information from an MRP system, generating a bill of material (with the BOM procedure in SAS/OR), and then using the CPM procedure to reduce the cycle time from 4 years to 10 months.
- A large ship builder used CPM to plan, track, and update the scheduling of the construction of cruise ships as part of an integrated application catering to multiple user personas. Each ship is made up of several blocks that need to be assembled together. Each block can weigh between 300 to 700 tons and is made up of several sections. The reservation schedule identifies when and where each section needs to be placed in order to assemble the different blocks and when and where each block needs to be placed before it will be part of a ship. Scheduling with CPM provided a way to surface the current schedule at different levels of granularity and to monitor and adjust the progress of the project.
- A major auto manufacturer used CPM as the central component of an application designed to schedule and track the various activities involved in bringing a new vehicle from design to regular production. The scope involved detailed planning, updating, and baseline management. The resulting analysis was used for early detection, elevation, and management of issues.

As one can see, Elmaghraby’s influence on the people at SAS and on the SAS software has been far reaching and long lasting. Here is a final thought from Dr. Kulkarni: “Professor Elmaghraby certainly left a significant mark on the scheduling components of SAS/OR. As SAS/OR became more prominent in the operations research arena, Professor Elmaghraby continued to take deep pride in its success. I will miss him as a mentor and an example of someone who never tired of learning and teaching new concepts!”

I would like to thank Gehan Corea (SAS Advanced Analytics Senior Manager) and Lindsey Puryear (SAS Principal Operations Research Specialist) for their contributions to this post. In addition to the CPM procedure, SAS/OR contains other project management capabilities, including Earned Value Management macros that use the CPM procedure for earned value analysis, as well as the GANTT and NETDRAW procedures for visualization.


The post Computing an Optimal Blackjack Strategy with SAS/OR appeared first on Operations Research with SAS.


In 1975, Manson, Barr, and Goodnight derived an optimal strategy for four-deck blackjack. In this demo, I'll instead use an “infinite-deck” approximation:

- each “ten” (10 or face card) appears with probability 4/13
- each non-ten card appears with probability 1/13
- ignore cards already dealt

The linear programming (LP) solver is used twice in one PROC OPTMODEL call:

- To derive the conditional probabilities of the dealer attaining 17, 18, 19, 20, 21, or busting, given the dealer’s up card, and
- To determine the player’s optimal strategy (hit, stand, split, or double down) for each possible hand and dealer’s up card

The dealer’s strategy is fixed (hit until reaching a total of 17 or more) and is modeled as a Markov process.

Let \(p_{ij}\) be the transition probability from state \(i\) to state \(j\), and let \(\pi_{ij}\) be the absorption probability that the dealer eventually reaches state \(j\), starting from state \(i\). Then \(\pi\) satisfies the following system of linear equations, where \(D\) is the dealer's state space:

\(\begin{align*} \pi_{ij} &= \sum_{k} p_{ik} \pi_{kj} &&\text{for } i \in D,\ j \in D\\ \pi_{ii} &= 1 &&\text{for } i \in \{17,18,19,20,21,\text{bust}\} \\ \sum_{j} \pi_{ij} &= 1 &&\text{for } i \in D\end{align*}\)
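As a cross-check, these absorption probabilities can also be computed outside the LP by direct recursion (back-substitution) on the dealer's chain, since the dealer's play is deterministic. The following Python sketch uses the infinite-deck probabilities above; it assumes the dealer stands on all 17s, including soft 17, and all names are our own:

```python
from functools import lru_cache

# Infinite-deck card probabilities: ace = 1; "ten" covers 10/J/Q/K.
CARD_PROB = {c: 1 / 13 for c in range(1, 10)}
CARD_PROB[10] = 4 / 13

@lru_cache(maxsize=None)
def dealer_dist(total, soft):
    """Distribution over final dealer outcomes (17..21 or 'bust') from a
    state where `total` is the hand total and `soft` means an ace is
    currently counted as 11. The dealer stands on 17 or more."""
    if total > 21:
        if soft:  # demote the ace from 11 to 1 and keep drawing
            return dealer_dist(total - 10, False)
        return {'bust': 1.0}
    if total >= 17:
        return {total: 1.0}
    dist = {}
    for card, p in CARD_PROB.items():
        if card == 1 and total + 11 <= 21:  # count a drawn ace as 11
            sub = dealer_dist(total + 11, True)
        else:
            sub = dealer_dist(total + card, soft)
        for outcome, q in sub.items():
            dist[outcome] = dist.get(outcome, 0.0) + p * q
    return dist

def upcard_dist(up):
    """Outcome distribution given the dealer's up card (1 = ace)."""
    return dealer_dist(11 if up == 1 else up, up == 1)
```

For an up card of 6, for example, this recursion reproduces the familiar result that busting is the dealer's single most likely outcome.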

You can declare the decision variables, constraints, and dummy objective (specifying an objective is optional in SAS/OR 13.2) in PROC OPTMODEL as follows:

```sas
var Pi {TOTALS, TOTALS} >= 0 <= 1;
con Recurrence {i in HITTING_TOTALS, j in TOTALS}:
   Pi[i,j] = sum {k in TOTALS} p[i,k] * Pi[k,j];
con Absorbing {i in STANDING_TOTALS}:
   Pi[i,i] = 1;
con SumToOne {i in TOTALS}:
   sum {j in TOTALS} Pi[i,j] = 1;
min Objective1 = 0;
```

Now the SOLVE statement invokes the LP solver and instantly yields the following dealer absorption probabilities \(\pi_{ij}\), where each row corresponds to the dealer's up card \(i\) and each column corresponds to the dealer's final total \(j\):

If you stare at these numbers long enough, you start to see some patterns, but the patterns become even more obvious if you instead use PROC SGPLOT to display this table as a heat map in which darker blue corresponds to higher probabilities:

For example, when the dealer's up card is small, the most likely outcome is that the dealer will bust. Also, note that a dealer's up card of \(i \in \{7,\dots,10\}\) is most likely to yield a dealer total of \(i+10\), and this matches the conventional wisdom to assume that the dealer's down card is a ten.

Armed with the dealer absorption probabilities, we can now move to the player's strategy, which is modeled as a Markov decision problem. Unlike the dealer, the player has several choices of actions, and the transition probabilities depend on the action. Let \(p_{st}(a)\) be the transition probability from state \(s\) to state \(t\) if the player takes action \(a \in A_s\), and define the value function \(V(s)\) as the player's expected profit under optimal play, starting in state \(s\). Then \(V\) satisfies Bellman's equation:

\(\displaystyle{V(s) = \max\limits_{a \in A_s} \sum\limits_{t} p_{st}(a) V(t) \quad \text{for $s \in S$}}\),

which we can again model by using linear programming (think of \(\max\) as the least upper bound):

\(\begin{align*} &\text{minimize} &\sum_s V(s) \\ &\text{subject to} & V(s) &\ge \sum_{t} p_{st}(a) V(t) &&s \in S,\ a \in A_s \\ & & V(s) &\text{ free} &&s \in S \end{align*}\)
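The \(\max\) in Bellman's equation can also be resolved by fixed-point (value) iteration, which converges to the same \(V\) that the LP finds. Here is a toy Markov decision problem in Python, with all states, actions, and payoffs invented for illustration:

```python
# Terminal states carry fixed payoffs; from "s0" the player may "stop"
# (a sure 0.3) or "gamble" (win 1 or lose 1 with equal probability).
TERMINAL = {"win_small": 0.3, "win_big": 1.0, "lose": -1.0}
ACTIONS = {
    "s0": {
        "stop":   {"win_small": 1.0},
        "gamble": {"win_big": 0.5, "lose": 0.5},
    },
}

def value_iteration(actions, terminal, n_iter=100):
    """Iterate V(s) = max_a sum_t p_st(a) V(t) until it stabilizes."""
    V = dict(terminal)
    V.update({s: 0.0 for s in actions})
    for _ in range(n_iter):
        for s, acts in actions.items():
            V[s] = max(sum(p * V[t] for t, p in trans.items())
                       for trans in acts.values())
    return V

# Gambling has expected value 0, so the optimal play is to stop: V(s0) = 0.3.
```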

This LP formulation applies to all Markov decision problems with finite state and action spaces, not just blackjack. The corresponding PROC OPTMODEL code is as follows, where the constraints are grouped by action and \(d\) indicates whether the player is allowed (1) or not allowed (0) to double down:

```sas
fix Pi;
drop Recurrence Absorbing SumToOne;
/* some lines omitted here */
var V {PLAYER_TOTALS, UP_CARDS, 0..1};
min Objective2 =
   sum {i in PLAYER_TOTALS, u in UP_CARDS, d in 0..1} V[i,u,d];
con Hit {i in NONPAIRS2, u in UP_CARDS, d in 0..1}:
   V[i,u,d] >= sum {c in CARDS} q[c] * V[pt_nonpairs[i,c],u,0];
con Stand {i in NONPAIRS2, u in UP_CARDS, d in 0..1}:
   V[i,u,d] >= sum {j in STANDING_TOTALS: points[j] < points[i]} Pi[u,j]
             - sum {j in STANDING_TOTALS: points[j] > points[i]} Pi[u,j];
con Double {i in NONPAIRS2, u in UP_CARDS}:
   V[i,u,1] >= 2 * sum {c in CARDS} q[c] * (
      sum {j in STANDING_TOTALS: points[j] < points[pt_nonpairs[i,c]]} Pi[u,j]
    - sum {j in STANDING_TOTALS: points[j] > points[pt_nonpairs[i,c]]
           or pt_nonpairs[i,c] = 'bust'} Pi[u,j]);
con Split {i in PAIRS, u in UP_CARDS, d in 0..1}:
   V[i,u,d] >=
      if i ne 'pair1'
      then 2 * sum {c in CARDS} q[c] * V[pt_split[i,c],u,1]
      else 2 * sum {c in CARDS} q[c] * (
         sum {j in STANDING_TOTALS: points[j] < points[pt_split[i,c]]} Pi[u,j]
       - sum {j in STANDING_TOTALS: points[j] > points[pt_split[i,c]]} Pi[u,j]);
con Nosplit {i in PAIRS, u in UP_CARDS, d in 0..1}:
   V[i,u,d] >= V[pt_nosplit[i],u,d];
con Bust {u in UP_CARDS, d in 0..1}:
   V['bust',u,d] = -1;
```

A second SOLVE statement invokes the LP solver, which instantly returns the optimal solution. The following heat map shows the resulting optimal values of \(V(i,u,1)\), where dark red is good for the player and dark blue is bad for the player. Each row corresponds to the player's hand \(i\), and each column corresponds to the dealer's up card \(u\).

Again, several patterns are evident, but this imposing chart of expected profits defies memorization and is not practical for a player to use at the blackjack table. Instead, what the player really needs to know is the optimal action to take in each state, and this information is captured by the dual variables, whose values are accessible through the .DUAL constraint suffix. Explicitly, a positive dual value indicates that you should take the corresponding action. The following PROC OPTMODEL code prints the optimal strategy:

```sas
num epsilon = 1e-4;
print '(Y = split, else do not split)';
print {i in PAIRS, u in UP_CARDS}
   (if Split[i,u,1].dual > epsilon then 'Y' else '');
print '(D = double down, H = hit, else stand)';
print {i in NONPAIRS2, u in UP_CARDS}
   (if Double[i,u].dual > epsilon then 'D'
    else if Hit[i,u,1].dual > epsilon then 'H'
    else '');
print '(H = hit, else stand)';
print {i in NONPAIRS2, u in UP_CARDS}
   (if Hit[i,u,0].dual > epsilon then 'H' else '');
```

The optimal strategy for pairs is:

The optimal strategy if you can double down is:

The optimal strategy if you cannot double down is:

With enough practice, these tables are easy to memorize. In fact, casino gift shops sell wallet-sized versions of them, presumably to boost the confidence of would-be gamblers who might otherwise be intimidated by the pressure of making the correct decision only by gut feeling.

For more discussion and the full SAS code, see the SAS Global Forum 2011 paper Linear Optimization in SAS/OR® Software: Migrating to the OPTMODEL Procedure, which includes a calculation of the player's overall expected profit per hand. Spoiler alert: if you don't count cards, the expected profit is negative, but only slightly.

