If you have worked with the different types of score code generated by the high-performance modeling nodes in SAS® Enterprise Miner™ 14.1, you have probably come across the Analytic Store (or ASTORE) file type for scoring. The ASTORE file type works very well for scoring complex machine learning models such as random forests, gradient boosting, and support vector machines. In this article, we will focus on ASTORE files generated by SAS® Viya® Visual Data Mining and Machine Learning (VDMML) procedures. An introduction to analytic stores on SAS Viya can be found here.
In this post, we will:
- Generate an ASTORE file for a gradient boosting model in SAS Visual Data Mining and Machine Learning.
- Override the scoring decision captured in the ASTORE file in step 1 using PROC ASTORE in SAS Visual Data Mining and Machine Learning.
Generate an ASTORE file for a gradient boosting model
Our example dataset is a distributed in-memory CAS table that contains information about applicants who were granted credit for a certain home equity loan. The binary target variable ‘BAD’ indicates whether an applicant defaulted on or repaid the loan. The remaining variables, which describe the applicant’s credit history, debt-to-income ratio, occupation, etc., are used as predictors for the model. In the code below, we train a gradient boosting model on a randomly sampled 70% of the data and validate against the remaining 30%. The SAVESTATE statement creates an analytic store (ASTORE) for the model and saves it as a binary file named “astore_gb.”
proc gradboost data=PUBLIC.HMEQ;
   partition fraction(validate=0.3);
   target BAD / level=nominal;
   input LOAN MORTDUE DEBTINC VALUE YOJ DEROG DELINQ CLAGE NINQ CLNO / level=interval;
   input REASON JOB / level=nominal;
   score out=public.hmeq_scored copyvars=(_all_);
   savestate rstore=public.astore_gb;
   id _all_;
run;
Shown below are a few observations from the scored dataset hmeq_scored where YOJ (years at present job) is greater than 10 years.
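A listing of this kind could be produced with a step along the following lines. This is a minimal sketch, not the code used for the original output: it assumes that PUBLIC is assigned as a CAS engine libref in the session and that the scored table carries the P_BAD1, P_BAD0 and I_BAD columns named in the score code later in this post. The column list and the 10-row limit are illustrative choices.

```sas
/* Sketch: list a few scored rows where years at present job exceeds 10. */
/* Assumes PUBLIC is a CAS engine libref; the column list is illustrative. */
proc print data=public.hmeq_scored(obs=10);
   where YOJ > 10;
   var YOJ BAD P_BAD1 P_BAD0 I_BAD;
run;
```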
Override the scoring decision using PROC ASTORE
In this segment, we will use PROC ASTORE to override the scoring decision from the gradient boosting model. To that end, we will first make use of the DESCRIBE statement in PROC ASTORE to produce basic DS2 scoring code using the EPCODE option. We will then edit the score code in DS2 language syntax to override the scoring decision produced from the gradient boosting model.
proc astore;
   describe rstore=public.astore_gb
            epcode="/viyafiles/jukhar/gb_epcode.sas";
run;
A snapshot of the output from the above code is shown below. The analytic store is assigned a unique string identifier. We also get information about the analytic engine that produced the store (gradient boosting, in this case) and the time when the store was created. In addition, though not shown in the snapshot below, we get a list of the input and output variables used.
Let’s take a look at the DS2 score code (“gb_epcode.sas”) produced by the EPCODE option in the DESCRIBE statement within PROC ASTORE.
data sasep.out;
   dcl package score sc();
   dcl double "LOAN";
   dcl double "MORTDUE";
   dcl double "DEBTINC";
   dcl double "VALUE";
   dcl double "YOJ";
   dcl double "DEROG";
   dcl double "DELINQ";
   dcl double "CLAGE";
   dcl double "NINQ";
   dcl double "CLNO";
   dcl nchar(7) "REASON";
   dcl nchar(7) "JOB";
   dcl double "BAD";
   dcl double "P_BAD1" having label n'Predicted: BAD=1';
   dcl double "P_BAD0" having label n'Predicted: BAD=0';
   dcl nchar(32) "I_BAD" having label n'Into: BAD';
   dcl nchar(4) "_WARN_" having label n'Warnings';
   Keep "P_BAD1" "P_BAD0" "I_BAD" "_WARN_" "BAD" "LOAN" "MORTDUE" "VALUE"
        "REASON" "JOB" "YOJ" "DEROG" "DELINQ" "CLAGE" "NINQ" "CLNO" "DEBTINC";
   varlist allvars[_all_];

   method init();
      sc.setvars(allvars);
      sc.setKey(n'F8E7B0B4B71C8F39D679ECDCC70F6C3533C21BD5');
   end;

   method preScoreRecord();
   end;

   method postScoreRecord();
   end;

   method term();
   end;

   method run();
      set sasep.in;
      preScoreRecord();
      sc.scoreRecord();
      postScoreRecord();
   end;
enddata;
The sc.setKey call in the init() method block contains a string identifier for the analytic store; this is the same ASTORE identifier that the DESCRIBE statement in PROC ASTORE reported earlier. To override the scoring decision created by the original gradient boosting model, we will edit the gb_epcode.sas file (shown above) by inserting new statements in the postScoreRecord method block; the edited file must follow DS2 language syntax. For more information about the DS2 language, see SAS DS2 Language Reference.
Let’s assume for the sake of simplicity that we want to force the predicted outcome (“I_BAD”) to be ‘0’ for the cases where YOJ (number of years at present job) is greater than 10 years. We will store this outcome in a new variable called “I_BAD_NEW”. The code below shows you the edited postScoreRecord method block.
method postScoreRecord();
   if YOJ>10 then do;
      I_BAD_NEW='0';
   end;
   else do;
      I_BAD_NEW=I_BAD;
   end;
end;
Because we are saving the outcome into a new variable called “I_BAD_NEW,” we will need to declare this variable upfront along with the rest of the variables in the score file.
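As a rough illustration of that declaration, the fragment below shows the two places in gb_epcode.sas that would plausibly change. It is a sketch under assumptions rather than the exact edit used for this post: the type and length of I_BAD_NEW are copied from I_BAD, the label text is made up for illustration, and adding the variable to the Keep list is assumed to be necessary because a DS2 KEEP statement restricts which variables are written to the output.

```sas
/* Fragment of the edited gb_epcode.sas; only the changed lines are shown. */

/* New variable: type and length mirror I_BAD (an assumption);             */
/* the label text is illustrative.                                          */
dcl nchar(32) "I_BAD_NEW" having label n'Into: BAD (with override)';

/* Assumed to be required: list I_BAD_NEW in Keep so it reaches the output. */
Keep "P_BAD1" "P_BAD0" "I_BAD" "I_BAD_NEW" "_WARN_" "BAD" "LOAN" "MORTDUE"
     "VALUE" "REASON" "JOB" "YOJ" "DEROG" "DELINQ" "CLAGE" "NINQ" "CLNO"
     "DEBTINC";
```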
In order for this override to take effect, we will need to run the SCORE statement in PROC ASTORE and provide both the original ASTORE file (astore_gb), as well as the edited DS2 score code (gb_epcode.sas).
proc astore;
   score data=public.hmeq
         epcode="/viyafiles/jukhar/gb_epcode.sas"
         rstore=public.astore_gb
         out=public.hmeq_new;
run;
A comparison of “I_BAD” and “I_BAD_NEW” in the output of the above code for select variables shows that the override rule for scoring has indeed been applied.
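One way to check the override across all affected rows, rather than by inspecting a few of them, is a cross-tabulation of the two decision columns. The step below is a minimal sketch that assumes PUBLIC is a CAS engine libref and that the output table public.hmeq_new contains YOJ, I_BAD and I_BAD_NEW.

```sas
/* Sketch: cross-tabulate original and overridden decisions for YOJ > 10. */
/* Assumes hmeq_new contains YOJ, I_BAD and I_BAD_NEW.                     */
proc freq data=public.hmeq_new;
   where YOJ > 10;
   tables I_BAD * I_BAD_NEW / norow nocol nopercent;
run;
```

Given the rule in postScoreRecord, every row in this subset would be expected to fall under I_BAD_NEW = '0', whatever its original I_BAD value.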
In this article we explored how to override the scoring decision produced from a machine learning model in SAS Viya. You will find more information about scoring in the SAS Visual Data Mining and Machine Learning user guide.