In many programming languages, there is a function named eval() that can be used to evaluate an expression and return the result at run time. For example, in Python, the eval() function parses the expression passed to it and runs a Python expression or code within the program. Even Python eval() supports more optional parameters, such as global/local dictionary for runtime context, but the goal for evaluation remains the same.
When an expression is stored as a character variable of an SAS observation, it suggests the running context is the SAS program execution phase, so the internal data structure like Input Buffer (IB), Program Data Vectors (PDV) and Descriptor Information of the output database are all accessible. Furthermore, all SAS system functions are also accessible in an expression, just like the raw SAS code exposed to SAS compiler and runtime. In the following SAS data set, what the user wants is to get the real value from the expression in variable C. It is different from the commonly used Calculated Column, which uses the same computing rule to expand a new variable for all observations. Here the column C has a different computing rule for each observation; the user's expected result is the column D.
%let MVAR=2; data a; length a b 8 c $ 255; a=3; b=4; c='a**2+b**2'; output; /* Arithmetic or Logical */ a=7; b=22; c='b/a * &MVAR'; output; /* SAS Macro Var*/ a=113; b=355; c='cos(b/a)'; output; /* Trigonometric */ a=0; b=1; c='cdf("NORMAL", 0, a, b)'; output; /* Probability */ run; proc print;run;
What solutions ahead?
Someone might want a solution that parses the expression in variable C, and then try to rebuild the abstract syntax tree and try to evaluate it from bottom to top. I can’t say this solution is totally wrong but it’s very complex and too hard to generate a general solution. If we have an SAS function eval() in DATA Step, then we can easily use the following code to achieve the goal. Unfortunately, SAS did not provide the eval() function.
data b; set a; d=eval(c); run;
SAS provides Macro function %eval, %sysevalf to evaluate arithmetic and logical expressions using integer or floating-point arithmetic. SAS also provides function resolve() (not supported in the DATA Step that runs in CAS) to return the resolved value of the argument after it has been processed by the macro facility. Anyway, resolve() can’t access the values of macro variable assigned by symput() or symputn() at program execution phase. So, for the SAS code as below, it outputs c=150 and d=200. For simple arithmetic and logical expression, we can use the tranwrd() function to replace all variable names with real values, and then use resolve('%sysevalf('|| cats(expression_with_real_values) || ')') to get evaluated result, but it limited to only SAS functions supported by %eval() and %sysevalf() macro functions.
%let a=100; %let b=50; data _null_; call symput("b", 200); c=resolve("%eval(&a+&b)"); put c=; d=symget("b"); put d=; run;
Now let’s return to the key question we raised: How can we implement the eval() function in SAS Data Step? The best solution we found so far is to use the dynamic code generation mechanism in SAS, and then submit the code to SAS for parsing and execution. In this way, we retrieve string expression from a variable to a real expression in SAS code successfully. We don’t care what the valid SAS expression is, we totally convert it to the code snippets and submit it for execution. Syntax check, reference to PDV variables, internal functions, and even SAS Macro variables are all supported. Yes, it’s a smart and concise implementation for general purposes.
Due to each observation having a different computing rule, we need to use _N_ to control the observation mapping. So, for the existing dataset A, we can use the following code to achieve the goal. The key secret is to use CALL EXECUTE to generate SAS code dynamically and delay execution after the code is ready. In the output dataset B, we have a numeric column D with the value evaluated with the character expression from column C. You can see the output on the RIGHT part of the Figure 1.
data _null_; set a end=last; if _n_=1 then call execute("data b; set a;"); call execute( "if _N_=" || compress(_N_) || " then d=" || trim(c) || "; "); if last then call execute("run;"); run; proc print; run;
Wrap up as a reusable SAS macro
We also can wrap up the above SAS code in a reusable SAS macro, so this logic can be used anywhere you want. The %evalvar macro has four arguments: ds= and var= are the input dataset and columns with expression to be evaluated. And outds= and outvar= are the output dataset and columns to store result. We also can specify same value for ds= and outds=, so the user just expands the existing dataset with an additional evaluated column.
%macro EvalVar(ds=, var=, outds=, outvar=); data _null_; set &ds end=last; if _n_=1 then call execute("data &outds; set &ds;"); call execute( "if _n_=" || compress(_n_) || " then &outvar=" || &var || ";"); if last then call execute("run;"); run; %mend; %EvalVar(ds=a, var=c, outds=b, outvar=d); proc print; run;
By the way, I also had a temporary SAS code generation solution implemented via the %include SAS macro. The temporary SAS code will be deleted automatically when the SAS session is closed. The sample code is also attached here for your information.
filename tmpfile temp; data _null_; file tmpfile; set a end=last; if _N_=1 then put "data b; set a;"; put "if _N_ =" _N_ "then d=" c ";"; if last then put "run;"; run; %include tmpfile; proc print; run;
If we need to evaluate SAS expressions against a big table dynamically, we need to take consideration of computational complexity. We should use random (direct) access to dataset instead of using _N_ condition to improve performance and turn off some SAS system options when we submit large number of SAS codes at runtime, e.g. option nomprint nomlogic nosource;
%macro EvalVarEx(ds=, var=, outds=, outvar=); data _null_; set &ds end=last; if _n_=1 then call execute("data &outds;"); call execute( "_i_=" || compress(_n_) || "; set &ds point=_i_;"); %let i=1; %let v=%scan(&var, &i); %let y=%scan(&outvar, &i); %do %while( &v ^=); if not missing(&v) then call execute("&y="|| &v || ";"); %let i=%eval(&i+1); %let v=%scan(&var, &i); %let y=%scan(&outvar, &i); %end; call execute("output; "); if last then call execute("stop; run;"); run; %mend; data a; length a b 8 c d $ 255; do i=1 to 20000; a=3; b=4; c='a**2+b**2'; d='a+b'; output; end; drop i; run; option nomprint nomlogic nosource; %EvalVarEx(ds=a, var=%str(c,d), outds=b, outvar=%str(e,f)); proc print data=a (obs=100); run; proc print data=b (obs=100); run;
In this article, we talk about how to evaluate SAS expressions in Data Step dynamically. The expression parsing and execution are totally handled by SAS at program execution phase. It avoids handling abstract syntax tree parsing and evaluation ourselves on it. We introduce two dynamic code generation implementations via call execute or %include. We also use _N_ to control observation mapping due to each observation has different computing rules. This implementation can reflect the beauty of simplicity in SAS before SAS provides system function eval() one day.