Teaching an AI assistant to read and write SAS IML vectors


One of the most exciting features of SAS Viya Workbench is that the code editor includes a generative AI component called SAS Viya Copilot. This feature was announced and demonstrated at SAS Innovate 2024. With the Copilot, you can specify a text prompt that generates SAS code. For example, you can enter a prompt such as "Sort data sets DS1 and DS2 by the ID variable, then write DATA step code that creates a new data set, DS_NEW, as a full join of the two data sets on ID." The Copilot will do its best to write SAS code that satisfies your prompt.

It is no secret that the current AI assistants are not perfect. They have been trained on a vast corpus of examples that are available on the internet, including blogs, discussion forums, documentation, and conference papers. Anyone who has ever read a discussion forum knows that some of the posted answers are good, but some are not so good.

Recently, I have been assessing the ability of the SAS Viya Copilot to write simple programs in the DATA step and in the SAS IML language. For some—but not all—prompts, the Copilot generates a correct SAS program. Of course, SAS (the company) wants to minimize the probability of incorrect programs, so I and others have been tasked with assessing the abilities of the Copilot. We identify prompts for which the Copilot produces a less-than-perfect response. For those prompts, we generate additional training examples so that the Copilot can learn and improve its responses.

During my investigations, I discovered that the Copilot did not correctly produce the SAS IML statements for one kind of reading and writing: reading data from SAS data sets into individual IML vectors and writing those vectors back out. I assume that if the Copilot can't write the correct SAS IML code, then there are probably plenty of humans who don't know the technique either! So, this article reviews how to read data directly into IML vectors. I also discuss the converse process: how to write data from IML vectors to a SAS data set.

Read data into IML vectors

The Copilot excelled at the task of reading data into an IML matrix. To read data into a matrix, you use the READ-INTO statement in PROC IML. For example, to read the numerical variables from the Sashelp.Class data set into a matrix named 'A', you can use the following SAS IML statements:

proc iml;
use Sashelp.Class;
   read all var _NUM_ into A; /* _NUM_ is a keyword for "all numeric variables" */
close;

Notice the use of the INTO keyword, which implies that you are creating a matrix that can have multiple columns. Recall that the variables that you read INTO a matrix must all be the same type (numeric or character) because a SAS IML matrix can store only one type. You should use Tables if you want to read variables of multiple types.
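
As a small sketch of the same-type restriction, the following statements read the numeric and the character variables of Sashelp.Class into two separate matrices. The matrix names (X and C) and the name vectors (numNames and charNames) are chosen only for this illustration. The COLNAME= option on the READ statement stores the names of the variables that were read.

```sas
proc iml;
use Sashelp.Class;
   read all var _NUM_  into X[colname=numNames];   /* numeric variables -> numeric matrix */
   read all var _CHAR_ into C[colname=charNames];  /* character variables -> character matrix */
close Sashelp.Class;

/* show the first three rows of each matrix, labeled by variable name */
print (X[1:3,])[colname=numNames];
print (C[1:3,])[colname=charNames];
quit;
```

Each matrix holds only one type, which is why two READ statements are needed here.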

Alternatively, you can read mixed-type data directly into vectors. To do this, do NOT use the INTO keyword. Instead, use the following syntax:

   READ ALL VAR varnames

In the Sashelp.Class data, the 'Name' and 'Sex' variables are character, whereas the 'Age' and 'Height' variables are numeric. The following SAS IML program creates vectors that have the same names as the data set variables:

proc iml;
/* create vectors from a data set */
use Sashelp.Class;
   read all var {'Name' 'Sex' 'Age' 'Height'}; /* create four vectors */
close;
print Name Sex Age Height;
quit;
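
The same READ syntax can be combined with a WHERE clause to read only some observations into vectors. The following sketch, which is an illustration rather than something the Copilot generated, reads 'Name' and 'Age' for the female students and uses the TYPE and NROW functions to confirm that one vector is character and the other is numeric. The names typeName, typeAge, and n are hypothetical.

```sas
proc iml;
use Sashelp.Class;
   /* no INTO keyword: each variable becomes its own column vector */
   read all var {'Name' 'Age'} where(Sex="F");
close Sashelp.Class;

typeName = type(Name);   /* 'C' for a character vector */
typeAge  = type(Age);    /* 'N' for a numeric vector */
n = nrow(Age);           /* number of observations that satisfied the WHERE clause */
print typeName typeAge n;
quit;
```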

Write data from IML vectors

Again, the Copilot excelled at the task of writing data from an IML matrix but struggled to write data from IML vectors of mixed type. To write data from a matrix, you use the CREATE-FROM and APPEND-FROM syntax. However, to write data from vectors, do NOT use the FROM keyword. Instead, use the
CREATE DSName VAR varnames
statement followed by one or more APPEND statements.
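
For comparison, the matrix-based syntax mentioned above can be sketched as follows. The matrix name XY and the data set name WantMatrix are hypothetical. The COLNAME= option supplies the variable names for the data set.

```sas
proc iml;
XY = {1 9,
      2 8,
      3 7};                                     /* 3 x 2 numeric matrix */

/* create a data set from a matrix: note the FROM keyword on both statements */
create WantMatrix from XY[colname={'x' 'y'}];
append from XY;
close WantMatrix;
quit;
```

This form works only when every column has the same type, which is the limitation that the vector syntax avoids.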

The following SAS IML program is a possible response to the prompt, "Write an IML program that creates a SAS DATA set named 'Want' from two numeric vectors, x and y, and two character vectors, a and b."

proc iml;
x = {1,2,3};
y = {9,8,7};
a = {'Red','Green','Yellow'};
b = {'Dog','Lizard','Bird'};
 
/* create a data set from vectors */
create Want var {'x' 'y' 'a' 'b'};
append;
close;
quit;
 
proc print data=Want;
run;
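
Because the CREATE statement can be followed by more than one APPEND statement, you can write the data in batches. The following sketch assumes a hypothetical data set named Want2 and reuses the vector names x and a. Each APPEND statement writes the current contents of the vectors.

```sas
proc iml;
x = {1,2,3};
a = {'Red','Green','Yellow'};

create Want2 var {'x' 'a'};   /* variable types (and, I believe, character lengths) are set here */
append;                       /* writes the first three observations */

x = {4,5};
a = {'Blue','Pink'};          /* same types as before; strings are no longer than 'Yellow' */
append;                       /* writes two more observations */
close Want2;
quit;

proc print data=Want2;
run;
```

The second batch deliberately uses strings that are no longer than the longest string in the first batch, so the example does not depend on how longer values would be handled.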

Summary

Today's AI assistants are not perfect at writing SAS programs, but SAS is trying to improve that skill. This article reminds everyone (human and AI) that you can read data directly into IML vectors and write data directly from IML vectors.

  • To read data, you use the READ ALL VAR varnames statement. Do NOT use the INTO keyword!
  • To write data, you use the CREATE DSName VAR varnames statement followed by one or more APPEND statements. Do NOT use the FROM keyword.

I, for one, welcome our AI overlords (ahem, assistants). However, for now, they need our help learning how to write solid and effective SAS programs.


About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.
