Create lists by using a natural syntax in SAS/IML

1

SAS/IML 14.3 (SAS 9.4M5) introduced a new syntax for creating lists and for assigning and extracting item in a list. Lists (introduced in SAS/IML 14.2) are data structures that are convenient for holding heterogeneous data. A single list can hold character matrices, numeric matrices, scalar values, and other lists, as discussed in a previous article about how to use lists in SAS/IML.

The list creation operator

You can use square brackets to create a list. The elements of the list are separated by commas. For example, the following syntax creates a list that contains three elements. The list represents a hypothetical patient in a cholesterol-lowering study.

proc iml;
/* 1. New list creation syntax (L = [val1, val2,...]) */
/*                 Cholesterol at
       Name  Age   0mo  3mo  6mo  12mo */
P1 = ["Bob",  36, {182, 170, 170, 162}]; /* Patient 1 */

The name of the list is P1. The first element of the list is a string, the second is a scalar number, and the third is a numeric vector. In this case, the elements are specified by using literal values, but you can also use previously defined variables. The following statement is equivalent but defines the list by using existing variables:

Name = "Bob"; Age = 36; Chol = {182, 170, 170, 162};
P1 = [Name, Age, Chol];   /* define list from copies of existing variables */

As mentioned earlier, lists can contain other lists. For example, if you have multiple patients in a study, you can create a list of each patient's data, then create a list that contains all the patients' data, as follows:

P2 = ["Fred", 52, {175, 165, 155}     ]; /* Patient 2 */
P3 = ["Tom",  45, {160, 145,   ., 139}]; /* Patient 3 */
Patients = [P1, P2, P3];                 /* a list of patients */

Assign and extract list items

You can use the list item operator ($) to specify an item in a list. For each patient, the age is stored as the second item in a list. If you want the age of Bob (the first patient), you can use either of the following statements:

/* 2. New list item syntax ($) */
BobAge = P1$2;         /* get 2nd item from P1 list */
BobAge = Patients$1$2; /* get 1st item of Patients, then 2nd item from that list */

The first statement is straightforward: P1$2 means "get the second item from the P1 list." The second statement is parsed left to right. The syntax Patient$1 means "get the first item from the Patients list," which is equivalent to P1. Thus Patient$1$2 gets Bob's age.

The preceding example uses literal values to specify the position of an item in a list, but you can also use a variable. This makes it possible to extract items in a loop. For example, the following statements loop over all patients, extract the name and cholesterol values of the patient, and compute each patient's average cholesterol value during the study:

N = ListLen(Patients);      /* number of patients */
Name = j(N,1,"     ");      /* allocate vector for names */
AvgChol = j(N,1,.);         /* allocate vector for results */
do i = 1 to N;
   Name[i] = Patients$i$1;  /* get name of i_th patient */
   Chol = Patients$i$3;     /* get sequence of cholesterol values */
   AvgChol[i] = mean(Chol); /* average value */
end;
print AvgChol[rowname=Name];

You can use the list item operator on the left side of an assignment operator to change an item in a list. For example, suppose you discover that Bob's age was typed wrong: Bob is really 63, not 36. You can update Bob's data as follows:

P1$2 = 63;         /* update 2nd item in P1 list */
Patients$1 = P1;   /* update entire patient list */

Alternatively, you could use the syntax Patients$1$2 = 63 to update the data in place.

Extract sublists

To subset a list, use square brackets ([]) and specify the indices of the items. For example, the following statement extracts the second and third patients into a new list:

/* 3. Extract sublist SL = L[ {i1, i2,...} ] */
SubList = Patients[ {2 3} ]; /* Fred and Tom */

The sublist operator ([]) ALWAYS returns a list, even if you specify only one item! Thus Patients[1] is a LIST that contains one item. If you want the item itself, use Patients$1.

Named lists

In the previous example, the items in the list P1 are a patient's name, age, and cholesterol readings. If you want to extract Bob's age, you can write P1$2, but someone unfamiliar with the order of the list items would have no idea what that item represents. Thus it is helpful to define a named list, which is also called an associative array. When you specify a named list, you specify the items by using name-value pairs, as follows:

P = [#"Name" = "Bob",                        /* P$1 or P$"Name" */
     #"Age"  = 63,                           /* P$2 or P$"Age" */
     #"Cholesterol" = {182, 170, 170, 162}]; /* P$3 or P$"Cholesterol" */

You can use the names to refer to the items in a list. For example, the following statements extract the patient's name and cholesterol readings by using the list item operator:

Name = P$"Name";
Chol = P$"Cholesterol";       /* get Bob's measurements */
print Chol[Label=Name];

You can also use names in the sublist operator to extract a sublist:

L = P[ {"Age" "Cholesterol"} ];  /* sublist that contains two items */

Summary

In summary, SAS/IML 14.3 contains new syntax for creating a list and for extracting items and sublists. This syntax makes it easier to use lists and to read and write SAS/IML programs that use lists.

Share

About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

1 Comment

  1. Peter Lancashire on

    Thanks. This looks useful. The syntax for associative arrays is not as easy and elegant as awk, which was developed in 1977.

Leave A Reply

Back to Top