This article shows how to randomly access data in a SAS data set by using the READ POINT statement in SAS/IML software. I have previously discussed how to use the READ NEXT and READ CURRENT statements to sequentially access each observation in a SAS data set from PROC IML.
Reading a Specified Observation: The READ POINT Statement
In SAS/IML software, you can directly access rows of data by using the READ POINT statement. (The usage is similar to the POINT= option in the SET statement of the SAS DATA step, except that in the SAS/IML language you do not need a STOP statement.) The value after the POINT keyword can be a scalar or a matrix of row numbers.
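For comparison, here is a minimal sketch of the DATA step analogue. It assumes the same five row numbers that the SAS/IML program below uses, and the output data set name Subset is hypothetical. Because POINT= reads observations directly, the DATA step never reaches an end-of-file condition, which is why the STOP statement is needed to end the step.

```sas
/* DATA step analogue of READ POINT (output data set name Subset is hypothetical) */
data Subset;
   do r = 139, 250, 80, 388, 185;              /* row numbers to read */
      set sashelp.cars(keep=EngineSize) point=r;
      output;                                  /* write the observation just read */
   end;
   stop;   /* required: with POINT=, the step never reaches end-of-file */
run;
```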
In the following program, the USE statement opens the SasHelp.Cars data set for reading. The DO statement loops five times. At each iteration, the variable r contains the next row number from the vector p. The READ POINT statement reads the specified observation of the EngineSize variable into a scalar SAS/IML variable with the same name. You could then do something with that observation, such as predict the gas consumption for that vehicle.
proc iml;
/** show how to use READ POINT **/
p = {139, 250, 80, 388, 185};
use sashelp.cars;
do i = 1 to 5;
   r = p[i];
   read point r var {EngineSize};
   /** compute with this observation **/
end;
It is not strictly necessary to create the temporary variable, r. The READ statement could be written more concisely as follows:
read point (p[i]) var {EngineSize};
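The loop assumes that every entry of p is a valid row number. A minimal sketch of one way to check that assumption, rather than relying on how READ POINT handles an out-of-range row, is to request the number of observations with the NOBS option of the USE statement and to loop only over valid entries. The variable names N, valid, and j are illustrative choices.

```sas
proc iml;
p = {139, 250, 80, 388, 185};
use sashelp.cars nobs N;                /* N = number of observations in the data set */
valid = loc( (p >= 1) & (p <= N) );     /* indices of entries of p that are valid rows */
do j = 1 to ncol(valid);                /* loop body is skipped if no entry is valid */
   r = p[valid[j]];
   read point r var {EngineSize};
   /** compute with this observation **/
end;
close sashelp.cars;
quit;
```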
Reading All Specified Observations at the Same Time
Actually, there is no need for the DO loop in the previous program. The SAS/IML language accepts a vector of values for the argument to the POINT option. In other words, you can ask for the values of the observations enumerated in p with a single statement:
/** get a vector of values **/
read point p var {EngineSize};
print p EngineSize;
close sashelp.cars;
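Because POINT accepts a vector, one natural use is reading a random subset of observations. The following is a sketch under the assumption that your release of SAS/IML includes the SAMPLE function (SAS/IML 12.1 or later). The seed value is arbitrary, and the variables Make and Model are simply other columns of SasHelp.Cars.

```sas
proc iml;
use sashelp.cars nobs N;                   /* N = number of observations */
call randseed(12345);                      /* arbitrary seed, for reproducibility */
idx = T( sample(1:N, 5, "NoReplace") );    /* five distinct row numbers, as a column */
read point idx var {Make Model EngineSize};
close sashelp.cars;
print idx Make Model EngineSize;
quit;
```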
In conclusion, the READ POINT statement provides random access to observations in a data set, which means that you can read the observations in any order. You can read the observations one at a time within a loop, or all at once with a single statement. In general, random access is less efficient than sequential access, so sequential access is preferred for most applications.
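When the variable fits in memory, one way to follow that advice is to read it sequentially with READ ALL and then extract the requested rows by subscripting. This sketch reuses the row numbers from the earlier examples; the name sub is illustrative.

```sas
proc iml;
p = {139, 250, 80, 388, 185};
use sashelp.cars;
read all var {EngineSize};     /* sequential read of every observation */
close sashelp.cars;
sub = EngineSize[p];           /* extract the requested rows in memory */
print p sub;
quit;
```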