Every beginning SAS programmer learns the simple IF-THEN/ELSE statement for conditional processing in the SAS DATA step. The basic If-THEN statement handles two cases: if a condition is true, the program does one thing, otherwise the program does something else. Of course, you can handle more cases by using multiple
Tag: Getting Started
A grid is a set of evenly spaced points. You can use SAS to create a grid of points on an interval, in a rectangular region in the plane, or even in higher-dimensional regions like the parallelepiped shown at the left, which is generated by three vectors. You can use
SAS software can fit many different kinds of regression models. In fact a common question on the SAS Support Communities is "how do I fit a <name> regression model in SAS?" And within that category, the most frequent questions involve how to fit various logistic regression models in SAS. There
In a previous post I showed how to download, install, and use packages in SAS/IML 14.1. SAS/IML packages incorporate source files, documentation, data sets, and sample programs into a ZIP file. The PACKAGE statement enables you to install, uninstall, and manage packages. You can load functions and data into your
Descriptive univariate statistics are the foundation of data analysis. Before you create a statistical model for new data, you should examine descriptive univariate statistics such as the mean, standard deviation, quantiles, and the number of nonmissing observations. In SAS, there is an easy way to create a data set that
A dummy variable (also known as indicator variable) is a numeric variable that indicates the presence or absence of some level of a categorical variable. The word "dummy" does not imply that these variables are not smart. Rather, dummy variables serve as a substitute or a proxy for a categorical
In the SAS/IML language, you can read data from a SAS data set into a set of vectors (each with their own name) or into a single matrix. Beginning programmers might wonder about the advantages of each approach. When should you read data into vectors? When should you read data
Novice SAS programmers quickly learn the advantages of using PROC SORT to sort data, followed by a BY-group analysis of the sorted data. A typical example is to analyze demographic data by state or by ZIP code. A BY statement enables you to produce multiple analyses from a single procedure
A moving average (also called a rolling average) is a statistical technique that is used to smooth a time series. Moving averages are used in finance, economics, and quality control. You can overlay a moving average curve on a time series to visualize how each value compares to a rolling
Parameters in SAS procedures are specified a list of values that you manually type into the procedure syntax. For example, if you want to specify a list of percentile values in PROC UNIVARIATE, you need to type the values into the PCTLPTS= option as follows: proc univariate data=sashelp.cars noprint; var
Weighted averages are all around us. Teachers use weighted averages to assign a test more weight than a quiz. Schools use weighted averages to compute grade-point averages. Financial companies compute the return on a portfolio as a weighted average of the component assets. Financial charts show (linearly) weighted moving averages
Suppose that you are tabulating the eye colors of students in a small class (following Friendly, 1992). Depending upon the ethnic groups of these students, you might not observe any green-eyed students. How do you put a 0 into the table that summarizes the number of students who have each
You've had a long day. You've implemented a custom algorithm in the SAS/IML language. But before you go home, you want to generate some matrices and test your program. If you are like me, you prefer a short statement—one line would be best. However, you also want the flexibility to
Dear Rick, I have a data set with 1,001 numerical variables. One variable is the response, the others are explanatory variable. How can I read the 1,000 explanatory variables into an IML matrix without typing every name? That's a good question. You need to be able to perform two sub-tasks:
When using SAS to format a number as a percentage, there is a little trick that you need to remember: the width of the formatted value must include room for the decimal point, the percent sign, and the possibility of two parentheses that indicate negative values. The field width must
Base SAS contains many functions for processing strings, and you can call these functions from within a SAS/IML program. However, sometimes a SAS/IML programmer needs to process a vector of strings. No problem! You can call most Base SAS functions with a vector of parameters. I have previously written about
A SAS programmer wanted to plot the normal distribution and highlight the area under curve that corresponds to the tails of the distribution. For example, the following plot shows the lower decile shaded in blue and the upper decile shaded in red. An easy way to do this in SAS
As my colleague Margaret Crevar recently wrote, it is useful to know how long SAS programs take to run. Margaret and others have written about how to use the SAS FULLSTIMER option to monitor the performance of the SAS system. In fact, SAS distributes a macro that enables you to
When I am computing with SAS/IML matrices and vectors, I often want to label the columns or rows so that I can better understand the data. The labels are called headers, and the COLNAME= and ROWNAME= options in the SAS/IML PRINT statement enable you to add headers for columns and
One of the fundamental principles of computer programming is to break a task into smaller subtasks and to modularize the program by encapsulating each subtask into its own function. I have written many blog posts over the years about how to define and use functions in the SAS/IML language. I
A SAS programmer asked for a list of SAS/IML functions that operate on the columns of an n x p matrix and return a 1 x p row vector of results. The functions that behave this way tend to compute univariate descriptive statistics such as the mean, median, standard deviation, and quantiles. The following
I previously wrote about the best way to suppress output from SAS procedures. Suppressing output is necessary in simulation and bootstrap analyses, and it is useful in other contexts as well. In my previous article, I wrote, "many programmers use ODS _ALL_ CLOSE as a way to suppress output, but
A common task in data analysis is to locate observations that satisfy multiple criteria. For example, you might want to locate all zip codes in certain counties within specified states. The SAS DATA step contains the powerful WHERE statement, which enables you to extract a subset of data that satisfy
Did you know that if you have set multiple titles in SAS, that there is an easy way to remove them? For example, suppose that you've written the following statements, which call the TITLE statement to set three titles: title "A Great Big Papa Title"; title2 "A Medium-sized Mama Title";
A customer asked: How do we go about summing a finite series in SAS? For example, I want to compute for various integers n ≥ 3. I want to output two columns, one for the natural numbers and one for the summation of the series. Summations arise often in statistical
I often blog about the usefulness of vectorization in the SAS/IML language. A one-sentence summary of vectorization is "execute a small number of statements that each analyze a lot of data." In general, for matrix languages (SAS/IML, MATLAB, R, ...) vectorization is more efficient than the alternative, which is to
Last week I received a message from SAS Technical Support saying that a customer's IML program was running slowly. Could I look at it to see whether it could be improved? What I discovered is a good reminder about the importance of vectorizing user-defined modules. The program in this blog
A SAS/IML programmer asked a question on a discussion forum, which I paraphrase below: I've written a SAS/IML function that takes several arguments. Some of the arguments have default values. When the module is called, I want to compute some quantity, but I only want to compute it for the
In my book Simulating Data with SAS, I discuss a relationship between the skewness and kurtosis of probability distributions that might not be familiar to some statistical programmers. Namely, the skewness and kurtosis of a probability distribution are not independent. If κ is the full kurtosis of a distribution and
In the SAS DATA step, all variables are scalar quantities. Consequently, an IF-THEN/ELSE statement that evaluates a logical expression is unambiguous. For example, the following DATA step statements print "c=5 is TRUE" to the log if the variable c is equal to 5: if c=5 then put "c=5 is TRUE";