Most SAS regression procedures support a CLASS statement which internally generates dummy variables for categorical variables. I have previously described what dummy variables are and how are they used. I have also written about how to create design matrices that contain dummy variables in SAS, and in particular how to

# Author

There are several ways to visualize data in a two-way ANOVA model. Most visualizations show a statistical summary of the response variable for each category. However, for small data sets, it can be useful to overlay the raw data. This article shows a simple trick that you can use to

Restricted cubic splines are a powerful technique for modeling nonlinear relationships by using linear regression models. I have attended multiple SAS Global Forum presentations that show how to use restricted cubic splines in SAS regression procedures. However, the presenters have all used the %RCSPLINE macro (Frank Harrell, 1988) to generate

A reader commented on last week's article about constructing symmetric intervals. He wanted to know if I created it in SAS. Yes, the graph, which illustrates the so-called 68-95-99.7 rule for the normal distribution, was created by using several statements in the SGPLOT procedure in Base SAS The SERIES statement

At SAS Global Forum last week, I saw a poster that used SAS/IML to optimized a quadratic objective function that arises in financial portfolio management (Xia, Eberhardt, and Kastin, 2017). The authors used the Newton-Raphson optimizer (NLPNRA routine) in SAS/IML to optimize a hypothetical portfolio of assets. The Newton-Raphson algorithm

Many intervals in statistics have the form p ± δ, where p is a point estimate and δ is the radius (or half-width) of the interval. (For example, many two-sided confidence intervals have this form, where δ is proportional to the standard error.) Many years ago I wrote an article

Most regression models try to model a response variable by using a smooth function of the explanatory variables. However, if the data are generated from some nonsmooth process, then it makes sense to use a regression function that is not smooth. A simple way to model a discontinuous process in

One of the advantages of the new mixed-type tables in SAS/IML 14.2 (released with SAS 9.4m4) is the greatly enhanced printing functionality. You can control which rows and columns are printed, specify formats for individual columns, and even use templates to completely customize how tables are printed. Printing a table

Lists are collections of objects. SAS/IML 14.2 supports lists as a way to store matrices, data tables, and other lists in a single object that you can pass to functions. SAS/IML lists automatically grow if you add new items to them and shrink if you remove items. You can also

Did you know that you can check a SAS macro variable to see if ODS graphics is enabled? The other day I wanted to write a SAS program that creates a graph only if ODS graphics is enabled. The solution is to check the SYSODSGRAPHICS macro variable, which is automatically

Prior to SAS/IML 14.2, every variable in the Interactive Matrix Language (IML) represented a matrix. That changed when SAS/IML 14.2 (released with SAS 9.4m4) introduced two new data structures: data tables and lists. This article gives an overview of data tables. I will blog about lists in a separate article.

SAS formats are very useful and can be used in a myriad of creative ways. For example, you can use formats to display decimal values as a fraction. However, SAS supports so many formats that it is difficult to remember details about the format syntax, such as the default field

SAS programmers who have experience with other programming languages sometimes wonder whether the SAS language supports statements that are equivalent to the "break" and "continue" statements in other languages. The answer is yes. The LEAVE statement in the SAS DATA step is equivalent to the "break" statement. It provides a

It is time for Pi Day, 2017! Every year on March 14th (written 3/14 in the US), geeky mathematicians and their friends celebrate "all things pi-related" because 3.14 is the three-decimal approximation to pi. This year I use SAS software to show an amazing fact: you can find your birthday

I recently needed to solve a fun programming problem. I challenge other SAS programmers to solve it, too! The problem is easy to state: Given a long sequence of digits, can you write a program to count how many times a particular subsequence occurs? For example, if I give you

Suppose you have several discrete variables. You want to conduct a frequency analysis of these variables and print the results, but ONLY for variables that have three or more levels. In other words, you want to conditionally display some results, but you don't know which variables satisfy the condition until

After reading my article about how to use BY-group processing to run 1000 regression models, a SAS programmer asked whether it is possible to reorder the output of a BY-group analysis. The answer is yes: you can use the DOCUMENT procedure to replay a portion of your output in any

Monte Carlo techniques have many applications, but a primary application is to approximate the probability that some event occurs. The idea is to simulate data from the population and count the proportion of times that the event occurs in the simulated data. For continuous univariate distributions, the probability of an

Longtime SAS programmers know that the SAS DATA step and SAS procedures are very tolerant of typographical errors. You can misspell most keywords and SAS will "guess" what you mean. For example, if you mistype "PROC" as "PRC," SAS will run the program but write a warning to the log:

Sometimes SAS programmers ask about how to analyze quantiles with SAS. Common questions include: How can I compute 95% confidence intervals for a median in SAS? How can I test whether the medians of two independent samples are significantly different? How can I repeat the previous analyses with other percentiles,

Many introductory courses in probability and statistics encourage students to collect and analyze real data. A popular experiment in categorical data analysis is to give students a bag of M&M® candies and ask them to estimate the proportion of colors in the population from the sample data. In some classes,

A categorical response variable can take on k different values. If you have a random sample from a multinomial response, the sample proportions estimate the proportion of each category in the population. This article describes how to construct simultaneous confidence intervals for the proportions as described in the 1997 paper

A common question on SAS discussion forums is how to repeat an analysis multiple times. Most programmers know that the most efficient way to analyze one model across many subsets of the data (perhaps each country or each state) is to sort the data and use a BY statement to

On discussion forums, I often see questions that ask how to Winsorize variables in SAS. For example, here are some typical questions from the SAS Support Community: I want an efficient way of replacing (upper) extreme values with (95th) percentile. I have a data set with around 600 variables and

Suppose you create a scatter plot in SAS with PROC SGPLOT. What color does PROC SGPLOT use for the markers? If you specify the GROUP= option so that markers are colored by a grouping variable, what colors are used to represent the various groups? The following scatter plot shows the

In a previous article, I showed how to simulate data for a linear regression model with an arbitrary number of continuous explanatory variables. To keep the discussion simple, I simulated a single sample with N observations and p variables. However, to use Monte Carlo methods to approximate the sampling distribution

If you are a SAS programmer and use the GROUP= option in PROC SGPLOT, you might have encountered a thorny issue: if you use a WHERE clause to omit certain observations, then the marker colors for groups might change from one plot to another. This happens because the marker colors

This article shows how to simulate a data set in SAS that satisfies a least squares regression model for continuous variables. When you simulate to create "synthetic" (or "fake") data, you (the programmer) control the true parameter values, the form of the model, the sample size, and magnitude of the

The SAS analytical documentation has a new look. Beginning with the 14.2 release of the SAS analytical products (which shipped with SAS 9.4m4 in November 2016), the HTML version of the online documentation has moved to a new framework called the Help Center. The URL for the online documentation is

This article shows how to solve mixed integer linear programming (MILP) problems in SAS. In a mixed integer problem, some of the variables in the problem are integer-valued whereas others are continuous. The objective function is a linear function of the variables and the variables can be subject to linear