Longtime SAS programmers know that the SAS DATA step and SAS procedures are very tolerant of typographical errors. You can misspell most keywords and SAS will "guess" what you mean. For example, if you mistype "PROC" as "PRC," SAS will run the program but write a warning to the log:

# Author

Sometimes SAS programmers ask about how to analyze quantiles with SAS. Common questions include: How can I compute 95% confidence intervals for a median in SAS? How can I test whether the medians of two independent samples are significantly different? How can I repeat the previous analyses with other percentiles,

Many introductory courses in probability and statistics encourage students to collect and analyze real data. A popular experiment in categorical data analysis is to give students a bag of M&M® candies and ask them to estimate the proportion of colors in the population from the sample data. In some classes,

A categorical response variable can take on k different values. If you have a random sample from a multinomial response, the sample proportions estimate the proportion of each category in the population. This article describes how to construct simultaneous confidence intervals for the proportions as described in the 1997 paper

A common question on SAS discussion forums is how to repeat an analysis multiple times. Most programmers know that the most efficient way to analyze one model across many subsets of the data (perhaps each country or each state) is to sort the data and use a BY statement to

On discussion forums, I often see questions that ask how to Winsorize variables in SAS. For example, here are some typical questions from the SAS Support Community: I want an efficient way of replacing (upper) extreme values with (95th) percentile. I have a data set with around 600 variables and

Suppose you create a scatter plot in SAS with PROC SGPLOT. What color does PROC SGPLOT use for the markers? If you specify the GROUP= option so that markers are colored by a grouping variable, what colors are used to represent the various groups? The following scatter plot shows the

In a previous article, I showed how to simulate data for a linear regression model with an arbitrary number of continuous explanatory variables. To keep the discussion simple, I simulated a single sample with N observations and p variables. However, to use Monte Carlo methods to approximate the sampling distribution

If you are a SAS programmer and use the GROUP= option in PROC SGPLOT, you might have encountered a thorny issue: if you use a WHERE clause to omit certain observations, then the marker colors for groups might change from one plot to another. This happens because the marker colors

This article shows how to simulate a data set in SAS that satisfies a least squares regression model for continuous variables. When you simulate to create "synthetic" (or "fake") data, you (the programmer) control the true parameter values, the form of the model, the sample size, and magnitude of the

The SAS analytical documentation has a new look. Beginning with the 14.2 release of the SAS analytical products (which shipped with SAS 9.4m4 in November 2016), the HTML version of the online documentation has moved to a new framework called the Help Center. The URL for the online documentation is

This article shows how to solve mixed integer linear programming (MILP) problems in SAS. In a mixed integer problem, some of the variables in the problem are integer-valued whereas others are continuous. The objective function is a linear function of the variables and the variables can be subject to linear

For SAS programmers, the PUT statement in the DATA step and the %PUT macro statement are useful statements that enable you to display the values of variables and macro variables, respectively. By default, the output appears in the SAS log. This article shares a few tips that help you to

Last week I wrote about the 10 most popular articles from The DO Loop in 2016. The popular articles tend to be about elementary topics that appeal to a wide range of SAS programmers. Today I present an "editor's choice" list of technical articles that describe more advanced statistical methods

In the beginning SAS created procedures and output. The output was formless and void. Then SAS said, "Let there be ODS," and there was ODS. Customers saw that ODS was good, and SAS separated the computation from the display and management of output. The preceding paragraph oversimplifies the SAS Output

“La Quinta” is Spanish for “next to Denny’s.” -- Mitch Hedberg, comedian Mitch Hedberg's joke resonates with travelers who drive on the US interstate system because many highway exits feature both a La Quinta Inn™ and a Denny's® restaurant within a short distance of each other. But does a

*The DO Loop*in 2016

I wrote 105 posts for The DO Loop blog in 2016. My most popular articles were about data analysis, SAS programming tips, and elementary statistics. Without further ado, here are the most popular articles from 2016. Data Analysis and Visualization Start with a juicy set of data and an interesting

How can you generate data that contains outliers in a simulation study? The contaminated normal distribution is a simple but useful distribution you can use to simulate outliers. The distribution is easy to explain and understand, and it is also easy to implement in SAS. What is a contaminated normal

In some applications, you need to optimize a linear objective function of many variables, subject to linear constraints. Solving this problem is called linear programming or linear optimization. This article shows two ways to solve linear programming problems in SAS: You can use the OPTMODEL procedure in SAS/OR software or

Out of the bosom of the Air, Out of the cloud-folds of her garments shaken, Over the woodlands brown and bare, Over the harvest-fields forsaken, Silent, and soft, and slow Descends the snow. "Snow-flakes" by Henry Wadsworth Longfellow Happy holidays to all my readers! In my last post I showed

I have a fondness for fractals. In previous articles, I've used SAS to create some of my favorite fractals, including a fractal Christmas tree and the "devil's staircase" (Cantor ) function. Because winter is almost here, I think it is time to construct the Koch snowflake fractal in SAS. A

Many SAS procedure compute statistics and also compute confidence intervals for the associated parameters. For example, PROC MEANS can compute the estimate of a univariate mean, and you can use the CLM option to get a confidence interval for the population mean. Many parametric regression procedures (such as PROC GLM)

A recent issue of Astronomy magazine mentioned Kepler's third law of planetary motion, which states "the square of a planet's orbital period is proportional to the cube of its average distance from the Sun" (Astronomy, Dec 2016, p. 17). The article included a graph (shown at the right) that shows

Do you want to create customized SAS graphs by using PROC SGPLOT and the other ODS graphics procedures? An essential skill that you need to learn is how to merge, join, append, and concatenate SAS data sets that come from different sources. The SAS statistical graphics procedures (SG procedures) enable

In the classic textbook by Johnson and Wichern (Applied Multivariate Statistical Analysis, Third Edition, 1992, p. 164), it says: All measures of goodness-of-fit suffer the same serious drawback. When the sample size is small, only the most aberrant behaviors will be identified as lack of fit. On the other hand,

Somewhere in my past I encountered a panel of histograms for small random samples of normal data. I can't remember the source, but it might have been from John Tukey or William Cleveland. The point of the panel was to emphasize that (because of sampling variation) a small random sample

A SAS customer asked how to use background colors and a dashed line to emphasize the forecast region for a graph that shows a time series model. The task requires the following steps: Use the ATTRPRIORITY=NONE option on the ODS GRAPHICS statement to make sure that the current ODS style

At a conference last week, a presenter showed SAS statements that compute the logarithm of a probability density function (PDF). The log-PDF is a a common computation because it occurs when maximizing the log-likelihood function. The presenter computed the expression in SAS by using an expression that looked like y

Who was the oldest person elected president of the United States? How about the youngest? Who was the oldest when he left office? Let's look at some data. Wikipedia has a page that presents a table of the presidents of the US by age. It lists the dates for which

If you obtain data from web sites, social media, or other unstandardized data sources, you might not know the form of dates in the data. For example, the US Independence Day might be represented as "04JUL1776", "07/04/1776", "Jul 4, 1776", or "July 4, 1776." Fortunately, the ANYDTDTE informat makes it