# Author

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of PROC IML and SAS/IML Studio. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

How to choose a seed for generating random numbers in SAS

Last week I was asked a simple question: "How do I choose a seed for the random number functions in SAS?" The answer might surprise you: use any seed you like. Each seed of a well-designed random number generator is likely to give rise to a stream of random numbers,

Programming Tips
On the SMOOTHCONNECT option in the SERIES statement

By default, when you use the SERIES statement in PROC SGPLOT to create a line plot, the observations are connected (in order) by straight line segments. However, SAS 9.4m1 introduced the SMOOTHCONNECT option which, as the name implies, uses a smooth curve to connect the observations. In Sanjay Matange's blog,

Sample quantiles: A comparison of 9 definitions

According to Hyndman and Fan ("Sample Quantiles in Statistical Packages," TAS, 1996), there are nine definitions of sample quantiles that commonly appear in statistical software packages. Hyndman and Fan identify three definitions that are based on rounding and six methods that are based on linear interpolation. This blog post shows

Quantile definitions in SAS

In last week's article about the Flint water crisis, I computed the 90th percentile of a small data set. Although I didn't mention it, the value that I reported is different from the 90th percentile that is reported in Significance magazine. That is not unusual. The data only had

Quantiles and the Flint water crisis

The April 2017 issue of Significance magazine features a cover story by Robert Langkjaer-Bain about the Flint (Michigan) water crisis. For those who don't know, the Flint water crisis started in 2014 when the impoverished city began using the Flint River as a source of city water. The water was

Simulate lognormal data in SAS

A SAS customer asked how to simulate data from a three-parameter lognormal distribution as specified in the PROC UNIVARIATE documentation. In particular, he wanted to incorporate a threshold parameter into the simulation. Simulating lognormal data is easy if you remember an important fact: if X is lognormally distributed, then Y=log(X)

Programming Tips
Timeline of living US presidents

Quick! What is the next term in the numerical sequence 1, 2, 1, 2, 3, 4, 5, 4, 3, 4, ...? If you said '3', then you must be an American history expert, because that sequence represents the number of living US presidents beginning with Washington's inauguration on 30APR1789 and

Data Visualization
Perceptions of probability

If a financial analyst says it is "likely" that a company will be profitable next year, what probability would you ascribe to that statement? If an intelligence report claims that there is "little chance" of a terrorist attack against an embassy, should the ambassador interpret this as a one-in-a-hundred chance,

Analytics
Split data into groups that have the same mean and variance

A frequently asked question on SAS discussion forums concerns randomly assigning units (often patients in a study) to various experimental groups so that each group has approximately the same number of units. This basic problem is easily solved in SAS by using PROC SURVEYSELECT or a DATA step program. A

Visualize a design matrix

Most SAS regression procedures support a CLASS statement which internally generates dummy variables for categorical variables. I have previously described what dummy variables are and how they are used. I have also written about how to create design matrices that contain dummy variables in SAS, and in particular how to

Visualize an ANOVA with two-way interactions

There are several ways to visualize data in a two-way ANOVA model. Most visualizations show a statistical summary of the response variable for each category. However, for small data sets, it can be useful to overlay the raw data. This article shows a simple trick that you can use to

Regression with restricted cubic splines in SAS

Restricted cubic splines are a powerful technique for modeling nonlinear relationships by using linear regression models. I have attended multiple SAS Global Forum presentations that show how to use restricted cubic splines in SAS regression procedures. However, the presenters have all used the %RCSPLINE macro (Frank Harrell, 1988) to generate

Visualize the 68-95-99.7 rule in SAS

A reader commented on last week's article about constructing symmetric intervals. He wanted to know whether I created the graph in SAS. Yes, the graph, which illustrates the so-called 68-95-99.7 rule for the normal distribution, was created by using several statements in the SGPLOT procedure in Base SAS. The SERIES statement

Quadratic optimization in SAS

At SAS Global Forum last week, I saw a poster that used SAS/IML to optimize a quadratic objective function that arises in financial portfolio management (Xia, Eberhardt, and Kastin, 2017). The authors used the Newton-Raphson optimizer (NLPNRA routine) in SAS/IML to optimize a hypothetical portfolio of assets. The Newton-Raphson algorithm

Programming Tips
A simple trick to construct symmetric intervals

Many intervals in statistics have the form p ± δ, where p is a point estimate and δ is the radius (or half-width) of the interval. (For example, many two-sided confidence intervals have this form, where δ is proportional to the standard error.) Many years ago I wrote an article

Programming Tips
Nonsmooth models and spline effects

Most regression models try to model a response variable by using a smooth function of the explanatory variables. However, if the data are generated from some nonsmooth process, then it makes sense to use a regression function that is not smooth. A simple way to model a discontinuous process in

Programming Tips
Print tables in SAS/IML

One of the advantages of the new mixed-type tables in SAS/IML 14.2 (released with SAS 9.4m4) is the greatly enhanced printing functionality. You can control which rows and columns are printed, specify formats for individual columns, and even use templates to completely customize how tables are printed. Printing a table

Programming Tips
Lists: Nonmatrix data structures in SAS/IML

Lists are collections of objects. SAS/IML 14.2 supports lists as a way to store matrices, data tables, and other lists in a single object that you can pass to functions. SAS/IML lists automatically grow if you add new items to them and shrink if you remove items. You can also

Programming Tips
Data tables: Nonmatrix data structures in SAS/IML

Prior to SAS/IML 14.2, every variable in the Interactive Matrix Language (IML) represented a matrix. That changed when SAS/IML 14.2 (released with SAS 9.4m4) introduced two new data structures: data tables and lists. This article gives an overview of data tables. I will blog about lists in a separate article.

Programming Tips
Find your birthday in the digits of pi

It is time for Pi Day, 2017! Every year on March 14th (written 3/14 in the US), geeky mathematicians and their friends celebrate "all things pi-related" because 3.14 is the three-decimal approximation to pi. This year I use SAS software to show an amazing fact: you can find your birthday

Programming Tips
Find a pattern in a sequence of digits

I recently needed to solve a fun programming problem. I challenge other SAS programmers to solve it, too! The problem is easy to state: Given a long sequence of digits, can you write a program to count how many times a particular subsequence occurs? For example, if I give you

Programming Tips
Display output conditionally with PROC DOCUMENT

Suppose you have several discrete variables. You want to conduct a frequency analysis of these variables and print the results, but ONLY for variables that have three or more levels. In other words, you want to conditionally display some results, but you don't know which variables satisfy the condition until