If you toss a coin 28 times, you would not be surprised to see three heads in a row, such as ...THHHTH.... But what about eight heads in a row? Would a sequence such as THHHHHHHHTH... be a rare event? This question popped into my head last weekend as I

# Author

Last week I was asked a simple question: "How do I choose a seed for the random number functions in SAS?" The answer might surprise you: use any seed you like. Each seed of a well-designed random number generator is likely to give rise to a stream of random numbers,

By default, when you use the SERIES statement in PROC SGPLOT to create a line plot, the observations are connected (in order) by straight line segments. However, SAS 9.4m1 introduced the SMOOTHCONNECT option which, as the name implies, uses a smooth curve to connect the observations. In Sanjay Matange's blog,

According to Hyndman and Fan ("Sample Quantiles in Statistical Packages," TAS, 1996), there are nine definitions of sample quantiles that commonly appear in statistical software packages. Hyndman and Fan identify three definitions that are based on rounding and six methods that are based on linear interpolation. This blog post shows

In last week's article about the Flint water crisis, I computed the 90th percentile of a small data set. Although I didn't mention it, the value that I reported is different from the the 90th percentile that is reported in Significance magazine. That is not unusual. The data only had

The April 2017 issue of Significance magazine features a cover story by Robert Langkjaer-Bain about the Flint (Michigan) water crisis. For those who don't know, the Flint water crisis started in 2014 when the impoverished city began using the Flint River as a source of city water. The water was

Last week I showed a timeline of living US presidents. The number of living presidents is constant during the time interval between inaugurations and deaths of presidents. The data was taken from a Wikipedia table (shown below) that shows the number of years and days between events. This article shows

A SAS customer asked how to simulate data from a three-parameter lognormal distribution as specified in the PROC UNIVARIATE documentation. In particular, he wanted to incorporate a threshold parameter into the simulation. Simulating lognormal data is easy if you remember an important fact: if X is lognormally distributed, then Y=log(X)

Quick! What is the next term in the numerical sequence 1, 2, 1, 2, 3, 4, 5, 4, 3, 4, ...? If you said '3', then you must be an American history expert, because that sequence represents the number of living US presidents beginning with Washington's inauguration on 30APR1789 and

If a financial analyst says it is "likely" that a company will be profitable next year, what probability would you ascribe to that statement? If an intelligence report claims that there is "little chance" of a terrorist attack against an embassy, should the ambassador interpret this as a one-in-a-hundred chance,

A frequently asked question on SAS discussion forums concerns randomly assigning units (often patients in a study) to various experimental groups so that each group has approximately the same number of units. This basic problem is easily solved in SAS by using PROC SURVEYSELECT or a DATA step program. A

Most SAS regression procedures support a CLASS statement which internally generates dummy variables for categorical variables. I have previously described what dummy variables are and how are they used. I have also written about how to create design matrices that contain dummy variables in SAS, and in particular how to

There are several ways to visualize data in a two-way ANOVA model. Most visualizations show a statistical summary of the response variable for each category. However, for small data sets, it can be useful to overlay the raw data. This article shows a simple trick that you can use to

Restricted cubic splines are a powerful technique for modeling nonlinear relationships by using linear regression models. I have attended multiple SAS Global Forum presentations that show how to use restricted cubic splines in SAS regression procedures. However, the presenters have all used the %RCSPLINE macro (Frank Harrell, 1988) to generate

A reader commented on last week's article about constructing symmetric intervals. He wanted to know if I created it in SAS. Yes, the graph, which illustrates the so-called 68-95-99.7 rule for the normal distribution, was created by using several statements in the SGPLOT procedure in Base SAS The SERIES statement

At SAS Global Forum last week, I saw a poster that used SAS/IML to optimized a quadratic objective function that arises in financial portfolio management (Xia, Eberhardt, and Kastin, 2017). The authors used the Newton-Raphson optimizer (NLPNRA routine) in SAS/IML to optimize a hypothetical portfolio of assets. The Newton-Raphson algorithm

Many intervals in statistics have the form p ± δ, where p is a point estimate and δ is the radius (or half-width) of the interval. (For example, many two-sided confidence intervals have this form, where δ is proportional to the standard error.) Many years ago I wrote an article

Most regression models try to model a response variable by using a smooth function of the explanatory variables. However, if the data are generated from some nonsmooth process, then it makes sense to use a regression function that is not smooth. A simple way to model a discontinuous process in

One of the advantages of the new mixed-type tables in SAS/IML 14.2 (released with SAS 9.4m4) is the greatly enhanced printing functionality. You can control which rows and columns are printed, specify formats for individual columns, and even use templates to completely customize how tables are printed. Printing a table

Lists are collections of objects. SAS/IML 14.2 supports lists as a way to store matrices, data tables, and other lists in a single object that you can pass to functions. SAS/IML lists automatically grow if you add new items to them and shrink if you remove items. You can also

Did you know that you can check a SAS macro variable to see if ODS graphics is enabled? The other day I wanted to write a SAS program that creates a graph only if ODS graphics is enabled. The solution is to check the SYSODSGRAPHICS macro variable, which is automatically

Prior to SAS/IML 14.2, every variable in the Interactive Matrix Language (IML) represented a matrix. That changed when SAS/IML 14.2 (released with SAS 9.4m4) introduced two new data structures: data tables and lists. This article gives an overview of data tables. I will blog about lists in a separate article.

SAS formats are very useful and can be used in a myriad of creative ways. For example, you can use formats to display decimal values as a fraction. However, SAS supports so many formats that it is difficult to remember details about the format syntax, such as the default field

SAS programmers who have experience with other programming languages sometimes wonder whether the SAS language supports statements that are equivalent to the "break" and "continue" statements in other languages. The answer is yes. The LEAVE statement in the SAS DATA step is equivalent to the "break" statement. It provides a

It is time for Pi Day, 2017! Every year on March 14th (written 3/14 in the US), geeky mathematicians and their friends celebrate "all things pi-related" because 3.14 is the three-decimal approximation to pi. This year I use SAS software to show an amazing fact: you can find your birthday

I recently needed to solve a fun programming problem. I challenge other SAS programmers to solve it, too! The problem is easy to state: Given a long sequence of digits, can you write a program to count how many times a particular subsequence occurs? For example, if I give you

Suppose you have several discrete variables. You want to conduct a frequency analysis of these variables and print the results, but ONLY for variables that have three or more levels. In other words, you want to conditionally display some results, but you don't know which variables satisfy the condition until

After reading my article about how to use BY-group processing to run 1000 regression models, a SAS programmer asked whether it is possible to reorder the output of a BY-group analysis. The answer is yes: you can use the DOCUMENT procedure to replay a portion of your output in any

Monte Carlo techniques have many applications, but a primary application is to approximate the probability that some event occurs. The idea is to simulate data from the population and count the proportion of times that the event occurs in the simulated data. For continuous univariate distributions, the probability of an

Longtime SAS programmers know that the SAS DATA step and SAS procedures are very tolerant of typographical errors. You can misspell most keywords and SAS will "guess" what you mean. For example, if you mistype "PROC" as "PRC," SAS will run the program but write a warning to the log: