Last week I showed a timeline of living US presidents. The number of living presidents is constant during the time interval between inaugurations and deaths of presidents. The data was taken from a Wikipedia table (shown below) that shows the number of years and days between events. This article shows
Author
A SAS customer asked how to simulate data from a three-parameter lognormal distribution as specified in the PROC UNIVARIATE documentation. In particular, he wanted to incorporate a threshold parameter into the simulation. Simulating lognormal data is easy if you remember an important fact: if X is lognormally distributed, then Y=log(X)
Quick! What is the next term in the numerical sequence 1, 2, 1, 2, 3, 4, 5, 4, 3, 4, ...? If you said '3', then you must be an American history expert, because that sequence represents the number of living US presidents beginning with Washington's inauguration on 30APR1789 and
If a financial analyst says it is "likely" that a company will be profitable next year, what probability would you ascribe to that statement? If an intelligence report claims that there is "little chance" of a terrorist attack against an embassy, should the ambassador interpret this as a one-in-a-hundred chance,
A frequently asked question on SAS discussion forums concerns randomly assigning units (often patients in a study) to various experimental groups so that each group has approximately the same number of units. This basic problem is easily solved in SAS by using PROC SURVEYSELECT or a DATA step program. A
Most SAS regression procedures support a CLASS statement which internally generates dummy variables for categorical variables. I have previously described what dummy variables are and how are they used. I have also written about how to create design matrices that contain dummy variables in SAS, and in particular how to
There are several ways to visualize data in a two-way ANOVA model. Most visualizations show a statistical summary of the response variable for each category. However, for small data sets, it can be useful to overlay the raw data. This article shows a simple trick that you can use to
Restricted cubic splines are a powerful technique for modeling nonlinear relationships by using linear regression models. I have attended multiple SAS Global Forum presentations that show how to use restricted cubic splines in SAS regression procedures. However, the presenters have all used the %RCSPLINE macro (Frank Harrell, 1988) to generate
A reader commented on last week's article about constructing symmetric intervals. He wanted to know if I created it in SAS. Yes, the graph, which illustrates the so-called 68-95-99.7 rule for the normal distribution, was created by using several statements in the SGPLOT procedure in Base SAS The SERIES statement
At SAS Global Forum last week, I saw a poster that used SAS/IML to optimized a quadratic objective function that arises in financial portfolio management (Xia, Eberhardt, and Kastin, 2017). The authors used the Newton-Raphson optimizer (NLPNRA routine) in SAS/IML to optimize a hypothetical portfolio of assets. The Newton-Raphson algorithm
Many intervals in statistics have the form p ± δ, where p is a point estimate and δ is the radius (or half-width) of the interval. (For example, many two-sided confidence intervals have this form, where δ is proportional to the standard error.) Many years ago I wrote an article
Most regression models try to model a response variable by using a smooth function of the explanatory variables. However, if the data are generated from some nonsmooth process, then it makes sense to use a regression function that is not smooth. A simple way to model a discontinuous process in
One of the advantages of the new mixed-type tables in SAS/IML 14.2 (released with SAS 9.4m4) is the greatly enhanced printing functionality. You can control which rows and columns are printed, specify formats for individual columns, and even use templates to completely customize how tables are printed. Printing a table
Lists are collections of objects. SAS/IML 14.2 supports lists as a way to store matrices, data tables, and other lists in a single object that you can pass to functions. SAS/IML lists automatically grow if you add new items to them and shrink if you remove items. You can also
Did you know that you can check a SAS macro variable to see if ODS graphics is enabled? The other day I wanted to write a SAS program that creates a graph only if ODS graphics is enabled. The solution is to check the SYSODSGRAPHICS macro variable, which is automatically
Prior to SAS/IML 14.2, every variable in the Interactive Matrix Language (IML) represented a matrix. That changed when SAS/IML 14.2 (released with SAS 9.4m4) introduced two new data structures: data tables and lists. This article gives an overview of data tables. I will blog about lists in a separate article.
SAS formats are very useful and can be used in a myriad of creative ways. For example, you can use formats to display decimal values as a fraction. However, SAS supports so many formats that it is difficult to remember details about the format syntax, such as the default field
SAS programmers who have experience with other programming languages sometimes wonder whether the SAS language supports statements that are equivalent to the "break" and "continue" statements in other languages. The answer is yes. The LEAVE statement in the SAS DATA step is equivalent to the "break" statement. It provides a
It is time for Pi Day, 2017! Every year on March 14th (written 3/14 in the US), geeky mathematicians and their friends celebrate "all things pi-related" because 3.14 is the three-decimal approximation to pi. This year I use SAS software to show an amazing fact: you can find your birthday
I recently needed to solve a fun programming problem. I challenge other SAS programmers to solve it, too! The problem is easy to state: Given a long sequence of digits, can you write a program to count how many times a particular subsequence occurs? For example, if I give you
Suppose you have several discrete variables. You want to conduct a frequency analysis of these variables and print the results, but ONLY for variables that have three or more levels. In other words, you want to conditionally display some results, but you don't know which variables satisfy the condition until
After reading my article about how to use BY-group processing to run 1000 regression models, a SAS programmer asked whether it is possible to reorder the output of a BY-group analysis. The answer is yes: you can use the DOCUMENT procedure to replay a portion of your output in any
Monte Carlo techniques have many applications, but a primary application is to approximate the probability that some event occurs. The idea is to simulate data from the population and count the proportion of times that the event occurs in the simulated data. For continuous univariate distributions, the probability of an
Longtime SAS programmers know that the SAS DATA step and SAS procedures are very tolerant of typographical errors. You can misspell most keywords and SAS will "guess" what you mean. For example, if you mistype "PROC" as "PRC," SAS will run the program but write a warning to the log:
Sometimes SAS programmers ask about how to analyze quantiles with SAS. Common questions include: How can I compute 95% confidence intervals for a median in SAS? How can I test whether the medians of two independent samples are significantly different? How can I repeat the previous analyses with other percentiles,
Many introductory courses in probability and statistics encourage students to collect and analyze real data. A popular experiment in categorical data analysis is to give students a bag of M&M® candies and ask them to estimate the proportion of colors in the population from the sample data. In some classes,
A categorical response variable can take on k different values. If you have a random sample from a multinomial response, the sample proportions estimate the proportion of each category in the population. This article describes how to construct simultaneous confidence intervals for the proportions as described in the 1997 paper
A common question on SAS discussion forums is how to repeat an analysis multiple times. Most programmers know that the most efficient way to analyze one model across many subsets of the data (perhaps each country or each state) is to sort the data and use a BY statement to
On discussion forums, I often see questions that ask how to Winsorize variables in SAS. For example, here are some typical questions from the SAS Support Community: I want an efficient way of replacing (upper) extreme values with (95th) percentile. I have a data set with around 600 variables and
Suppose you create a scatter plot in SAS with PROC SGPLOT. What color does PROC SGPLOT use for the markers? If you specify the GROUP= option so that markers are colored by a grouping variable, what colors are used to represent the various groups? The following scatter plot shows the