Blogs

Tag: Statistical Programming

Learn SAS | Programming Tips

Rick Wicklin, February 19, 2020

A list of SAS DATA step functions that do not run in CAS

Are you a statistical programmer whose company has adopted SAS Viya? If so, you probably know that the DATA step can run in parallel in SAS Cloud Analytic Services (CAS). As Sekosky (2017) says, "running in a single thread in SAS is different from running in many threads in CAS."

Read More

Analytics | Data Visualization | Programming Tips

Rick Wicklin, January 27, 2020

The Johnson SU distribution

The Johnson system (Johnson, 1949) contains a family of four distributions: the normal distribution, the lognormal distribution, the SB distribution (which models bounded distributions), and the SU distribution (which models unbounded distributions). Note that 'B' stands for 'bounded' and 'U' stands for 'unbounded.' A previous article explains the purpose of

Read More

Analytics | Data Visualization | Programming Tips

Rick Wicklin, January 20, 2020

The Johnson SB distribution

From the early days of probability and statistics, researchers have tried to organize and categorize parametric probability distributions. For example, Pearson (1895, 1901, and 1916) developed a system of seven distributions, which was later called the Pearson system. The main idea behind a "system" of distributions is that for each

Read More

Analytics | Data Visualization | Programming Tips

Rick Wicklin, November 25, 2019

Evaluate a quadratic polynomial in SAS

What is an efficient way to evaluate a multivariate quadratic polynomial in p variables? The answer is to use matrix computations! A multivariate quadratic polynomial can be written as the sum of a purely quadratic term (degree 2), a purely linear term (degree 1), and a constant term (degree 0).
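As a rough illustration of the decomposition described above, a quadratic polynomial in p variables can be written as f(x) = x'Ax + b'x + c, where A is a p x p matrix, b is a vector of length p, and c is a scalar. The sketch below is in plain Python rather than SAS/IML, and the function name `eval_quadratic` and the coefficients are hypothetical values chosen only for the example; it is not the code from the article.

```python
def eval_quadratic(A, b, c, x):
    """Evaluate f(x) = x'Ax + b'x + c at one point x (a list of p numbers)."""
    p = len(x)
    # Purely quadratic term (degree 2): sum over i, j of A[i][j] * x[i] * x[j]
    quad = sum(A[i][j] * x[i] * x[j] for i in range(p) for j in range(p))
    # Purely linear term (degree 1): dot product of b and x
    lin = sum(b[i] * x[i] for i in range(p))
    # Constant term (degree 0) is c
    return quad + lin + c

# Hypothetical example with p = 2:
# f(x, y) = 2x^2 + 2xy + 3y^2 - x + 4y + 5
A = [[2.0, 1.0],
     [1.0, 3.0]]   # symmetric matrix for the quadratic term
b = [-1.0, 4.0]    # coefficients of the linear term
c = 5.0            # constant term

value = eval_quadratic(A, b, c, [1.0, 2.0])
print(value)
```

In a matrix language the double sum collapses to a single expression, which is presumably where the efficiency mentioned in the teaser comes from.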

Read More

Analytics | Data Visualization

Rick Wicklin, October 16, 2019

Visualize a regression with splines

The EFFECT statement is supported by more than a dozen SAS/STAT regression procedures. Among other things, it enables you to generate spline effects that you can use to fit nonlinear relationships in data. Recently there was a discussion on the SAS Support Communities about how to interpret the parameter estimates

Read More

Analytics | Programming Tips

Rick Wicklin, October 2, 2019

Compute the geometric mean, geometric standard deviation, and geometric CV in SAS

I frequently see questions on SAS discussion forums about how to compute the geometric mean and related quantities in SAS. Unfortunately, the answers to these questions are sometimes confusing or even wrong. In addition, some published papers and web sites that claim to show how to calculate the geometric mean

Read More

Analytics | Data Visualization | Programming Tips

Rick Wicklin, September 3, 2019

Cosine similarity of vectors

An important application of the dot product (inner product) of two vectors is to determine the angle between the vectors. If u and v are two vectors and θ is the angle between them, then cos(θ) = (u ⋅ v) / (|u| |v|). You could apply the inverse cosine function if you wanted to find θ in

Read More

Programming Tips

Rick Wicklin, August 26, 2019

Conditionally append observations to a SAS data set

Most SAS programmers know how to use PROC APPEND or the SET statement in the DATA step to unconditionally append new observations to an existing data set. However, sometimes you need to scan the data to determine whether or not to append observations. In this situation, many SAS programmers choose one

Read More

Learn SAS | Programming Tips

Rick Wicklin, August 19, 2019

Timing performance in SAS/IML: Built-in functions versus Base SAS functions

One of my friends likes to remind me that "there is no such thing as a free lunch," which he abbreviates as "TINSTAAFL" (or TANSTAAFL). The TINSTAAFL principle applies to computer programming because you often end up paying a cost (in performance) when you call a convenience function that simplifies

Read More

Learn SAS | Programming Tips

Rick Wicklin, August 14, 2019

Short-circuit evaluation and logical ligatures in SAS

Many programmers are familiar with "short-circuit" evaluation in an IF-THEN statement. Short circuit means that a program does not evaluate the remainder of a logical expression if the value of the expression is already logically determined. The SAS DATA step supports short-circuiting for simple logical expressions in IF-THEN statements and

Read More

Learn SAS | Programming Tips

Rick Wicklin, July 31, 2019

Use numeric values for column headers when printing a matrix

Sometimes a little thing can make a big difference. I am enjoying a new enhancement of SAS/IML 15.1, which enables you to use a numeric vector as the column header or row header when you print a SAS/IML matrix. Prior to SAS/IML 15.1, you had to use the CHAR or

Read More

Learn SAS | Programming Tips

Rick Wicklin, July 24, 2019

Implement the Gumbel distribution in SAS

SAS supports more than 25 common probability distributions for the PDF, CDF, QUANTILE, and RAND functions. Of course, there are infinitely many distributions, so not every possible distribution is supported. If you need a less-common distribution, I've shown how to extend the functionality of Base SAS (by using PROC FCMP)

Read More

Analytics | Learn SAS | Programming Tips

Rick Wicklin, June 26, 2019

Jump-start PROC LOGISTIC by using parameter estimates from PROC HPLOGISTIC

SAS/STAT software contains a number of so-called HP procedures for training and evaluating predictive models. ("HP" stands for "high performance.") A popular HP procedure is HPLOGISTIC, which enables you to fit logistic models on Big Data. A goal of the HP procedures is to fit models quickly. Inferential statistics such

Read More

Analytics | Programming Tips

Rick Wicklin, May 20, 2019

Critical values of the Kolmogorov-Smirnov test

Recently I wrote about how to compute the Kolmogorov D statistic, which is used to determine whether a sample has a particular distribution. One of the beautiful facts about modern computational statistics is that if you can compute a statistic, you can use simulation to estimate the sampling distribution of

Read More

Analytics | Learn SAS

Rick Wicklin, May 15, 2019

What is Kolmogorov's D statistic?

Have you ever run a statistical test to determine whether data are normally distributed? If so, you have probably used Kolmogorov's D statistic. Kolmogorov's D statistic (also called the Kolmogorov-Smirnov statistic) enables you to test whether the empirical distribution of data is different than a reference distribution. The reference distribution

Read More

Analytics | Learn SAS

Rick Wicklin, May 1, 2019

Encodings of CLASS variables in SAS regression procedures: A cheat sheet

SAS regression procedures support several parameterizations of classification variables. When a categorical variable is used as an explanatory variable in a regression model, the procedure generates dummy variables that are used to construct a design matrix for the model. The process of forming columns in a design matrix is called

Read More

Learn SAS | Programming Tips

Rick Wicklin, April 29, 2019

The normal mixture distribution in SAS

Did you know that SAS provides built-in support for working with probability distributions that are finite mixtures of normal distributions? This article shows examples of using the "NormalMix" distribution in SAS and describes a trick that enables you to easily work with distributions that have many components. As with all

Read More

Programming Tips

Rick Wicklin, April 17, 2019

Create your own version of Anscombe's quartet: Dissimilar data that have similar statistics

I think every course in exploratory data analysis should begin by studying Anscombe's quartet. Anscombe's quartet is a set of four data sets (N=11) that have nearly identical descriptive statistics but different graphical properties. They are a great reminder of why you should graph your data. You can read about

Read More

Learn SAS | Programming Tips

Rick Wicklin, April 8, 2019

Use the FLOOR-MOD trick to allocate items to groups

Suppose you need to assign 100 patients equally among 3 treatment groups in a clinical study. Obviously, an equal allocation is impossible because 3 does not evenly divide 100, but you can get close by assigning 34 patients to one group and 33 to each of the others. Mathematically,

Read More

Learn SAS | Programming Tips

Rick Wicklin, April 1, 2019

Matrix operations and BY groups

Many SAS procedures support the BY statement, which enables you to perform an analysis for subgroups of the data set. Although the SAS/IML language does not have a built-in "BY statement," there are various techniques that enable you to perform a BY-group analysis. The two I use most often are

Read More

Analytics | Learn SAS | Programming Tips

Rick Wicklin, February 13, 2019

3 ways to obtain the Hessian at the MLE solution for a regression model

When you use maximum likelihood estimation (MLE) to find the parameter estimates in a generalized linear regression model, the Hessian matrix at the optimal solution is very important. The Hessian matrix indicates the local shape of the log-likelihood surface near the optimal value. You can use the Hessian to estimate

Read More

Analytics | Learn SAS

Rick Wicklin, February 11, 2019

4 reasons to use PROC PLM for linear regression models in SAS

Have you ever run a regression model in SAS but later realized that you forgot to specify an important option or run some statistical test? Or maybe you intended to generate a graph that visualizes the model, but you forgot? Years ago, your only option was to modify your program

Read More

Learn SAS | Programming Tips

Rick Wicklin, January 28, 2019

Simulate data for a regression model with categorical and continuous variables

This article shows how to use SAS to simulate data that fits a linear regression model that has categorical regressors (also called explanatory or CLASS variables). Simulating data is a useful skill for both researchers and statistical programmers. You can use simulation for answering research questions, but you can also

Read More

Analytics | Programming Tips

Rick Wicklin, January 23, 2019

Coding and simulating categorical variables in regression models

Recently I was asked to explain the result of an ANOVA analysis that I posted to a statistical discussion forum. My program included some simulated data for an ANOVA model and a call to the GLM procedure to estimate the parameters. I was asked why the parameter estimates from PROC

Read More

Analytics | Machine Learning | Programming Tips

Rick Wicklin, January 21, 2019

Create training, validation, and test data sets in SAS

In machine learning and other model building techniques, it is common to partition a large data set into three segments: training, validation, and testing. Training data is used to fit each model. Validation data is a random sample that is used for model selection. These data are used to select

Read More

Analytics | Programming Tips

Rick Wicklin, January 16, 2019

Three ways to add a line to a Q-Q plot

A quantile-quantile plot (Q-Q plot) is a graphical tool that compares a data distribution and a specified probability distribution. If the points in a Q-Q plot appear to fall on a straight line, that is evidence that the data can be approximately modeled by the target distribution. Although it is

Read More

Analytics | Data Visualization | Programming Tips

Rick Wicklin, January 9, 2019

10 posts from 2018 that deserve a second look

Numbers don't lie, but sometimes they don't reveal the full story. Last week I wrote about the most popular articles from The DO Loop in 2018. The popular articles are inevitably about elementary topics in SAS programming or statistics because those topics have broad appeal. However, I also write about

Read More

Learn SAS | Programming Tips

Rick Wicklin, December 5, 2018

When is a histogram not a histogram? When it's a table!

Recently a SAS programmer wanted to obtain a table of counts that was based on a histogram. I showed him how you can use the OUTHIST= option on the HISTOGRAM statement in PROC UNIVARIATE to obtain that information. For example, the following call to PROC UNIVARIATE creates a histogram for

Read More

Analytics | Programming Tips

Rick Wicklin, October 3, 2018

Fast simulation of multivariate normal data with an AR(1) correlation structure

It is sometimes necessary for researchers to simulate data with thousands of variables. It is easy to simulate thousands of uncorrelated variables, but more difficult to simulate thousands of correlated variables. For that, you can generate a correlation matrix that has special properties, such as a Toeplitz matrix or a

Read More
