Blogs

Tag: Statistical Programming

Analytics | Learn SAS

Rick Wicklin, May 1, 2019

Encodings of CLASS variables in SAS regression procedures: A cheat sheet

SAS regression procedures support several parameterizations of classification variables. When a categorical variable is used as an explanatory variable in a regression model, the procedure generates dummy variables that are used to construct a design matrix for the model. The process of forming columns in a design matrix is called


Learn SAS | Programming Tips

Rick Wicklin, April 29, 2019

The normal mixture distribution in SAS

Did you know that SAS provides built-in support for working with probability distributions that are finite mixtures of normal distributions? This article shows examples of using the "NormalMix" distribution in SAS and describes a trick that enables you to easily work with distributions that have many components. As with all
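A finite normal mixture has a density that is a weighted sum of normal densities, with nonnegative weights that sum to 1, and its CDF is the same weighted sum of the component CDFs. The following Python sketch (not SAS, and not the "NormalMix" implementation) evaluates both directly; the function names and the example weights are hypothetical.

```python
from statistics import NormalDist

def normal_mixture_pdf(x, weights, means, sds):
    """Density at x of a finite mixture of normal components.

    Component k is N(means[k], sds[k]) with mixing weight weights[k];
    the weights must be nonnegative and sum to 1.
    """
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing weights must be nonnegative and sum to 1")
    return sum(w * NormalDist(mu, sd).pdf(x)
               for w, mu, sd in zip(weights, means, sds))

def normal_mixture_cdf(x, weights, means, sds):
    """CDF of the same mixture: the weighted sum of the component CDFs."""
    return sum(w * NormalDist(mu, sd).cdf(x)
               for w, mu, sd in zip(weights, means, sds))

# Example (made-up parameters): 30% N(0, 1) and 70% N(4, 1.5), a bimodal mixture.
w, mu, sd = [0.3, 0.7], [0.0, 4.0], [1.0, 1.5]
print(normal_mixture_pdf(2.0, w, mu, sd), normal_mixture_cdf(2.0, w, mu, sd))
```

Adding components only lengthens the three parameter lists, so a mixture with many components needs no extra code.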


Programming Tips

Rick Wicklin, April 17, 2019

Create your own version of Anscombe's quartet: Dissimilar data that have similar statistics

I think every course in exploratory data analysis should begin by studying Anscombe's quartet. Anscombe's quartet is a set of four data sets (N=11) that have nearly identical descriptive statistics but different graphical properties. They are a great reminder of why you should graph your data. You can read about
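As a quick check of the "similar statistics" half of the idea, this Python sketch (an illustration, not the article's SAS program) computes the mean, variance, and x-y correlation for the first two of Anscombe's published data sets. The helper name pearson_r is hypothetical.

```python
from statistics import mean, variance

# Anscombe's first two data sets share the same x values (N = 11).
x  = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

def pearson_r(a, b):
    """Sample Pearson correlation from centered cross-products."""
    ma, mb = mean(a), mean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5

# Nearly identical summaries, even though a scatter plot of (x, y1) looks
# roughly linear and a scatter plot of (x, y2) follows a smooth curve.
for name, y in [("y1", y1), ("y2", y2)]:
    print(name, round(mean(y), 2), round(variance(y), 2), round(pearson_r(x, y), 3))
```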


Learn SAS | Programming Tips

Rick Wicklin, April 8, 2019

Use the FLOOR-MOD trick to allocate items to groups

Suppose you need to assign 100 patients equally among 3 treatment groups in a clinical study. Obviously, an exactly equal allocation is impossible because 3 does not evenly divide 100, but you can get close by assigning 34 patients to one group and 33 to each of the others. Mathematically,
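One reading of the FLOOR-MOD idea, sketched in Python under the assumption that the goal is group sizes that differ by at most one: every group gets FLOOR(n/k) items and the MOD(n, k) leftover items go one apiece to the first groups. The helpers allocate and assign_groups are hypothetical names, not SAS functions, and the article's exact allocation scheme may differ.

```python
def allocate(n_items, n_groups):
    """Split n_items into n_groups sizes that differ by at most one.

    floor(n/k) items go to every group; the n mod k leftover items
    are given one apiece to the first groups.
    """
    base = n_items // n_groups    # FLOOR(n / k)
    extra = n_items % n_groups    # MOD(n, k)
    return [base + 1 if g < extra else base for g in range(n_groups)]

def assign_groups(n_items, n_groups):
    """Label items with group numbers 1..n_groups in blocks sized by allocate()."""
    labels = []
    for g, size in enumerate(allocate(n_items, n_groups), start=1):
        labels.extend([g] * size)
    return labels

print(allocate(100, 3))
```

For 100 patients and 3 groups, allocate returns sizes 34, 33, 33, matching the allocation described above.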


Learn SAS | Programming Tips

Rick Wicklin, April 1, 2019

Matrix operations and BY groups

Many SAS procedures support the BY statement, which enables you to perform an analysis for subgroups of the data set. Although the SAS/IML language does not have a built-in "BY statement," there are various techniques that enable you to perform a BY-group analysis. The two I use most often are


Analytics | Learn SAS | Programming Tips

Rick Wicklin, February 13, 2019

3 ways to obtain the Hessian at the MLE solution for a regression model

When you use maximum likelihood estimation (MLE) to find the parameter estimates in a generalized linear regression model, the Hessian matrix at the optimal solution is very important. The Hessian matrix indicates the local shape of the log-likelihood surface near the optimal value. You can use the Hessian to estimate


Analytics | Learn SAS

Rick Wicklin, February 11, 2019

4 reasons to use PROC PLM for linear regression models in SAS

Have you ever run a regression model in SAS but later realized that you forgot to specify an important option or to run some statistical test? Or maybe you intended to generate a graph that visualizes the model, but you forgot? Years ago, your only option was to modify your program


Learn SAS | Programming Tips

Rick Wicklin, January 28, 2019

Simulate data for a regression model with categorical and continuous variables

This article shows how to use SAS to simulate data that fits a linear regression model that has categorical regressors (also called explanatory or CLASS variables). Simulating data is a useful skill for both researchers and statistical programmers. You can use simulation for answering research questions, but you can also


Analytics | Programming Tips

Rick Wicklin, January 23, 2019

Coding and simulating categorical variables in regression models

Recently I was asked to explain the result of an ANOVA analysis that I posted to a statistical discussion forum. My program included some simulated data for an ANOVA model and a call to the GLM procedure to estimate the parameters. I was asked why the parameter estimates from PROC


Analytics | Machine Learning | Programming Tips

Rick Wicklin, January 21, 2019

Create training, validation, and test data sets in SAS

In machine learning and other model building techniques, it is common to partition a large data set into three segments: training, validation, and testing. Training data is used to fit each model. Validation data is a random sample that is used for model selection. These data are used to select
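A minimal Python sketch of a random three-way split, assuming 60/20/20 proportions and a simple shuffle of row indices. The proportions, the seed, and the function name partition are illustrative choices, not values taken from the article.

```python
import random

def partition(n_obs, fractions=(0.6, 0.2, 0.2), seed=12345):
    """Randomly split indices 0..n_obs-1 into training, validation, and test sets.

    Training and validation sizes are rounded down; any leftover
    observations go to the test set.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    idx = list(range(n_obs))
    random.Random(seed).shuffle(idx)   # reproducible random permutation
    n_train = int(fractions[0] * n_obs)
    n_valid = int(fractions[1] * n_obs)
    train = idx[:n_train]
    valid = idx[n_train:n_train + n_valid]
    test = idx[n_train + n_valid:]
    return train, valid, test

train, valid, test = partition(100)
print(len(train), len(valid), len(test))
```

Shuffling once and slicing guarantees that the three segments are disjoint and together cover every observation.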


Analytics | Programming Tips

Rick Wicklin, January 16, 2019

Three ways to add a line to a Q-Q plot

A quantile-quantile plot (Q-Q plot) is a graphical tool that compares a data distribution and a specified probability distribution. If the points in a Q-Q plot appear to fall on a straight line, that is evidence that the data can be approximately modeled by the target distribution. Although it is


Analytics | Data Visualization | Programming Tips

Rick Wicklin, January 9, 2019

10 posts from 2018 that deserve a second look

Numbers don't lie, but sometimes they don't reveal the full story. Last week I wrote about the most popular articles from The DO Loop in 2018. The popular articles are inevitably about elementary topics in SAS programming or statistics because those topics have broad appeal. However, I also write about


Learn SAS | Programming Tips

Rick Wicklin, December 5, 2018

When is a histogram not a histogram? When it's a table!

Recently a SAS programmer wanted to obtain a table of counts that was based on a histogram. I showed him how you can use the OUTHIST= option on the HISTOGRAM statement in PROC UNIVARIATE to obtain that information. For example, the following call to PROC UNIVARIATE creates a histogram for


Analytics | Programming Tips

Rick Wicklin, October 3, 2018

Fast simulation of multivariate normal data with an AR(1) correlation structure

It is sometimes necessary for researchers to simulate data with thousands of variables. It is easy to simulate thousands of uncorrelated variables, but more difficult to simulate thousands of correlated variables. For that, you can generate a correlation matrix that has special properties, such as a Toeplitz matrix or a
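For an AR(1) structure the correlation between variables i and j is ρ^|i-j|, which is a Toeplitz matrix. One well-known way to sample from it without factoring the full p x p matrix is to run the AR(1) recursion across the variables; the Python sketch below does that. Whether this resembles the article's approach is an assumption, and the function names are hypothetical.

```python
import math
import random

def ar1_corr(p, rho):
    """AR(1) correlation matrix: entry (i, j) is rho**|i - j|, a Toeplitz matrix."""
    return [[rho ** abs(i - j) for j in range(p)] for i in range(p)]

def simulate_ar1_mvn(n, p, rho, seed=1):
    """Draw n rows from MVN(0, ar1_corr(p, rho)) without factoring a p x p matrix.

    Each row uses the recursion x[j] = rho*x[j-1] + sqrt(1 - rho**2)*e[j],
    with x[0] and every e[j] independent N(0, 1). That keeps every variance
    at 1 and gives corr(x[i], x[j]) = rho**|i - j|, at a cost of O(n*p) draws.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)
    rows = []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0)]
        for _ in range(1, p):
            row.append(rho * row[-1] + scale * rng.gauss(0.0, 1.0))
        rows.append(row)
    return rows

data = simulate_ar1_mvn(n=1000, p=5000, rho=0.6)   # 5,000 correlated variables
```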


Programming Tips

Rick Wicklin, September 26, 2018

Radial basis functions and Gaussian kernels in SAS

A radial basis function is a scalar function that depends on the distance to some point, called the center point, c. One popular radial basis function is the Gaussian kernel $\varphi(x; c) = \exp\left(-\|x - c\|^2 / (2\sigma^2)\right)$, which uses the squared distance from a vector x to the
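The Gaussian kernel is straightforward to evaluate directly. This Python sketch (the function name and argument order are illustrative) computes it for vectors stored as sequences; because it depends only on the squared distance to c, points at equal distance from the center get equal values.

```python
import math

def gaussian_kernel(x, c, sigma):
    """Gaussian radial basis function exp(-||x - c||^2 / (2 sigma^2)).

    x and c are equal-length sequences; sigma > 0 controls the width.
    """
    if len(x) != len(c):
        raise ValueError("x and c must have the same length")
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

print(gaussian_kernel([1.0, 2.0], [0.0, 0.0], sigma=1.0))
```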


Analytics | Data Visualization

Rick Wicklin, September 19, 2018

Shuffling smackdown: Overhand shuffle versus riffle shuffle

Every day I’m shufflin'. Shufflin', shufflin'.
-- "Party Rock Anthem," LMFAO

The most popular way to mix a deck of cards is the riffle shuffle, which separates the deck into two pieces and interleaves the cards from each piece. Besides being popular with card players, the riffle shuffle is


Analytics

Rick Wicklin, September 12, 2018

Two interfaces for typing text by using a TV remote control

Have you ever tried to type a movie title by using a TV remote control? Both Netflix and Amazon Video provide an interface (a virtual keyboard) that enables you to use the four arrow keys of a standard remote control to type letters. The letters are arranged in a regular


Programming Tips

Rick Wicklin, September 10, 2018

Distances on rectangular grids

Given a rectangular grid with unit spacing, what is the expected distance between two random vertices, where distance is measured in the L1 metric? (Here "random" means "uniformly at random.") I recently needed this answer for some small grids, such as the one to the right, which is a 7 x 6
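A brute-force Python sketch of this expectation, assuming the two vertices are drawn independently (so they may coincide) and that "7 x 6" counts vertices rather than cells. By linearity of expectation the L1 distance splits into a row part and a column part, and for X, Y independent and uniform on {0, ..., n-1}, E|X - Y| = (n^2 - 1)/(3n), which gives a closed form to check against. If the two vertices must be distinct, multiply the result by N/(N - 1), where N is the number of vertices. The function names are hypothetical.

```python
from itertools import product

def expected_l1_distance(n_rows, n_cols):
    """Brute-force E[|r1 - r2| + |c1 - c2|] for two vertices drawn independently
    and uniformly from an n_rows x n_cols grid of vertices with unit spacing."""
    verts = list(product(range(n_rows), range(n_cols)))
    total = sum(abs(r1 - r2) + abs(c1 - c2)
                for (r1, c1), (r2, c2) in product(verts, repeat=2))
    return total / len(verts) ** 2

def expected_l1_closed_form(n_rows, n_cols):
    """Same expectation via E|X - Y| = (n^2 - 1) / (3n) applied to rows and columns."""
    return (n_rows ** 2 - 1) / (3 * n_rows) + (n_cols ** 2 - 1) / (3 * n_cols)

print(expected_l1_distance(7, 6), expected_l1_closed_form(7, 6))
```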


Programming Tips

Rick Wicklin, September 4, 2018

Store vectors of different lengths in a matrix

In the SAS/IML language, you can only concatenate vectors that have conforming dimensions. For example, to horizontally concatenate two vectors X and Y, the symbols X and Y must have the same number of rows. If not, the statement Z = X || Y will produce an error: ERROR: Matrices


Analytics

Rick Wicklin, August 29, 2018

Kernel regression in SAS

A SAS programmer recently asked me how to compute a kernel regression in SAS. He had read my blog posts "What is loess regression" and "Loess regression in SAS/IML" and was trying to implement a kernel regression in SAS/IML as part of a larger analysis. This article explains how to
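Kernel regression is often introduced as the Nadaraya-Watson estimator: the prediction at x0 is a weighted average of the responses, with weights from a kernel centered at x0. The Python sketch below uses a Gaussian kernel and a fixed bandwidth h; both are assumptions for illustration, and the article's SAS/IML implementation may choose the kernel or bandwidth differently. The function name and the toy data are hypothetical.

```python
import math

def nadaraya_watson(x0, xs, ys, bandwidth):
    """Kernel regression estimate at x0: a locally weighted average of ys.

    Each weight is the Gaussian kernel value exp(-0.5 * ((x0 - x_i) / h)**2),
    so observations near x0 count more than distant ones.
    """
    weights = [math.exp(-0.5 * ((x0 - xi) / bandwidth) ** 2) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase the bandwidth")
    return sum(w * yi for w, yi in zip(weights, ys)) / total

# Made-up data; evaluate the smoother on a grid of 31 points in [0, 3].
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [0.1, 0.6, 0.8, 1.1, 0.9, 0.5, 0.2]
fitted = [nadaraya_watson(t / 10, xs, ys, bandwidth=0.4) for t in range(31)]
```

A small bandwidth tracks the data closely; as h grows, every prediction approaches the overall mean of ys.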


Learn SAS | Programming Tips

Rick Wicklin, August 22, 2018

Standardized regression coefficients

A SAS programmer recently asked how to interpret the "standardized regression coefficients" as computed by the STB option on the MODEL statement in PROC REG and other SAS regression procedures. The SAS documentation for the STB option states, "a standardized regression coefficient is computed by dividing a parameter estimate by
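Under the conventional definition, a standardized coefficient rescales an estimate b_j by s_x/s_y, the ratio of the sample standard deviations of the regressor and the response; this equals the slope you would get after standardizing both variables. The Python sketch below assumes that definition for a single regressor; the function names and the toy data are hypothetical. For one regressor, the standardized slope also equals the Pearson correlation of x and y.

```python
from statistics import mean, stdev

def ols_slope(x, y):
    """Least-squares slope for the simple regression y = b0 + b1*x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def standardized_slope(x, y):
    """Slope in standard-deviation units: b1 * s_x / s_y.

    Equivalent to refitting after rescaling x and y to mean 0 and SD 1.
    """
    return ols_slope(x, y) * stdev(x) / stdev(y)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(ols_slope(x, y), standardized_slope(x, y))
```

Because the units cancel, the standardized slope does not change if x is measured on a different scale (for example, multiplied by 10).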


Programming Tips

Rick Wicklin, August 20, 2018

Calculators killed the standard statistical table

Video killed the radio star.... We can't rewind, we've gone too far.
-- The Buggles (1979)

"You kids have it easy," my father used to tell me. "When I was a kid, I didn't have all the conveniences you have today." He's right, and I could say the same


Data Visualization | Programming Tips

Rick Wicklin, August 8, 2018

Plot curves for levels of two categorical variables in SAS

The SGPLOT procedure in SAS makes it easy to create graphs that overlay various groups in the data. Many statements support the GROUP= option, which specifies that the graph should overlay group information. For example, you can create side-by-side bar charts and box plots, and you can overlay multiple scatter


Analytics | Programming Tips

Rick Wicklin, August 6, 2018

How to score and graph a quantile regression model in SAS

This article shows how to score (evaluate) a quantile regression model on new data. SAS supports several procedures for quantile regression, including the QUANTREG, QUANTSELECT, and HPQUANTSELECT procedures. The first two procedures do not support any of the modern methods for scoring regression models, so you must use the "missing


Analytics | Learn SAS

Rick Wicklin, August 1, 2018

Which variables are in the final selected model?

When you use a regression procedure in SAS that supports variable selection (GLMSELECT or QUANTSELECT), did you know that the procedures automatically produce a macro variable that contains the names of the selected variables? This article provides examples and details. A previous article provides an overview of the 'SELECT' procedures


Analytics | Programming Tips

Rick Wicklin, June 27, 2018

Reduced models: A way to choose initial parameters for a mixed model

This article describes how to obtain an initial guess for nonlinear regression models, especially nonlinear mixed models. The technique is to first fit a simpler fixed-effects model by replacing the random effects with their expected values. The parameter estimates for the fixed-effects model are often good initial guesses for the


Analytics

Rick Wicklin, June 25, 2018

Use a grid search to find initial parameter values for regression models in SAS

When you fit nonlinear fixed-effect or mixed models, it is difficult to guess the model parameters that fit the data. Yet, most nonlinear regression procedures (such as PROC NLIN and PROC NLMIXED in SAS) require that you provide a good guess! If your guess is not good, the fitting algorithm,


Analytics | Programming Tips

Rick Wicklin, June 20, 2018

The bootstrap method in SAS: A t test example

A previous article provides an example of using the BOOTSTRAP statement in PROC TTEST to compute bootstrap estimates of statistics in a two-sample t test. The BOOTSTRAP statement is new in SAS/STAT 14.3 (SAS 9.4M5). However, you can perform the same bootstrap analysis in earlier releases of SAS by using
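The resampling idea behind a two-sample bootstrap fits in a few lines of Python: resample each group with replacement, recompute the difference of means, and repeat. This is a generic illustration, not the BOOTSTRAP statement's algorithm; the simple percentile interval, the replicate count, the seed, the function names, and the toy data are all assumptions.

```python
import random
from statistics import mean

def bootstrap_mean_diff(x, y, n_boot=2000, seed=4321):
    """Bootstrap distribution of mean(x) - mean(y) for two independent samples.

    Each replicate resamples x and y separately, with replacement,
    keeping the original sample sizes.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        bx = rng.choices(x, k=len(x))
        by = rng.choices(y, k=len(y))
        diffs.append(mean(bx) - mean(by))
    return diffs

def percentile_ci(stats, alpha=0.05):
    """Simple percentile interval read off the sorted bootstrap statistics."""
    s = sorted(stats)
    lo = s[int((alpha / 2) * (len(s) - 1))]
    hi = s[int((1 - alpha / 2) * (len(s) - 1))]
    return lo, hi

# Made-up samples for illustration.
x = [12.1, 14.3, 11.8, 13.5, 15.0, 12.9]
y = [10.2, 11.5, 12.0, 10.8, 11.1]
print(mean(x) - mean(y), percentile_ci(bootstrap_mean_diff(x, y)))
```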


Learn SAS | Programming Tips

Rick Wicklin, June 8, 2018

Video: A new syntax for lists in SAS/IML

I recently recorded a short video about the new syntax for specifying and manipulating lists in SAS/IML 14.3. This is a video of my Super Demo at SAS Global Forum 2018. The new syntax supports dynamic arrays, associative arrays ("named lists"), and hierarchical data structures such as lists of lists.

