Blogs

Tag: Statistical Programming

Programming Tips

Rick Wicklin | April 16, 2018

Random permutations without duplicates

A colleague and I recently discussed how to generate random permutations without encountering duplicates. Given a set of n items, there are n! permutations. My colleague wants to generate k unique permutations at random from among the total of n!. Said differently, he wants to sample without replacement from the

Read More
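The blog's own examples use SAS; as a hedged illustration only, here is a minimal Python sketch of one possible approach (not necessarily the one in the full article): draw k distinct ranks from 0 to n! - 1 by rejecting repeats, then turn each rank into a permutation with the factorial number system (Lehmer code). The function names `unrank_permutation` and `unique_random_permutations` are hypothetical.

```python
import math
import random

def unrank_permutation(rank, items):
    """Return the permutation of `items` with the given 0-based lexicographic rank.

    Factorial number system: the leading digit rank // (n-1)! selects which
    remaining item comes first, the next digit selects the second item, and so on.
    """
    pool = list(items)
    perm = []
    for i in range(len(pool), 0, -1):
        index, rank = divmod(rank, math.factorial(i - 1))
        perm.append(pool.pop(index))
    return perm

def unique_random_permutations(items, k, seed=None):
    """Return k distinct random permutations of `items` (ranks sampled without replacement)."""
    n_total = math.factorial(len(items))
    if k > n_total:
        raise ValueError("k cannot exceed n!")
    rng = random.Random(seed)
    seen, ranks = set(), []
    while len(ranks) < k:
        r = rng.randrange(n_total)
        if r not in seen:              # reject a rank that was already drawn
            seen.add(r)
            ranks.append(r)
    return [unrank_permutation(r, items) for r in ranks]

for perm in unique_random_permutations(["A", "B", "C", "D"], 5, seed=16):
    print(perm)
```

Rejecting repeats is cheap when k is much smaller than n!. As k approaches n!, the expected number of redraws grows (a coupon-collector effect), so for small n it can be simpler to shuffle all n! ranks and keep the first k.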

Programming Tips

Rick Wicklin | April 4, 2018

Distance correlation

Correlation is a statistic that measures how closely two variables are related to each other. The most popular definition of correlation is the Pearson product-moment correlation, which is a measurement of the linear relationship between two variables. Many textbooks stress the linear nature of the Pearson correlation and emphasize that

Read More
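To make the contrast with the Pearson correlation concrete, here is a small pure-Python sketch of the sample distance correlation of Székély and colleagues (the V-statistic form built from double-centered distance matrices). It is an illustration under that assumption, not the article's SAS implementation, and the helper names are hypothetical.

```python
import math

def _double_centered_distances(values):
    """Pairwise |v_i - v_j| distances, double-centered by row, column, and grand means."""
    n = len(values)
    d = [[abs(values[i] - values[j]) for j in range(n)] for i in range(n)]
    means = [sum(row) / n for row in d]      # row means (equal to column means by symmetry)
    grand = sum(means) / n
    return [[d[i][j] - means[i] - means[j] + grand for j in range(n)] for i in range(n)]

def _mean_product(a, b):
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)

def distance_correlation(x, y):
    """Sample distance correlation of two equal-length numeric sequences."""
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    dcov2 = max(_mean_product(a, b), 0.0)    # guard against tiny negative round-off
    dvar_x = _mean_product(a, a)
    dvar_y = _mean_product(b, b)
    if dvar_x == 0 or dvar_y == 0:
        return 0.0                           # one of the variables is constant
    return math.sqrt(dcov2 / math.sqrt(dvar_x * dvar_y))

x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]       # perfectly dependent on x, yet Pearson r = 0
print(distance_correlation(x, y))
```

For this symmetric quadratic relationship the Pearson correlation is exactly zero, while the distance correlation is about 0.52, which reflects the nonlinear dependence.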

Analytics | Data Visualization | Programming Tips

[Figure: Euclidean and L1 distances between observations and a target value for standardized data]

Rick Wicklin | March 28, 2018

Find the distances between observations and a target value

Suppose you want to find observations in multivariate data that are closest to a numerical target value. For example, for the students in the Sashelp.Class data set, you might want to find the students whose (Age, Height, Weight) values are closest to the triplet (13, 62, 100). The way to

Read More
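The figure caption mentions standardized data. A minimal Python sketch, assuming each variable is standardized by its sample mean and sample standard deviation and the target is transformed with the same constants, might compute both distances like this. The rows below are hypothetical values chosen for illustration, not the actual Sashelp.Class data.

```python
import math
import statistics

def distances_to_target(rows, target):
    """(Euclidean, L1) distance from each row to `target` after standardizing.

    Each variable is centered by its sample mean and scaled by its sample
    standard deviation; the target uses the same centering and scaling.
    """
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    z_target = [(t - m) / s for t, m, s in zip(target, means, sds)]
    result = []
    for row in rows:
        diffs = [(v - m) / s - zt for v, m, s, zt in zip(row, means, sds, z_target)]
        euclid = math.sqrt(sum(d * d for d in diffs))
        l1 = sum(abs(d) for d in diffs)
        result.append((euclid, l1))
    return result

# Hypothetical (Age, Height, Weight) rows, for illustration only.
rows = [(12, 59.0, 95.0), (13, 62.0, 100.0), (15, 66.5, 112.0), (11, 51.3, 50.5)]
dists = distances_to_target(rows, (13, 62, 100))
closest = min(range(len(rows)), key=lambda i: dists[i][0])
print(closest, dists[closest])
```

Standardizing first keeps a variable with large units (such as weight) from dominating the distance.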

Learn SAS | Programming Tips

Rick Wicklin | March 19, 2018

Compute with combinations: Maximize a function over combinations of variables

About once a month I see a question on the SAS Support Communities that involves what I like to call "computations with combinations." A typical question asks how to find k values (from a set of p values) that maximize or minimize some function, such as "I have 5 variables,

Read More
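A brute-force sketch of a "computation with combinations," written in Python rather than SAS: enumerate every k-subset of the p candidates and keep the one that maximizes an objective. The variable names, scores, and objective below are hypothetical placeholders.

```python
from itertools import combinations
from math import comb

def best_combination(names, k, objective):
    """Return the k-subset of `names` (as a tuple) that maximizes `objective`."""
    return max(combinations(names, k), key=objective)

# Hypothetical example: choose 3 of 5 variables whose made-up scores sum to the most.
scores = {"x1": 3.0, "x2": 1.0, "x3": 4.0, "x4": 1.5, "x5": 5.0}
best = best_combination(list(scores), 3, lambda combo: sum(scores[v] for v in combo))
n_candidates = comb(len(scores), 3)     # number of subsets examined
print(best, n_candidates)
```

The search examines C(p, k) subsets, which is fine for small p but grows quickly; to minimize instead, replace `max` with `min`.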

Analytics | Programming Tips

Rick Wicklin | March 7, 2018

Fit a distribution from quantiles

Data analysts often fit a probability distribution to data. When you have access to the data, a common technique is to use maximum likelihood estimation (MLE) to compute the parameters of a distribution that are "most likely" to have produced the observed data. However, how can you fit a distribution

Read More
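One special case admits a closed-form sketch. For a location-scale family such as the normal, the quantile function is linear in the standard normal quantile, Q(p) = mu + sigma * z_p, so an ordinary least-squares line of the known quantiles on z_p estimates mu and sigma. This Python sketch assumes normality and is a swapped-in illustration; the article may fit general distributions by nonlinear optimization. The percentile values are hypothetical.

```python
from statistics import NormalDist

def fit_normal_from_quantiles(probs, quantiles):
    """Least-squares estimate of (mu, sigma) from (probability, quantile) pairs.

    For the normal family Q(p) = mu + sigma * z_p, where z_p is the standard
    normal quantile, so regressing the quantiles on z_p gives mu (intercept)
    and sigma (slope).
    """
    z = [NormalDist().inv_cdf(p) for p in probs]
    n = len(z)
    z_bar = sum(z) / n
    q_bar = sum(quantiles) / n
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    s_zq = sum((zi - z_bar) * (qi - q_bar) for zi, qi in zip(z, quantiles))
    sigma = s_zq / s_zz
    mu = q_bar - sigma * z_bar
    return mu, sigma

# Hypothetical summary: the 10th, 25th, 50th, 75th, and 90th percentiles of some data.
probs = [0.10, 0.25, 0.50, 0.75, 0.90]
quantiles = [48.2, 55.0, 61.5, 68.1, 74.0]
print(fit_normal_from_quantiles(probs, quantiles))
```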

Advanced Analytics

[Figure: Sample from mixture distribution showing sample median]

Rick Wicklin | February 21, 2018

A Monte Carlo algorithm to estimate a median

This article describes and implements a fast algorithm that estimates a median for very large samples. The traditional median estimate sorts a sample of size N and returns the middle value (when N is odd). The algorithm in this article uses Monte Carlo techniques to estimate the median much faster.

Read More
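The teaser does not spell out the algorithm, so the following is only a hedged Python sketch of one Monte Carlo idea for avoiding a full sort: use a random subsample to bracket the median, count the values below the bracket in one pass, and sort only the values inside it. Whether the article uses this exact scheme is an assumption, and the bracket half-width of about sqrt(m) ranks is a heuristic choice. Because of the fallback, this version returns the exact (upper) median rather than an estimate.

```python
import math
import random
import statistics

def monte_carlo_median(data, m=1000, seed=None):
    """Return the upper median of `data` (the median itself when len(data) is odd).

    A random subsample of size m brackets the median; only values inside the
    bracket are sorted. If the bracket misses the median, fall back to a full sort.
    """
    n = len(data)
    target = n // 2                          # 0-based rank of the (upper) median
    rng = random.Random(seed)
    sample = sorted(rng.sample(data, min(m, n)))
    m = len(sample)
    h = int(math.sqrt(m))                    # heuristic half-width of the bracket, in ranks
    lo = sample[max(m // 2 - h, 0)]
    hi = sample[min(m // 2 + h, m - 1)]
    n_below = 0
    inside = []
    for x in data:                           # one linear pass over the full data
        if x < lo:
            n_below += 1
        elif x <= hi:
            inside.append(x)
    rank = target - n_below
    if 0 <= rank < len(inside):
        return sorted(inside)[rank]
    return sorted(data)[target]              # the bracket did not contain the median

rng = random.Random(2018)
data = [rng.gauss(0.0, 1.0) for _ in range(100_001)]
print(monte_carlo_median(data, seed=1) == statistics.median_high(data))
```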

Analytics | Programming Tips

[Figure: Quantiles are the solutions to the equation CDF(x) - p = 0, where p is a probability]

Rick Wicklin | February 19, 2018

Compute the quantiles of any distribution

Your statistical software probably provides a function that computes quantiles of common probability distributions such as the normal, exponential, and beta distributions. Because there are infinitely many probability distributions, you might encounter a distribution for which a built-in quantile function is not implemented. No problem! This article shows how to

Read More
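As the figure caption notes, the p-th quantile is a root of CDF(x) - p = 0. A minimal Python sketch, assuming a continuous nondecreasing CDF and a bracketing interval [lo, hi], can find that root by bisection. The mixture CDF is a hypothetical example of a distribution without a built-in quantile function.

```python
import math

def quantile_by_bisection(cdf, p, lo, hi, tol=1e-12, max_iter=200):
    """Solve cdf(x) - p = 0 for x on [lo, hi] by bisection.

    Assumes cdf is continuous and nondecreasing and that cdf(lo) <= p <= cdf(hi).
    """
    if not (cdf(lo) <= p <= cdf(hi)):
        raise ValueError("[lo, hi] does not bracket the p-th quantile")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def mixture_cdf(x):
    """Hypothetical CDF: equal mixture of exponential distributions with means 1 and 3."""
    return 0.5 * (1 - math.exp(-x)) + 0.5 * (1 - math.exp(-x / 3))

q90 = quantile_by_bisection(mixture_cdf, 0.90, 0.0, 100.0)
print(q90)
```

Bisection is slower than Newton-type root finders but needs only CDF evaluations and cannot diverge once the root is bracketed.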

Programming Tips

Rick Wicklin | January 24, 2018

Use lists to pass parameters to SAS/IML functions

A popular way to use lists in the SAS/IML language is to pack together several related matrices into a single data structure that can be passed to a function. Imagine that you have written an algorithm that requires a dozen different parameters. Historically, you would have to pass those parameters

Read More

Programming Tips

Rick Wicklin | January 22, 2018

Create lists by using a natural syntax in SAS/IML

SAS/IML 14.3 (SAS 9.4M5) introduced a new syntax for creating lists and for assigning and extracting items in a list. Lists (introduced in SAS/IML 14.2) are data structures that are convenient for holding heterogeneous data. A single list can hold character matrices, numeric matrices, scalar values, and other lists, as

Read More

Learn SAS | Programming Tips

Rick Wicklin | January 10, 2018

10 posts from 2017 that deserve a second look

Last week I wrote about the 10 most popular articles from The DO Loop in 2017. My most popular articles tend to be about elementary statistics or SAS programming tips. Less popular are the articles about advanced statistical and programming techniques. However, these technical articles fill an important niche. Not

Read More

Programming Tips

[Figure: Histogram of data overlaid with a beta density curve, fitted by maximum likelihood estimation]

Rick Wicklin | November 27, 2017

The method of moments: A smart way to choose initial parameters for MLE

When you run an optimization, it is often not clear how to provide the optimization algorithm with an initial guess for the parameters. A good guess converges quickly to the optimal solution whereas a bad guess might diverge or require many iterations to converge. Many people use a default value

Read More
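For the beta distribution shown in the figure, the method of moments has a closed form. Matching the mean m = a/(a+b) and the variance v = m(1-m)/(a+b+1) gives a + b = m(1-m)/v - 1. This Python sketch computes those values as a starting guess for an MLE optimizer (the optimizer itself is not shown); the function names and data are hypothetical, and using the population variance is an assumption.

```python
import statistics

def beta_method_of_moments(mean, var):
    """Method-of-moments (alpha, beta) for a beta distribution with the given mean and variance.

    Requires 0 < mean < 1 and 0 < var < mean * (1 - mean).
    """
    if not (0 < mean < 1) or not (0 < var < mean * (1 - mean)):
        raise ValueError("moments are not consistent with a beta distribution")
    total = mean * (1 - mean) / var - 1      # estimate of alpha + beta
    return mean * total, (1 - mean) * total

def beta_initial_guess(data):
    """Starting values for an MLE optimizer, from the sample mean and population variance."""
    return beta_method_of_moments(statistics.fmean(data), statistics.pvariance(data))

# Hypothetical proportions in (0, 1).
data = [0.12, 0.35, 0.27, 0.51, 0.08, 0.22, 0.40, 0.18]
print(beta_initial_guess(data))
```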

Programming Tips

[Figure: Beta-binomial cumulative distribution]

Rick Wicklin | November 22, 2017

Compute the CDF and quantiles of discrete distributions

A statistical programmer read my article about the beta-binomial distribution and wanted to know how to compute the cumulative distribution (CDF) and the quantile function for this distribution. In general, if you know the PDF for a discrete distribution, you can also compute the CDF and quantile functions. This article

Read More
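A hedged Python sketch of the general idea for a discrete distribution with finite support: accumulate the PMF to get the CDF, then define the p-th quantile as the smallest support value whose CDF is at least p. The beta-binomial PMF, C(n,k) B(k+a, n-k+b) / B(a,b), is computed on the log scale for stability. The helper names and parameter values are hypothetical, and the small `eps` tolerance is a design choice to absorb round-off in the cumulative sums.

```python
import math

def beta_binomial_pmf(k, n, a, b):
    """P(X = k) for the beta-binomial(n, a, b) distribution, via log-beta functions."""
    def log_beta(x, y):
        return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)
    return math.comb(n, k) * math.exp(log_beta(k + a, n - k + b) - log_beta(a, b))

def discrete_cdf(pmf, support):
    """Cumulative sums of pmf(k) over an ordered, finite support."""
    total = 0.0
    cdf = []
    for k in support:
        total += pmf(k)
        cdf.append(total)
    return cdf

def discrete_quantile(p, support, cdf, eps=1e-12):
    """Smallest k in the support with CDF(k) >= p (eps absorbs round-off)."""
    for k, c in zip(support, cdf):
        if c >= p - eps:
            return k
    return support[-1]

n, a, b = 10, 2.0, 3.0        # hypothetical parameters
support = list(range(n + 1))
cdf = discrete_cdf(lambda k: beta_binomial_pmf(k, n, a, b), support)
print(discrete_quantile(0.5, support, cdf))
```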

Learn SAS | Programming Tips

Rick Wicklin | November 15, 2017

Catch run-time errors in SAS/IML programs

Did you know that a SAS/IML function can recover from a run-time error? You can specify how to handle run-time errors by using a programming technique that is similar to the modern "try-catch" technique, although the SAS/IML technique is an older implementation.

Preventing errors versus handling errors

In general, SAS/IML

Read More

Programming Tips

[Figure: The PAUSE statement as a debugging tool in SAS/IML Studio]

Rick Wicklin | November 13, 2017

A tip for debugging SAS/IML modules: The PAUSE statement

Debugging is the bane of every programmer. SAS supports a DATA step debugger, but that debugger can't be used for debugging SAS/IML programs. In lieu of a formal debugger, many SAS/IML programmers resort to inserting multiple PRINT statements into a function definition. However, there is an easier way to query

Read More

Analytics | Learn SAS

[Figure: Principal component regression in SAS: Loadings plot]

Rick Wicklin | October 23, 2017

Principal component regression in SAS

A common question on discussion forums is how to compute a principal component regression in SAS. One reason people give for wanting to run a principal component regression is that the explanatory variables in the model are highly correlated with each other, a condition known as multicollinearity. Although principal component

Read More

Data Visualization | Learn SAS

Rick Wicklin | September 18, 2017

The path of zip codes

Toe bone connected to the foot bone,
Foot bone connected to the leg bone,
Leg bone connected to the knee bone,...
(American Spiritual, "Dem Bones")

Last week I read an interesting article on Robert Kosara's data visualization blog. Kosara connected the geographic centers of the US zip codes in

Read More

Analytics | Data Visualization

[Figure: Bar chart of pairwise correlations between variables]

Rick Wicklin | August 16, 2017

Use a bar chart to visualize pairwise correlations

Visualizing the correlations between variables often provides insight into the relationships between variables. I've previously written about how to use a heat map to visualize a correlation matrix in SAS/IML, and Chris Hemedinger showed how to use Base SAS to visualize correlations between variables. Recently a SAS programmer asked how

Read More
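The data behind such a bar chart is one correlation per pair of variables, sorted by value. This Python sketch computes and sorts those pairwise Pearson correlations; a crude text chart stands in for a real graphics procedure, and the data and function names are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def sorted_pairwise_correlations(columns):
    """[(label, r), ...] for every pair of variables, from most negative to most positive."""
    pairs = [(f"{u}-{v}", pearson(columns[u], columns[v])) for u, v in combinations(columns, 2)]
    return sorted(pairs, key=lambda item: item[1])

def text_bar_chart(pairs, width=20):
    """Crude horizontal bar chart: bar length proportional to |r|."""
    for label, r in pairs:
        print(f"{label:>10} {r:+.2f} {'#' * round(abs(r) * width)}")

# Hypothetical data for three variables.
columns = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 5, 4, 5], "c": [5, 3, 4, 2, 1]}
text_bar_chart(sorted_pairwise_correlations(columns))
```

With p variables there are p(p-1)/2 bars, so sorting them makes the strongest positive and negative relationships easy to spot.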

Advanced Analytics

[Figure: Classical and robust principal component scores for crime data, computed in SAS]

Rick Wicklin | August 9, 2017

Robust principal component analysis in SAS

Recently, I was asked whether SAS can perform a principal component analysis (PCA) that is robust to the presence of outliers in the data. A PCA requires a data matrix, an estimate for the center of the data, and an estimate for the variance/covariance of the variables. Classically, these estimates

Read More

Analytics | Programming Tips

Rick Wicklin | July 5, 2017

Test for the equality of two proportions in SAS

A SAS customer asked how to use SAS to conduct a Z test for the equality of two proportions. He was directed to the SAS Usage Note "Testing the equality of two or more proportions from independent samples." The note says to "specify the CHISQ option in the TABLES statement

Read More
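The link between the Z test and the CHISQ option is that, for a 2x2 table, the square of the pooled two-proportion Z statistic equals the Pearson chi-square statistic. This Python sketch (not SAS code) computes both so that the identity can be checked; the counts are hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sided Z test of H0: p1 = p2. Returns (z, p_value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

def pearson_chisq_2x2(x1, n1, x2, n2):
    """Pearson chi-square statistic for the 2x2 table of successes and failures."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    row = [n1, n2]
    col = [x1 + x2, (n1 - x1) + (n2 - x2)]
    total = n1 + n2
    return sum((observed[i][j] - row[i] * col[j] / total) ** 2 / (row[i] * col[j] / total)
               for i in range(2) for j in range(2))

# Hypothetical counts: 30 of 100 versus 45 of 100.
z, p = two_proportion_z_test(30, 100, 45, 100)
print(z * z, pearson_chisq_2x2(30, 100, 45, 100), p)
```

For these counts both z² and the chi-square statistic equal 4.8.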

Programming Tips

Rick Wicklin | June 26, 2017

Video: Create and use lists and tables in SAS/IML

My presentation at SAS Global Forum 2017 was "More Than Matrices: SAS/IML Software Supports New Data Structures." The paper was published in the conference proceedings several months ago, but I recently recorded a short video that gives an overview of using the new data structures in SAS/IML 14.2: If your

Read More

Analytics | Programming Tips

Rick Wicklin | June 14, 2017

Two ways to compute maximum likelihood estimates in SAS

In a previous article, I showed two ways to define a log-likelihood function in SAS. This article shows two ways to compute maximum likelihood estimates (MLEs) in SAS: the nonlinear optimization subroutines in SAS/IML and the NLMIXED procedure in SAS/STAT. To illustrate these methods, I will use the same data

Read More

Analytics | Programming Tips

[Figure: Optimal value of quadratic function of two variables]

Rick Wicklin | April 12, 2017

Quadratic optimization in SAS

At SAS Global Forum last week, I saw a poster that used SAS/IML to optimize a quadratic objective function that arises in financial portfolio management (Xia, Eberhardt, and Kastin, 2017). The authors used the Newton-Raphson optimizer (NLPNRA routine) in SAS/IML to optimize a hypothetical portfolio of assets. The Newton-Raphson algorithm

Read More
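A useful fact about Newton-Raphson on an unconstrained quadratic f(x) = ½ xᵀQx + cᵀx: the gradient is Qx + c and the Hessian is the constant matrix Q, so a single Newton step from any starting point lands on the stationary point -Q⁻¹c (the minimizer when Q is positive definite). This two-variable Python sketch demonstrates that; Q and c are hypothetical, and it does not reproduce NLPNRA or the poster's portfolio model.

```python
def newton_step_quadratic(Q, c, x0):
    """One Newton-Raphson step for f(x) = 0.5 * x'Qx + c'x in two variables.

    The gradient is Qx + c and the Hessian is Q, so the step
    x1 = x0 - Q^{-1}(Q x0 + c) = -Q^{-1} c does not depend on x0.
    """
    (a, b), (b2, d) = Q
    det = a * d - b * b2
    g0 = a * x0[0] + b * x0[1] + c[0]        # gradient at x0
    g1 = b2 * x0[0] + d * x0[1] + c[1]
    step0 = (d * g0 - b * g1) / det          # Q^{-1} g via the 2x2 inverse formula
    step1 = (-b2 * g0 + a * g1) / det
    return (x0[0] - step0, x0[1] - step1)

# Hypothetical positive definite Q and linear term c.
Q = [[2.0, 0.5], [0.5, 1.0]]
c = [-1.0, -2.0]
print(newton_step_quadratic(Q, c, (10.0, -7.0)))
print(newton_step_quadratic(Q, c, (0.0, 0.0)))
```

Both calls print (0.0, 2.0), the solution of Qx = -c.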

Programming Tips

[Figure: Illustration of the 68-95-99.7 rule]

Rick Wicklin | April 10, 2017

A simple trick to construct symmetric intervals

Many intervals in statistics have the form p ± δ, where p is a point estimate and δ is the radius (or half-width) of the interval. (For example, many two-sided confidence intervals have this form, where δ is proportional to the standard error.) Many years ago I wrote an article

Read More
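One way to write p ± δ compactly is to multiply δ by the signs (-1, +1) and add the result to p, which produces both endpoints at once. Whether this is exactly the trick in the article is an assumption. The Python sketch below applies it to the standard normal intervals 0 ± k for k = 1, 2, 3, which connects to the 68-95-99.7 rule in the figure.

```python
from statistics import NormalDist

def symmetric_interval(p, delta):
    """Return (p - delta, p + delta) by multiplying delta by the signs (-1, +1)."""
    return tuple(p + sign * delta for sign in (-1, 1))

STD_NORMAL = NormalDist()

def normal_coverage(k):
    """Probability that a standard normal variate lies in the interval 0 ± k."""
    lo, hi = symmetric_interval(0.0, k)
    return STD_NORMAL.cdf(hi) - STD_NORMAL.cdf(lo)

for k in (1, 2, 3):
    print(k, round(normal_coverage(k), 4))
```

The printed coverages are about 0.6827, 0.9545, and 0.9973.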

Programming Tips

Rick Wicklin | April 5, 2017

Piecewise regression models and spline effects

Most regression models try to model a response variable by using a smooth function of the explanatory variables. However, if the data are generated from some nonsmooth process, then it makes sense to use a regression function that is not smooth. A simple way to model a discontinuous process in

Read More
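The article's title refers to SAS spline effects; as a hedged stand-in, this Python sketch builds a comparable basis by hand for a model with a known knot: an intercept, a slope, an indicator I(x > knot) that allows a jump, and a hinge max(x - knot, 0) that allows the slope to change. It fits the coefficients by solving the least-squares normal equations. The knot location, function names, and data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [[float(v) for v in row] + [float(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for j in range(col, n + 1):
                m[r][j] -= factor * m[col][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x

def piecewise_linear_fit(xs, ys, knot):
    """Least-squares fit of y = b0 + b1*x + b2*I(x > knot) + b3*max(x - knot, 0).

    b2 is the jump at the knot and b3 is the change in slope after the knot.
    """
    design = [[1.0, x, 1.0 if x > knot else 0.0, max(x - knot, 0.0)] for x in xs]
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# Hypothetical noisy data with a jump and a slope change near x = 5.
xs = list(range(11))
ys = [1.1, 2.9, 5.2, 6.8, 9.1, 11.0, 15.9, 16.4, 17.1, 17.4, 18.2]
print(piecewise_linear_fit(xs, ys, knot=5))
```

The knot is assumed known here; estimating its location is a harder, nonlinear problem.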

Programming Tips

Rick Wicklin | April 3, 2017

Print tables in SAS/IML

One of the advantages of the new mixed-type tables in SAS/IML 14.2 (released with SAS 9.4M4) is the greatly enhanced printing functionality. You can control which rows and columns are printed, specify formats for individual columns, and even use templates to completely customize how tables are printed. Printing a table

Read More

Programming Tips

[Figure: SAS/IML lists can store objects of different shapes and sizes]

Rick Wicklin | March 29, 2017

Lists: Nonmatrix data structures in SAS/IML

Lists are collections of objects. SAS/IML 14.2 supports lists as a way to store matrices, data tables, and other lists in a single object that you can pass to functions. SAS/IML lists automatically grow if you add new items to them and shrink if you remove items. You can also

Read More

Programming Tips

Rick Wicklin | March 22, 2017

Data tables: Nonmatrix data structures in SAS/IML

Prior to SAS/IML 14.2, every variable in the Interactive Matrix Language (IML) represented a matrix. That changed when SAS/IML 14.2 (released with SAS 9.4M4) introduced two new data structures: data tables and lists. This article gives an overview of data tables. I will blog about lists in a separate article.

Read More

Advanced Analytics

Rick Wicklin | February 15, 2017

Simultaneous confidence intervals for multinomial proportions

A categorical response variable can take on k different values. If you have a random sample from a multinomial response, the sample proportions estimate the proportion of each category in the population. This article describes how to construct simultaneous confidence intervals for the proportions as described in the 1997 paper

Read More
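The teaser does not say which method from the 1997 paper the article implements. As one illustration only, here is a Python sketch of Goodman's (1965) simultaneous intervals, a classic method for multinomial proportions; whether it is the method the article uses is an assumption. With k categories, N total counts, and A the chi-square (1 df) quantile at 1 - α/k, each interval is (A + 2xᵢ ± sqrt(A[A + 4xᵢ(N - xᵢ)/N])) / (2(N + A)). The counts are hypothetical.

```python
import math
from statistics import NormalDist

def goodman_intervals(counts, alpha=0.05):
    """Goodman-type simultaneous confidence intervals for multinomial proportions."""
    k = len(counts)
    n = sum(counts)
    z = NormalDist().inv_cdf(1 - alpha / (2 * k))
    A = z * z            # chi-square (1 df) quantile at 1 - alpha/k
    intervals = []
    for x in counts:
        center = A + 2 * x
        half = math.sqrt(A * (A + 4 * x * (n - x) / n))
        denom = 2 * (n + A)
        intervals.append(((center - half) / denom, (center + half) / denom))
    return intervals

# Hypothetical counts for a response with three categories.
for (lo, hi), x in zip(goodman_intervals([10, 25, 65]), [10, 25, 65]):
    print(f"p_hat = {x / 100:.2f}: [{lo:.3f}, {hi:.3f}]")
```

Each interval solves (x - Np)² ≤ A·N·p(1 - p) for p, so it always lies inside [0, 1] and contains the sample proportion.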

Advanced Analytics | Learn SAS

Rick Wicklin | February 13, 2017

An easy way to run thousands of regressions in SAS

A common question on SAS discussion forums is how to repeat an analysis multiple times. Most programmers know that the most efficient way to analyze one model across many subsets of the data (perhaps each country or each state) is to sort the data and use a BY statement to

Read More
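The BY statement itself is SAS syntax. As a loose Python analogue of the idea (not SAS code), this sketch groups the records in a single pass and then fits a simple linear regression within each group, instead of rerunning an analysis once per subset. The record layout and names are hypothetical.

```python
from collections import defaultdict

def regressions_by_group(records):
    """Fit y = b0 + b1*x separately within each group.

    records: iterable of (group, x, y) tuples. Returns {group: (b0, b1)}.
    """
    groups = defaultdict(list)
    for group, x, y in records:              # one pass groups the data, like BY processing
        groups[group].append((x, y))
    fits = {}
    for group, pts in groups.items():
        n = len(pts)
        mx = sum(x for x, _ in pts) / n
        my = sum(y for _, y in pts) / n
        sxx = sum((x - mx) ** 2 for x, _ in pts)
        sxy = sum((x - mx) * (y - my) for x, y in pts)
        b1 = sxy / sxx
        fits[group] = (my - b1 * mx, b1)
    return fits

# Hypothetical records for two groups.
records = [("A", x, 1 + 2 * x) for x in range(5)] + [("B", x, 3 - x) for x in range(5)]
print(regressions_by_group(records))
```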

Advanced Analytics

Rick Wicklin | February 8, 2017

Winsorization: The good, the bad, and the ugly

On discussion forums, I often see questions that ask how to Winsorize variables in SAS. For example, here are some typical questions from the SAS Support Community: I want an efficient way of replacing (upper) extreme values with (95th) percentile. I have a data set with around 600 variables and

Read More
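For the forum question quoted above (replace upper extreme values with the 95th percentile), here is a minimal Python sketch. It clamps values outside chosen percentiles to those percentiles. The percentiles come from `statistics.quantiles` with `method="inclusive"`; other percentile definitions give slightly different cutoffs, and the function name is hypothetical.

```python
import statistics

def winsorize(values, lower_pct=None, upper_pct=95):
    """Clamp values outside the given percentiles (integers 1..99) to those percentiles.

    Pass lower_pct=None or upper_pct=None to leave that tail unchanged.
    """
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    lo = cuts[lower_pct - 1] if lower_pct is not None else float("-inf")
    hi = cuts[upper_pct - 1] if upper_pct is not None else float("inf")
    return [min(max(v, lo), hi) for v in values]

values = list(range(1, 101))
w = winsorize(values)            # upper tail only, at the 95th percentile
print(w[-6:])
```

For the values 1 through 100, the inclusive 95th percentile is 95.05, so the five largest values are replaced by 95.05. Because Winsorizing changes the data, downstream statistics such as variances and standard errors change as well.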
