Tag: Statistical Programming

Rick Wicklin
The UNIQUE-LOC trick: A real treat!

When you analyze data, you will occasionally have to deal with categorical variables. The typical situation is that you want to repeat an analysis or computation for each level (category) of a categorical variable. For example, you might want to analyze males separately from females. Unlike most other SAS procedures,

Rick Wicklin
Video: Calling R from the SAS/IML Language

In SAS/IML 9.22 and beyond, you can call the R statistical programming language from within a SAS/IML program. The syntax is similar to the syntax for calling SAS from SAS/IML: You use a SUBMIT statement, but add the R option: SUBMIT / R. All statements in the program between the

Rick Wicklin
Four essential functions for statistical programmers

Normal, Poisson, exponential—these and other "named" distributions are used daily by statisticians for modeling and analysis. There are four operations that are used often when you work with statistical distributions. In SAS software, the operations are available by using the following four functions, which are essential for every statistical programmer

Rick Wicklin
Optimizing? Two hints for specifying derivatives

I previously wrote about using SAS/IML for nonlinear optimization, and demonstrated optimization by maximizing a likelihood function. Many well-known optimization algorithms require derivative information during the optimization, including the conjugate gradient method (implemented in the NLPCG subroutine) and the Newton-Raphson method (implemented in the NLPNRA subroutine). You should specify analytic

Rick Wicklin
Maximum likelihood estimation in SAS/IML

A popular use of SAS/IML software is to optimize functions of several variables. One statistical application of optimization is estimating parameters that maximize the likelihood function. This post gives a simple example for maximum likelihood estimation (MLE): fitting a parametric density estimate to data. Which density curve fits the

Rick Wicklin
Distances between words

When you misspell a word on your mobile device or in a word-processing program, the software might "autocorrect" your mistake. This can lead to some funny mistakes, such as the following: I hate Twitter's autocorrect, although changing "extreme couponing" to "extreme coupling" did make THAT tweet more interesting. [@AnnMariaStat] When

Rick Wicklin
A math puzzle solution

I previously wrote about an intriguing math puzzle that involves 5-digit numbers with certain properties. This post presents my solution in the SAS/IML language. It is easy to generate all 5-digit perfect squares, but the remainder of the problem involves looking at the digits of the squares. For this reason,

Rick Wicklin
Evaluate polynomials efficiently by using Horner's scheme

Polynomials are used often in data analysis. Low-order polynomials are used in regression to model the relationship between variables. Polynomials are used in numerical analysis for numerical integration and Taylor series approximations. It is therefore important to be able to evaluate polynomials in an efficient manner. My favorite evaluation technique
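As a rough illustration of Horner's scheme, here is a minimal sketch in Python rather than SAS/IML. The function name `horner` and the highest-degree-first coefficient order are assumptions made for this example, not taken from the post.

```python
def horner(coeffs, x):
    """Evaluate a polynomial at x using Horner's scheme.

    coeffs are ordered from the highest-degree term down to the constant,
    so [2, -6, 2, -1] represents 2x^3 - 6x^2 + 2x - 1.
    """
    result = 0
    for c in coeffs:
        # One multiplication and one addition per coefficient.
        result = result * x + c
    return result


# Example: 2*3^3 - 6*3^2 + 2*3 - 1
value = horner([2, -6, 2, -1], 3)
```

A polynomial of degree d is evaluated with d multiplications and d additions, and no powers of x are computed explicitly.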

Rick Wicklin
Storing and loading modules

You can extend the capability of the SAS/IML language by writing modules. A module is a user-defined function. You can define a module by using the START and FINISH statements. Many people, including myself, define modules at the top of the SAS/IML program in which they are used. You can

Rick Wicklin
The most likely birthday in the US

Do you know someone who has a birthday in mid-September? Odds are that you do: the middle of September is when most US babies are born, according to data obtained from the National Center for Health Statistics (NCHS) Web site (see Table 1-16). There's an easy way to remember this

Programming Tips
Rick Wicklin
Loops in SAS

Looping is essential to statistical programming. Whether you need to iterate over parameters in an algorithm or indices in an array, a loop is often one of the first programming constructs that a beginning programmer learns. Today is the first anniversary of this blog, which is named The DO Loop,

Rick Wicklin
Multithreaded = more productive

NOTE: SAS stopped shipping the SAS/IML Studio interface in 2018. It is no longer supported, so this article is no longer relevant. When I write SAS/IML programs, I usually do my development in the SAS/IML Studio environment. Why? There are many reasons, but the one that I will discuss today

Rick Wicklin
The area under a density estimate curve

Readers' comments indicate that my previous blog article about computing the area under an ROC curve was helpful. Great! There is another common application of numerical integration: finding the area under a density estimation curve. This article provides an overview of density estimation and computes an empirical cumulative distribution function.
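To make the idea concrete, the following Python sketch (not SAS/IML, and not the article's code) builds a Gaussian kernel density estimate and integrates it with the trapezoidal rule. The sample data, the bandwidth of 0.5, the grid limits, and the helper names `gaussian_kde` and `trapezoid_area` are all assumptions chosen for illustration.

```python
import math


def gaussian_kde(data, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)

    return density


def trapezoid_area(xs, ys):
    """Approximate the area under the points (xs, ys) by the trapezoidal rule."""
    return sum(
        0.5 * (xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1])
        for i in range(len(xs) - 1)
    )


# Hypothetical sample and bandwidth, chosen only for this sketch.
data = [1.2, 1.9, 2.3, 2.8, 3.1, 4.0]
f = gaussian_kde(data, bandwidth=0.5)

# Evaluate the estimate on an evenly spaced grid that extends well past the data.
lo, hi, m = -3.0, 8.0, 401
xs = [lo + (hi - lo) * i / (m - 1) for i in range(m)]
ys = [f(x) for x in xs]

total_area = trapezoid_area(xs, ys)  # should be very close to 1
```

Restricting the grid to an interval [a, b] gives an estimate of the probability that the variable falls in that interval.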

Rick Wicklin
Pre-allocate arrays to improve efficiency

Recently Charlie Huang showed how to use the SAS/IML language to compute an exponentially weighted moving average of some financial data. In the commentary to his analysis, he said: I found that if a matrix or a vector is declared with specified size before the computation step, the program’s efficiency
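The pattern under discussion, creating the result vector at its final size before the computation loop, can be sketched in Python as follows. This is not Huang's SAS/IML program: the function name `ewma_preallocated`, the smoothing parameter `alpha`, and the recurrence shown here are assumptions for illustration, and any efficiency gain depends on the language.

```python
def ewma_preallocated(values, alpha):
    """Exponentially weighted moving average written into a pre-sized list."""
    n = len(values)
    result = [0.0] * n  # allocate the full result once, before the loop
    if n == 0:
        return result
    result[0] = values[0]
    for i in range(1, n):
        # Fill the existing slot in place instead of growing the list.
        result[i] = alpha * values[i] + (1 - alpha) * result[i - 1]
    return result


smoothed = ewma_preallocated([1.0, 2.0, 3.0], alpha=0.5)
```

The loop assigns into existing elements, so the container is never resized while the computation runs.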

Rick Wicklin
Enumerating levels of a classification variable

A colleague asked, "How can I enumerate the levels of a categorical classification variable in SAS/IML software?" The variable was a character variable with n observations, but he wanted the following: A "look-up table" that contains the k (unique) levels of the variable. A vector with n elements that contains
