Tag: Statistical Programming

Rick Wicklin
Evaluate polynomials efficiently by using Horner's scheme

Polynomials are used often in data analysis. Low-order polynomials are used in regression to model the relationship between variables. Polynomials are used in numerical analysis for numerical integration and Taylor series approximations. It is therefore important to be able to evaluate polynomials in an efficient manner. My favorite evaluation technique
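As a rough illustration of Horner's scheme, here is a minimal sketch in Python rather than SAS/IML. The function name `horner` and the highest-degree-first coefficient order are assumptions made for this example and are not taken from the article.

```python
def horner(coefficients, x):
    """Evaluate a polynomial at x by Horner's scheme.

    coefficients are ordered from the highest-degree term down to the
    constant term (an ordering chosen for this sketch).
    """
    result = 0.0
    for c in coefficients:
        # One multiplication and one addition per coefficient.
        result = result * x + c
    return result


# p(x) = 2x^3 - 6x^2 + 2x - 1 evaluated at x = 3
print(horner([2, -6, 2, -1], 3))  # prints 5.0
```

For a polynomial of degree d, this uses d multiplications and d additions, and it never forms the powers of x explicitly.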

Storing and loading modules

You can extend the capability of the SAS/IML language by writing modules. A module is a user-defined function. You can define a module by using the START and FINISH statements. Many people, including myself, define modules at the top of the SAS/IML program in which they are used. You can

The most likely birthday in the US

Do you know someone who has a birthday in mid-September? Odds are that you do: the middle of September is when most US babies are born, according to data obtained from the National Center for Health Statistics (NCHS) Web site (see Table 1-16). There's an easy way to remember this

Programming Tips
Loops in SAS

Looping is essential to statistical programming. Whether you need to iterate over parameters in an algorithm or indices in an array, a loop is often one of the first programming constructs that a beginning programmer learns. Today is the first anniversary of this blog, which is named The DO Loop,

Multithreaded = more productive

NOTE: SAS stopped shipping the SAS/IML Studio interface in 2018. It is no longer supported, so this article is no longer relevant. When I write SAS/IML programs, I usually do my development in the SAS/IML Studio environment. Why? There are many reasons, but the one that I will discuss today

The area under a density estimate curve

Readers' comments indicate that my previous blog article about computing the area under an ROC curve was helpful. Great! There is another common application of numerical integration: finding the area under a density estimation curve. This article provides an overview of density estimation and computes an empirical cumulative distribution function.

Pre-allocate arrays to improve efficiency

Recently Charlie Huang showed how to use the SAS/IML language to compute an exponentially weighted moving average of some financial data. In the commentary to his analysis, he said: I found that if a matrix or a vector is declared with specified size before the computation step, the program’s efficiency
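The pre-allocation idea can be sketched in Python as follows. The function name `ewma`, the smoothing parameter `alpha`, and the recursion s[t] = alpha*x[t] + (1 - alpha)*s[t-1] are assumptions for illustration; the excerpt does not show the formula or the code that was actually used.

```python
def ewma(values, alpha):
    """Exponentially weighted moving average with a pre-allocated result.

    alpha is a hypothetical smoothing parameter in (0, 1].
    """
    n = len(values)
    smoothed = [0.0] * n  # allocate the full result once, before the loop
    if n == 0:
        return smoothed
    smoothed[0] = values[0]
    for t in range(1, n):
        # Fill the existing slot in place; the list is never resized.
        smoothed[t] = alpha * values[t] + (1 - alpha) * smoothed[t - 1]
    return smoothed


print(ewma([1, 2, 3], 0.5))  # prints [1, 1.5, 2.25]
```

In a matrix language, growing a vector inside a loop usually forces a copy on every iteration, which is the cost that pre-allocation avoids. In Python the gain is typically smaller, because appending to a list is amortized constant time.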

Enumerating levels of a classification variable

A colleague asked, "How can I enumerate the levels of a categorical classification variable in SAS/IML software?" The variable was a character variable with n observations, but he wanted the following: A "look-up table" that contains the k (unique) levels of the variable. A vector with n elements that contains
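One way to build such a look-up table is sketched below in Python. The excerpt is cut off before it says what the n-element vector contains, so this sketch assumes it holds, for each observation, the position of its level in the look-up table. The function name `enumerate_levels` is hypothetical.

```python
def enumerate_levels(values):
    """Return (levels, codes) for a categorical variable.

    levels: the sorted unique values (the "look-up table").
    codes:  for each observation, the 0-based position of its level
            in levels (an assumed interpretation of the second item).
    """
    levels = sorted(set(values))
    position = {level: i for i, level in enumerate(levels)}
    codes = [position[v] for v in values]
    return levels, codes


levels, codes = enumerate_levels(["b", "a", "b", "c"])
print(levels)  # prints ['a', 'b', 'c']
print(codes)   # prints [1, 0, 1, 2]
```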

Inadequate finishes

Andrew Ratcliffe posted a fine article titled "Inadequate Mends" in which he extols the benefits of including the name of a macro on the %MEND statement. That is, if you create a macro function named foo, he recommends that you include the name in two places: %macro foo(x); /** define

Finding data that satisfy a criterion

A fundamental operation in data analysis is finding data that satisfy some criterion. How many people are older than 85? What are the phone numbers of the voters who are registered Democrats? These questions are examples of locating data with certain properties or characteristics. The SAS DATA step has a
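The "older than 85" question can be sketched in Python as a search for the positions that satisfy a predicate. The helper `find_indices` and the sample ages are made up for illustration; they do not come from the article.

```python
def find_indices(values, predicate):
    """Return the 0-based positions of the elements that satisfy predicate."""
    return [i for i, v in enumerate(values) if predicate(v)]


ages = [90, 42, 86, 85]  # hypothetical data
older_than_85 = find_indices(ages, lambda age: age > 85)
print(older_than_85)       # prints [0, 2]
print(len(older_than_85))  # prints 2
```

Returning positions rather than values makes it possible to look up related fields, such as phone numbers, for the same observations.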

Funnel plots: An alternative to ranking

In a previous blog post, I showed how you can use simulation to construct confidence intervals for ranks. This idea (from a paper by E. Marshall and D. Spiegelhalter) enables you to display a graph that compares the performance of several institutions, where "institutions" can mean schools, companies, airlines, or

A statistical model of card shuffling

I recently returned from a five-day conference in Las Vegas. On the way there, I finally had time to read a classic statistical paper: Bayer and Diaconis (1992) describes how many shuffles are needed to randomize a deck of cards. Their famous result that it takes seven shuffles to randomize

How to sample from independent normal distributions

In my article on computing confidence intervals for rankings, I had to generate p random vectors that each contained N random numbers. Each vector was generated from a normal distribution with different parameters. This post compares two different ways to generate p vectors that are sampled from independent normal distributions. Sampling
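The excerpt does not say which two approaches the post compares. The Python sketch below shows only one straightforward option: loop over the p parameter pairs and draw N values for each. The function name, its arguments, and the seed handling are assumptions for this example.

```python
import random


def sample_independent_normals(means, sds, n, seed=None):
    """Return p lists of n draws; list j is sampled from N(means[j], sds[j])."""
    rng = random.Random(seed)  # private generator, so results are reproducible
    return [[rng.gauss(mu, sd) for _ in range(n)]
            for mu, sd in zip(means, sds)]


# p = 2 vectors of N = 5 draws each, with hypothetical parameters
samples = sample_independent_normals([0.0, 10.0], [1.0, 2.0], n=5, seed=1)
print(len(samples), len(samples[0]))  # prints 2 5
```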

Advanced Analytics
Ranking with confidence: Part 1

I recently posted an article about representing uncertainty in rankings on the blog of the ASA Section for Statistical Programmers and Analysts (SSPA). The posting discusses the importance of including confidence intervals or other indicators of uncertainty when you display rankings. Today's article complements the SSPA post by showing how
