Blogs

Author

Rick Wicklin
Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

Advanced Analytics

Rick Wicklin, July 5, 2012

Compute the multivariate normal density in SAS

I've been working on a new book about Simulating Data with SAS. In researching the chapter on simulation of multivariate data, I've noticed that the probability density function (PDF) of multivariate distributions is often specified in a matrix form. Consequently, the multivariate density can usually be computed by using the

Read More

Rick Wicklin, July 2, 2012

Create a contour plot in SAS

When I need to graph a function of two variables, I often choose to use a contour plot. A surface plot is probably easier for many people to understand, but it has several disadvantages when compared to a contour plot. For example, the following statements in SAS/IML Studio display a

Read More

Rick Wicklin, June 29, 2012

Is using zero as a random number seed the same as not specifying a seed?

I received the following query regarding the RAND function in Base SAS: In SAS, is specifying 0 as a random number seed the same as not specifying a seed at all? The question concerns initializing the SAS random number stream by using the internal system clock. You can do this

Read More

Advanced Analytics

Rick Wicklin, June 27, 2012

Create an ID vector for repeated measurements

I often use the SAS/IML language for simulating data with certain known properties. In fact, I'm writing a book called Simulating Data with SAS. When I simulate repeated measurements (sometimes called replicated data), I often want to generate an ID variable that identifies which measurement is associated with which subject

Read More

Rick Wicklin, June 25, 2012

Programming tip: Avoid testing floating-point values for equality

No matter what statistical programming language you use, be careful of testing for an exact value of a floating-point number. This is known in the world of numerical analysis as "10.0 times 0.1 is hardly ever 1.0" (Kernighan and Plauger, 1974, The Elements of Programming Style). There are many examples

Read More
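
As a minimal sketch of the tip in this teaser (not the article's own code), the DATA step below accumulates 0.1 ten times and compares the result to 1 in two ways. The variable names and the tolerance of 1e-12 are assumptions chosen for illustration.

```sas
data _null_;
   x = 0;
   do i = 1 to 10;
      x = x + 0.1;                  /* accumulate 0.1 ten times */
   end;
   exact = (x = 1);                 /* exact test for equality */
   close = (abs(x - 1) < 1e-12);    /* test within an assumed tolerance */
   put exact= close=;               /* write both indicators to the log */
run;
```

On platforms that use IEEE double-precision arithmetic, the exact test is expected to fail while the tolerance test succeeds, because 0.1 has no exact binary representation.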

Advanced Analytics

Rick Wicklin, June 22, 2012

Compute a running total for a "window" of time

A reader wrote for help with a computational problem. He has a vector of length N and the vector contains integer values in the range [1, 120], which represent months for which events occurred over a 10-year period. The question is: what is the 24-month period for which the most

Read More

Rick Wicklin, June 21, 2012

A statistician reads the newspaper: Forecasting rising sea levels

This is a third post on newspaper stories that I recently read. Today's post deals with science, politics, and rising sea levels. Incidentally, the title is a blatant reference to John Allen Paulos's brilliant book, A Mathematician Reads the Newspaper. Senate approves law that challenges sea-level science The NC legislature

Read More

Rick Wicklin, June 20, 2012

A statistician reads the newspaper: Academic fraud

This is my second post on some newspaper articles that I recently read. Today's post deals with academic fraud. Questions linger in academic fraud case Over the past year, the News and Observer has occasionally reported on a scandal at the University of North Carolina at Chapel Hill in which

Read More

Rick Wicklin, June 19, 2012

A statistician reads the newspaper: The Secret Service scandal

This past weekend was Father's Day, so I took some time to relax and read the newspaper. I found several stories that suggested interesting statistical questions. Unfortunately, the data are not available for analysis. Nevertheless, the stories are worth sharing. Over the next few days, I'll post my thoughts on

Read More

Rick Wicklin, June 18, 2012

A statistically beautiful Father's Day

To celebrate special occasions like Father's Day, I like to relax with a cup of coffee and read the newspaper. When I looked at the weather page, I was astonished by the seeming uniformity of temperatures across the contiguous US. The weather map in my newspaper was almost entirely yellow

Read More

Rick Wicklin, June 13, 2012

Convergence or divergence? A simple iteration with a random component

A colleague who works with time series sent me the following code snippet. He said that the calculation was overflowing and wanted to know if this was a bug in SAS: data A(drop=m); call streaminit(12345); m = 2; x = 0; do i = 1 to 5000; x = m*x

Read More

Programming Tips

Rick Wicklin, June 6, 2012

Eight tips to make your simulation run faster

"Help! My simulation is taking too long to run! How can I make it go faster?" I frequently talk with statistical programmers who claim that their "simulations are too slow" (by which they mean, "they take too long"). They suspect that their program is inefficient, but they aren't sure why.

Read More

Rick Wicklin, June 4, 2012

Rename many variables that have numerical suffixes and a common prefix

I recently read a blog post in which a SAS user had to rename a bunch of variables named A1, A2,..., A10, such as are contained in the following data set: /* generate data with variables A1-A10 */ data A; array A[10] A1-A10 (1); do i = 1 to 10;

Read More

Rick Wicklin, May 31, 2012

An easy way to define a library of user-defined functions

In the SAS/IML language, a user-defined function or subroutine is called a module. Modules are used to extend the capability of the SAS/IML language. In my blog posts, I usually define a module in a PROC IML session and then immediately use it. However, sometimes it is useful to store

Read More

Advanced Analytics

Rick Wicklin, May 29, 2012

Did you know that PROC IML automatically loads certain modules?

In the SAS/IML language, a user-defined function or subroutine is called a module. Modules are used to extend the capability of the SAS/IML language. Usually you need to explicitly load modules before you use them, but there are two cases where PROC IML loads a module automatically. Modules in IMLMLIB

Read More

Advanced Analytics

Rick Wicklin, May 23, 2012

Compute statistics for each row by using subscript operators

In a previous blog, I showed how to use SAS/IML subscript reduction operators to compute the location of the maximum values for each row of a matrix. The subscript reduction operators are useful for computing simple statistics for each row (or column) of a numerical matrix. If x is a

Read More

Rick Wicklin, May 21, 2012

For each observation, find the variable that contains the minimum value

The other day I encountered an article in the SAS Knowledge Base that shows how to write a macro that "returns the variable name that contains the maximum or minimum value across an observation." Some people might say that the macro is "clever." I say it is complicated. This is

Read More

Rick Wicklin, May 16, 2012

The curious case of random eigenvalues

I've been a fan of statistical simulation and other kinds of computer experimentation for many years. For me, simulation is a good way to understand how the world of statistics works, and to formulate and test conjectures. Last week, while investigating the efficiency of the power method for finding dominant

Read More

Rick Wicklin, May 14, 2012

How to read data set variables into SAS/IML vectors

One of the first skills that a beginning SAS/IML programmer learns is how to read data from a SAS data set into SAS/IML vectors. (Alternatively, you can read data into a matrix). The beginner is sometimes confused about the syntax of the READ statement: do you specify the names of

Read More

Advanced Analytics

Rick Wicklin, May 9, 2012

The power method: compute only the largest eigenvalue of a matrix

When I was at SAS Global Forum last week, a SAS user asked my advice regarding a SAS/IML program that he wrote. One step of the program was taking too long to run and he wondered if I could suggest a way to speed it up. The long-running step was

Read More

Advanced Analytics

Rick Wicklin, May 7, 2012

Checking your answers: Are computed values close to the true values?

In statistical programming, I often test a program by running it on a problem for which I know the correct answer. I often use a single expression to compute the maximum value of the absolute difference between the vectors: maxDiff = max( abs( z-correct ) ); /* largest absolute difference

Read More
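
The expression quoted in this teaser can be tried in a short PROC IML session. The following is only a sketch: the vectors `z` and `correct` hold made-up values, and the tolerance `tol` is an assumption, not something taken from the article.

```sas
proc iml;
correct = {1, 2, 3};                  /* hypothetical known answers */
z = correct + {1e-10, -2e-10, 0};     /* hypothetical computed values */
maxDiff = max( abs( z-correct ) );    /* largest absolute difference */
tol = 1e-8;                           /* assumed tolerance for "close enough" */
isClose = (maxDiff < tol);            /* 1 if every element is within tol */
print maxDiff isClose;
quit;
```

A single scalar such as `maxDiff` is convenient because it summarizes the agreement of the whole vector in one number.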

Advanced Analytics

Rick Wicklin, May 4, 2012

Expand data by using frequencies

A reader asked: I want to create a vector as follows. Suppose there are two given vectors x=[A B C] and f=[1 2 3]. Here f indicates the frequency vector. I hope to generate a vector c=[A B B C C C]. I am trying to use the REPEAT function

Read More
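
One possible way to build the expanded vector described in the reader's question is sketched below in SAS/IML. This is not necessarily the approach the article takes: it builds an index vector with a plain loop, and the names `idx` and `pos` are hypothetical.

```sas
proc iml;
x = {A B C};                    /* values to expand */
f = {1 2 3};                    /* frequency of each value */
idx = j(1, sum(f), .);          /* index vector, filled in below */
pos = 1;                        /* next unfilled position in idx */
do i = 1 to ncol(x);
   if f[i] > 0 then do;
      idx[pos:(pos+f[i]-1)] = i;   /* repeat the index i, f[i] times */
      pos = pos + f[i];
   end;
end;
c = x[idx];                     /* A B B C C C */
print c;
quit;
```

Working with an index vector rather than the values themselves means the same loop handles numeric and character data.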

Advanced Analytics

Rick Wicklin, May 2, 2012

The DIF function: Compute lagged differences and finite differences

To a statistician, the DIF function (which was introduced in SAS/IML 9.22) is useful for time series analysis. To a numerical analyst and a statistical programmer, the function has many other uses, including computing finite differences. The DIF function computes the difference between the original vector and a shifted version

Read More
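
As a brief sketch of the relationship this teaser describes, the statements below compare the DIF function with an explicit difference that uses the LAG function. The data values are made up, and the default lag of 1 is assumed for both functions.

```sas
proc iml;
x = {1, 4, 9, 16, 25};     /* hypothetical data: squares of 1-5 */
d = dif(x);                /* x[i] - x[i-1]; first element is missing */
viaLag = x - lag(x);       /* the same difference, written with LAG */
print x d viaLag;
quit;
```

The first element of each difference is a missing value because the first observation has no predecessor.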

Advanced Analytics

Rick Wicklin, April 30, 2012

The LAG function: Useful for more than time series analysis

To a statistician, the LAG function (which was introduced in SAS/IML 9.22) is useful for time series analysis. To a numerical analyst and a statistical programmer, the function provides a convenient way to compute quantities that involve adjacent values in any vector. The LAG function is essentially a "shift operator."

Read More

Advanced Analytics

Rick Wicklin, April 20, 2012

Popular! Articles that strike a chord with SAS users

I blog about a lot of topics, but the following five categories represent some of my favorite subjects. Judging by the number of readers and comments, these articles have struck a chord with SAS users. If you haven't read them, check them out. (If you HAVE read them, some are

Read More

Rick Wicklin, April 18, 2012

Extending SAS: How to define new functions in PROC FCMP and SAS/IML software

SAS software provides many run-time functions that you can call from your SAS/IML or DATA step programs. The SAS/IML language has several hundred built-in statistical functions, and Base SAS software contains hundreds more. However, it is common for statistical programmers to extend the run-time library to include special user-defined functions.

Read More

Rick Wicklin, April 16, 2012

BY-group processing in SAS/IML

Because the SAS/IML language is a general purpose programming language, it doesn't have a BY statement like most other SAS procedures (such as PROC REG). However, there are several ways to loop over categorical variables and perform an analysis on the observations in each category. One way is to use

Read More

Rick Wicklin, April 12, 2012

The Poissonness plot: A goodness-of-fit diagnostic

Last week I discussed how to fit a Poisson distribution to data. The technique, which involves using the GENMOD procedure, produces a table of some goodness-of-fit statistics, but I find it useful to also produce a graph that indicates the goodness of fit. For continuous distributions, the quantile-quantile (Q-Q) plot

Read More

Rick Wicklin, April 10, 2012

A singular spectrum analysis of a temperature time series

Last week I blogged about how to construct a smoother for a time series for the temperature in Albany, NY from 1995 to March, 2012. I smoothed the data by "folding" the time series into a single "year" that contains repeated measurements for each day of the year. Experts in

Read More

Rick Wicklin, April 9, 2012

Vectorized computations and the birthday matching problem

The birthday matching problem is a classic problem in probability theory. The part of it that people tend to remember is that in a room of 23 people, there is greater than 50% chance that two people in the room share a birthday. But the birthday matching problem is also

Read More
