Blogs

Blogs

Author

Rick Wicklin

Rick Wicklin RSS
Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

Rick WicklinJuly 18, 2011 0

An easy way to specify dates and times

Dates and times. As Wayne Finley states in his SUGI25 paper on SAS date and time handling, "The SAS system provides a plethora of methods to handle date and time values." Along with the plethora of methods is a plethora of papers on the topic. If you want to trick

Read More

Rick WicklinJuly 15, 2011 0

Five new features of SAS 9.3 for statistical programmers

Welcome, SAS 9.3! I've already blogged about some interface and graphical changes that everyone should know about. Now I'll put on my statistical hat and mention a few 9.3 features that excite me, personally, as a data analyst and a statistical programmer: As a statistician, I am keen to try

Read More

Rick WicklinJuly 14, 2011 0

Welcome SAS 9.3! Five interface and graphics features that everyone can use

Here are a few new interface and graphics changes that every SAS programmer should know about SAS 9.3: HTML is now the default output destination when you run the SAS windowing environment. This means that tables and graphs appear in an HTML document instead of the classic LISTING destination. Of

Read More

Rick WicklinJuly 13, 2011 0

Simulate categorical data in SAS

As I was reviewing notes for my course "Data Simulation for Evaluating Statistical Methods in SAS," I realized that I haven't blogged about simulating categorical data in SAS. This article corrects that oversight. An Easy Way and a Harder Way SAS software makes it easy to sample from discrete "named"

Read More

Rick WicklinJuly 12, 2011 0

Statistics and the Casey Anthony Case: False positives and false negatives

Arnold Loewy, professor of criminal law at Texas Tech University, wrote an editorial about the Casey Anthony case that has statistical undertones. Prof. Loewy discusses the fact that there are two kinds of errors that can occur in a court trial: an innocent person can be sent to jail or

Read More

Rick WicklinJuly 11, 2011 0

Cleaning up after yourself: Deleting data sets

"Always clean up after yourself." My mother taught me this, and I apply it to SAS programming as regularly as I apply it at home. For SAS programming, I reinterpret Mom's saying as the following rule: Always delete temporary files and data sets when you are finished using them. How

Read More

Rick WicklinJuly 8, 2011 0

The area under a density estimate curve: Nonparametric estimates

One of the joys of statistics is that you can often use different methods to estimate the same quantity. Last week I described how to compute a parametric density estimate for univariate data, and use the parameters estimates to compute the area under the probability density function (PDF). This article

Read More

Rick WicklinJuly 7, 2011 0

Improving graphs of highly correlated data

If you create a scatter plot of highly correlated data, you will see little more than a thin cloud of points. Small-scale relationships in the data might be masked by the correlation. For example, Luke Miller recently posted a scatter plot that compares the body temperature of snails when they

Read More

Rick WicklinJuly 6, 2011 0

To jitter or not to jitter: That is the question

In a previous article, I discussed random jittering as a technique to reduce overplotting in scatter plots. The example used data that are rounded to the nearest unit, although the idea applies equally well to ordinal data in general. The act of jittering (adding random noise to data) is a

Read More

Rick WicklinJuly 5, 2011 0

Jittering to prevent overplotting in statistical graphics

Jittering. To a statistician, it is more than what happens when you drink too much coffee. Jittering is the act of adding random noise to data in order to prevent overplotting in statistical graphs. Overplotting can occur when a continuous measurement is rounded to some convenient unit. This has the

Read More

Rick WicklinJuly 1, 2011 0

The area under a density estimate curve: Parametric estimates

The area under a density estimate curve gives information about the probability that an event occurs. The simplest density estimate is a histogram, and last week I described a few ways to compute empirical estimates of probabilities from histograms and from the data themselves, including how to construct the empirical

Read More

Rick WicklinJune 29, 2011 0

Add a diagonal line to a scatter plot

In my statistical analysis of coupons article, I presented a scatter plot that includes the identity line, y=x. This post describes how to write a general program that uses the SGPLOT procedure in SAS 9.2. By a "general program," I mean that the program produces the result based on the

Read More

Rick WicklinJune 27, 2011 0

Check that a SAS data set exists (and other data maintenance tips)

In Base SAS you can use the DATASETS procedure to determine the SAS data sets in a library, and you can use the DELETE statement to delete data sets. Did you know that you can do the same operations from within the SAS/IML language? The following DATA step creates four

Read More

Rick WicklinJune 24, 2011 0

The area under a density estimate curve

Readers' comments indicate that my previous blog article about computing the area under an ROC curve was helpful. Great! There is another common application of numerical integration: finding the area under a density estimation curve. This article provides an overview of density estimation and computes an empirical cumulative density function.

Read More

Rick WicklinJune 22, 2011 0

Detecting missing values in the SAS/IML language

This is Part 4 of my response to Charlie Huang's interesting article titled Top 10 most powerful functions for PROC SQL. As I did for eaerlier topics, I will examine one of the "powerful" SQL functions that Charlie mentions and show how to do the same computation in SAS/IML software.

Read More

Rick WicklinJune 21, 2011 0

Overlaying two histograms in SAS

A reader commented to me that he wants to use the HISTOGRAM statement of the SGPLOT procedure to overlay two histograms on a single plot. He could do it, but unfortunately SAS was choosing a large bin width for one of the variables and a small bin width for the

Read More

Rick WicklinJune 20, 2011 0

Pre-allocate arrays to improve efficiency

Recently Charlie Huang showed how to use the SAS/IML language to compute an exponentially weighted moving average of some financial data. In the commentary to his analysis, he said: I found that if a matrix or a vector is declared with specified size before the computation step, the program’s efficiency

Read More

Rick WicklinJune 17, 2011 0

A statistical analysis of coupons

Each Sunday, my local paper has a starburst image on the front page that proclaims "Up to $169 in Coupons!" (The value changes from week to week.) One day I looked at the image and thought, "Does the paper hire someone to count the coupons? Is this claim a good

Read More

Rick WicklinJune 15, 2011 0

Enumerating levels of a classification variable

A colleague asked, "How can I enumerate the levels of a categorical classification variable in SAS/IML software?" The variable was a character variable with n observations, but he wanted the following: A "look-up table" that contains the k (unique) levels of the variable. A vector with n elements that contains

Read More

Rick WicklinJune 13, 2011 0

Blogging, programming, and Johari windows

My primary purpose in writing The DO Loop blog is to share what I know about statistical programming in general and about SAS programming in particular. But I also write the blog for various personal reasons, including the enjoyment of writing. The other day I encountered a concept on Ajay

Read More

Rick WicklinJune 8, 2011 0

Calling Base SAS functions from SAS/IML: How to handle argument lists?

Over at the SAS/IML Discussion Forum, there have been several posts about how to call a Base SAS functions from SAS/IML when the Base SAS function supports a variable number of arguments. It is easy to call a Base SAS function from SAS/IML software when the syntax for the function

Read More

Rick WicklinJune 6, 2011 0

Use subscript reduction operators!

Writing efficient SAS/IML programs is very important. One aspect to efficient SAS/IML programming is to avoid unnecessary DO loops. In my book, Statistical Programming with SAS/IML Software, I wrote (p. 80): One way to avoid writing unnecessary loops is to take full advantage of the subscript reduction operators for matrices.

Read More

Rick WicklinJune 3, 2011 0

A statistical application of numerical integration: The area under an ROC curve

In a previous blog post, I presented a short SAS/IML function module that implements the trapezoidal rule. The trapezoidal rule is a numerical integration scheme that gives the integral of a piecewise linear function that passes through a given set of points. This article demonstrates an application of using the

Read More

Rick WicklinJune 1, 2011 0

The trapezoidal rule of integration

In a previous article I discussed the situation where you have a sequence of (x,y) points and you want to find the area under the curve that is defined by those points. I pointed out that usually you need to use statistical modeling before it makes sense to compute the

Read More

Rick WicklinMay 27, 2011 0

Obtaining area from a set of points on a curve

The other day I was asked, "Given a set of points, what is the area under the curve defined by those points?" As stated, the problem is not well defined. The problem is that "the curve defined by those points" doesn't have a precise meaning. However, after gathering more information,

Read More

Rick WicklinMay 25, 2011 0

Computing the trace of a product of matrices

Recently I had to compute the trace of a product of square matrices. That is, I had two large nxn matrices, A and B, and I needed to compute the quantity trace(A*B). Furthermore, I was going to compute this quantity thousands of times for various A and B as part

Read More

Rick WicklinMay 23, 2011 0

Listing SAS/IML variables

Did you know that you can display a list of all the SAS/IML variables (matrices) that are defined in the current session? The SHOW statement performs this useful task. For example, the following statements define three matrices: proc iml; fruit = {"apple", "banana", "pear"}; k = 1:3; x = j(1E5,

Read More

Rick WicklinMay 20, 2011 0

How to create a scatter plot with marginal histograms in SAS

Many people know that the SGPLOT procedure in SAS 9.2 can create a large number of interesting graphs. Some people also know how to create a panel of graphs (all of the same type) by using the SGPANEL procedure. But did you know that you can also create a panel

Read More

Rick WicklinMay 18, 2011 0

Random access: How to read specific observations in SAS/IML software

This article shows how to randomly access data in a SAS data set by using the READ POINT statement in SAS/IML software. I have previously discussed how to use the READ NEXT and READ CURRENT statements to sequentially access each observation in a SAS data set from PROC IML. Reading

Read More

Rick WicklinMay 17, 2011 0

Inadequate finishes

Andrew Ratcliffe posted a fine article titled "Inadequate Mends" in which he extols the benefits of including the name of a macro on the %MEND statement. That is, if you create a macro function named foo, he recommends that you include the name in two places: %macro foo(x); /** define

Read More

Previous 1 … 46 47 48 49 50 … 53 Next