Readers' comments indicate that my previous blog article about computing the area under an ROC curve was helpful. Great! There is another common application of numerical integration: finding the area under a density estimation curve. This article provides an overview of density estimation and computes an empirical cumulative density function.

# Author

This is Part 4 of my response to Charlie Huang's interesting article titled Top 10 most powerful functions for PROC SQL. As I did for earlier topics, I will examine one of the "powerful" SQL functions that Charlie mentions and show how to do the same computation in SAS/IML software.

A reader commented to me that he wants to use the HISTOGRAM statement of the SGPLOT procedure to overlay two histograms on a single plot. He could do it, but unfortunately SAS was choosing a large bin width for one of the variables and a small bin width for the

Recently Charlie Huang showed how to use the SAS/IML language to compute an exponentially weighted moving average of some financial data. In the commentary to his analysis, he said: I found that if a matrix or a vector is declared with specified size before the computation step, the program’s efficiency

Each Sunday, my local paper has a starburst image on the front page that proclaims "Up to $169 in Coupons!" (The value changes from week to week.) One day I looked at the image and thought, "Does the paper hire someone to count the coupons? Is this claim a good

A colleague asked, "How can I enumerate the levels of a categorical classification variable in SAS/IML software?" The variable was a character variable with n observations, but he wanted the following: A "look-up table" that contains the k (unique) levels of the variable. A vector with n elements that contains

My primary purpose in writing The DO Loop blog is to share what I know about statistical programming in general and about SAS programming in particular. But I also write the blog for various personal reasons, including the enjoyment of writing. The other day I encountered a concept on Ajay

Over at the SAS/IML Discussion Forum, there have been several posts about how to call a Base SAS function from SAS/IML when the Base SAS function supports a variable number of arguments. It is easy to call a Base SAS function from SAS/IML software when the syntax for the function

Writing efficient SAS/IML programs is very important. One aspect of efficient SAS/IML programming is to avoid unnecessary DO loops. In my book, Statistical Programming with SAS/IML Software, I wrote (p. 80): One way to avoid writing unnecessary loops is to take full advantage of the subscript reduction operators for matrices.

In a previous blog post, I presented a short SAS/IML function module that implements the trapezoidal rule. The trapezoidal rule is a numerical integration scheme that gives the integral of a piecewise linear function that passes through a given set of points. This article demonstrates an application of using the

In a previous article I discussed the situation where you have a sequence of (x,y) points and you want to find the area under the curve that is defined by those points. I pointed out that usually you need to use statistical modeling before it makes sense to compute the

The other day I was asked, "Given a set of points, what is the area under the curve defined by those points?" As stated, the problem is not well defined. The problem is that "the curve defined by those points" doesn't have a precise meaning. However, after gathering more information,

Recently I had to compute the trace of a product of square matrices. That is, I had two large nxn matrices, A and B, and I needed to compute the quantity trace(A*B). Furthermore, I was going to compute this quantity thousands of times for various A and B as part

Did you know that you can display a list of all the SAS/IML variables (matrices) that are defined in the current session? The SHOW statement performs this useful task. For example, the following statements define three matrices: proc iml; fruit = {"apple", "banana", "pear"}; k = 1:3; x = j(1E5,

Many people know that the SGPLOT procedure in SAS 9.2 can create a large number of interesting graphs. Some people also know how to create a panel of graphs (all of the same type) by using the SGPANEL procedure. But did you know that you can also create a panel

This article shows how to randomly access data in a SAS data set by using the READ POINT statement in SAS/IML software. I have previously discussed how to use the READ NEXT and READ CURRENT statements to sequentially access each observation in a SAS data set from PROC IML. Reading

Andrew Ratcliffe posted a fine article titled "Inadequate Mends" in which he extols the benefits of including the name of a macro on the %MEND statement. That is, if you create a macro function named foo, he recommends that you include the name in two places: %macro foo(x); /** define

A fundamental operation in data analysis is finding data that satisfy some criterion. How many people are older than 85? What are the phone numbers of the voters who are registered Democrats? These questions are examples of locating data with certain properties or characteristics. The SAS DATA step has a

For years I've been making presentations about SAS/IML software at conferences. Since 2008, I've always mentioned to SAS customers that they can call R from within SAS/IML software. (This feature was introduced in SAS/IML Studio 3.2 and was added to the IML procedure in SAS/IML 9.22.) I also included a

When Charlie H. posted an interesting article titled "Top 10 most powerful functions for PROC SQL," there was one item on his list that was unfamiliar: the COALESCE function. (Edit: Charlie's blog no longer exists. The article used to be available at http://www.sasanalysis.com/2011/01/top-10-most-powerful-functions-for-proc.html) Ever since I posted my first response,

Last week the Flowing Data blog published an excellent visualization of the flight patterns of major US airlines. On Friday, I sent the link to Robert Allison, my partner in the 2009 ASA Data Expo, which explored airline data. Robert had written a SAS program for the Expo that plots

When I was at the annual SAS Global Forum conference, I had the pleasure of discussing statistical programming and SAS/IML software with dozens of SAS customers. I was asked at least ten times, "How do I get started with SAS/IML software?" or "How can I learn PROC IML?" Here is

This blog post shows how to numerically integrate a one-dimensional function by using the QUAD subroutine in SAS/IML software. The name "quad" is short for quadrature, which means numerical integration. You can use the QUAD subroutine to numerically find the definite integral of a function on a finite, semi-infinite, or

More than a month ago I wrote a first article in response to an interesting article by Charlie H. titled Top 10 most powerful functions for PROC SQL. In that article I described SAS/IML equivalents to the MONOTONIC, COUNT, N, FREQ, and NMISS Functions in PROC SQL. In this article,

The most common way to read observations from a SAS data set into SAS/IML matrices is to read all of the data at once by using the ALL clause in the READ statement. However, the READ statement also has options that do not require holding all of the observations in

In last week's article on how to create a funnel plot in SAS, I wrote the following comment: I have not adjusted the control limits for multiple comparisons. I am doing nine comparisons of individual means to the overall mean, but the limits are based on the assumption that I'm

The log transformation is one of the most useful transformations in data analysis. It is used as a transformation to normality and as a variance stabilizing transformation. A log transformation is often used as part of exploratory data analysis in order to visualize (and later model) data that ranges over

One of the advantages of programming in the SAS/IML language is its ability to transform data vectors with a single statement. For example, in data analysis, the log and square-root functions are often used to transform data so that the transformed data have approximate normality. The following SAS/IML statements create

Last week I showed how to create a funnel plot in SAS. A funnel plot enables you to compare the mean values (or rates, or proportions) of many groups to some other value. The group means are often compared to the overall mean, but they could also be compared to

Last week I presented the GSR algorithm, a statistical model of a riffle shuffle. In the model, a deck of n cards is split into two parts according to the binomial distribution. Each piece has roughly n/2 cards. Then cards are dropped from the two stacks according to the number