In a previous article, I discussed random jittering as a technique to reduce overplotting in scatter plots. The example used data that are rounded to the nearest unit, although the idea applies equally well to ordinal data in general. The act of jittering (adding random noise to data) is a
Last week, I attended the International Center for Leadership in Education’s Model Schools Conference in Nashville, TN, where I learned about many forward-thinking education initiatives taking place across the country. My colleagues and I also had the privilege of facilitating a SAS® EVAAS for K-12 presentation from two principals at
Jittering. To a statistician, it is more than what happens when you drink too much coffee. Jittering is the act of adding random noise to data in order to prevent overplotting in statistical graphs. Overplotting can occur when a continuous measurement is rounded to some convenient unit. This has the
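As a rough illustration of the idea (written in Python rather than SAS, and not taken from the article itself), jittering can be sketched as adding a small uniform random offset to each rounded value. The function name `jitter`, the `width` parameter, and the sample data are all assumptions made for this example.

```python
import random

def jitter(values, width=0.5, seed=None):
    """Return a copy of values with uniform noise in [-width, width] added.

    Hypothetical helper for illustration; width should be at most half
    the rounding unit so that jittered points stay near their true value.
    """
    rng = random.Random(seed)  # a seeded generator keeps the plot reproducible
    return [v + rng.uniform(-width, width) for v in values]

# Made-up measurements rounded to the nearest unit: many points coincide.
rounded = [1, 1, 1, 2, 2, 3]
jittered = jitter(rounded, width=0.25, seed=1)
print(jittered)
```

The noise is used only for display; any statistics should still be computed from the original, unjittered values.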
The area under a density estimate curve gives information about the probability that an event occurs. The simplest density estimate is a histogram, and last week I described a few ways to compute empirical estimates of probabilities from histograms and from the data themselves, including how to construct the empirical
In my statistical analysis of coupons article, I presented a scatter plot that includes the identity line, y=x. This post describes how to write a general program that uses the SGPLOT procedure in SAS 9.2. By a "general program," I mean that the program produces the result based on the
In Base SAS you can use the DATASETS procedure to determine the SAS data sets in a library, and you can use the DELETE statement to delete data sets. Did you know that you can do the same operations from within the SAS/IML language? The following DATA step creates four
About a year ago (wow, has it been that long?), I posted an example program that lets you report on the contents of a SAS information map. Using my example, you can see the data items, filters, and folder structure within a given information map. Last week a reader posted
Readers' comments indicate that my previous blog article about computing the area under an ROC curve was helpful. Great! There is another common application of numerical integration: finding the area under a density estimation curve. This article provides an overview of density estimation and computes an empirical cumulative distribution function.
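For readers who want a concrete picture, here is a minimal sketch (in Python, not the SAS code from the article) of an empirical cumulative distribution function: the proportion of observations less than or equal to a given value. The name `ecdf` and the sample data are assumptions for this example.

```python
from bisect import bisect_right

def ecdf(data):
    """Return a function F where F(t) is the fraction of data <= t."""
    s = sorted(data)
    n = len(s)
    if n == 0:
        raise ValueError("data must not be empty")

    def F(t):
        # bisect_right counts how many sorted values are <= t
        return bisect_right(s, t) / n

    return F

# Made-up sample for illustration.
F = ecdf([3, 1, 2, 2])
print(F(0), F(2), F(3))
```

The resulting step function jumps by 1/n at each observation (more at tied values), so it needs no binning or bandwidth choice.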
This is Part 4 of my response to Charlie Huang's interesting article titled Top 10 most powerful functions for PROC SQL. As I did for earlier topics, I will examine one of the "powerful" SQL functions that Charlie mentions and show how to do the same computation in SAS/IML software.
As far as numbers go, the number zero is rather mysterious for data. Is it something or is it nothing? What happens when you have missing data but enter 0? This topic triggered an intriguing discussion in my recent Programming 2: Data Manipulation Techniques class. In this post I’d like
We sometimes take it for granted, but the concept of the "SAS library" is just about one of the most awesome aspects of The SAS System. You can give your library a name (a library reference, or libref), tell the system how to get to your data (options on a
A reader commented to me that he wants to use the HISTOGRAM statement of the SGPLOT procedure to overlay two histograms on a single plot. He could do it, but unfortunately SAS was choosing a large bin width for one of the variables and a small bin width for the
Recently Charlie Huang showed how to use the SAS/IML language to compute an exponentially weighted moving average of some financial data. In the commentary to his analysis, he said: I found that if a matrix or a vector is declared with specified size before the computation step, the program’s efficiency
SAS-based processes are critical to many organizations, but sometimes the trickiest part of your job falls into one or both of these activities: Getting stuff from the outside world "into" SAS. (Once it's in SAS, as many of you know, the world is your oyster.) Getting the output of your
Each Sunday, my local paper has a starburst image on the front page that proclaims "Up to $169 in Coupons!" (The value changes from week to week.) One day I looked at the image and thought, "Does the paper hire someone to count the coupons? Is this claim a good
The 2011 Joint Statistical Meetings (JSM) will be held July 31 to August 3 in Miami Beach. Look for acquisitions editor and SAS Press conference mega-maven Shelley Sessoms at the SAS Publishing booth (#504). Shelley can talk with you about: New and forthcoming statistics-related books from SAS Press, including Statistical Analysis for
A colleague asked, "How can I enumerate the levels of a categorical classification variable in SAS/IML software?" The variable was a character variable with n observations, but he wanted the following: A "look-up table" that contains the k (unique) levels of the variable. A vector with n elements that contains
Sometime very recently, probably while you weren't looking, I changed jobs at SAS (yes, again). This time it's a bigger change for me, because I'm no longer part of the SAS R&D organization, where I've worked for nearly 14 years. Instead, I'm part of the team known internally as Professional
My primary purpose in writing The DO Loop blog is to share what I know about statistical programming in general and about SAS programming in particular. But I also write the blog for various personal reasons, including the enjoyment of writing. The other day I encountered a concept on Ajay
No, BONEZONE is not the website of wayward legislators. It is, however, a trade journal of the orthopaedic devices industry, and the Summer 2011 issue features a nice mention of Forecast Value Added (FVA) analysis in an article by Tom Wallace. In "Forecasting: It's Getting Better," Tom refers to FVA
This week's featured tip from SAS Press author Jack Shostak should spark the interest of those of you working in the pharmaceutical or clinical trials industries. While I'm not a programmer, myself, Jack's book has been consistently described as being accessible, practical, and helpful for both beginners and advanced SAS
Policing has profoundly changed over the last several decades and its evolution will continue as long as there are crimes to commit and communities to serve. The very nature of policing is dynamic – it always has been and always will be. Those dynamics are driven by many things –
Over at the SAS/IML Discussion Forum, there have been several posts about how to call a Base SAS function from SAS/IML when the Base SAS function supports a variable number of arguments. It is easy to call a Base SAS function from SAS/IML software when the syntax for the function
Writing efficient SAS/IML programs is very important. One aspect of efficient SAS/IML programming is to avoid unnecessary DO loops. In my book, Statistical Programming with SAS/IML Software, I wrote (p. 80): One way to avoid writing unnecessary loops is to take full advantage of the subscript reduction operators for matrices.
There is nothing the gambler, investor, forecaster, or Match.com dater likes as much as the sure thing. Don't we all? Back in April I stated what I claimed to be a sure thing forecast: In any group of 2 or more people, there is at least one pair of people
In a previous blog post, I presented a short SAS/IML function module that implements the trapezoidal rule. The trapezoidal rule is a numerical integration scheme that gives the integral of a piecewise linear function that passes through a given set of points. This article demonstrates an application of using the
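The trapezoidal rule described above can be sketched in a few lines. This is a Python illustration under stated assumptions, not the SAS/IML module from the earlier post; the name `trapz` and the sample points are hypothetical. It sums the areas of the trapezoids between consecutive points, which is the exact integral of the piecewise linear function through them.

```python
def trapz(x, y):
    """Integrate the piecewise linear curve through (x[i], y[i]).

    Assumes x is sorted in increasing order and len(x) == len(y).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    # Each interval contributes width * average height.
    return sum(
        (x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2
        for i in range(len(x) - 1)
    )

# A triangle with base 2 and height 1 has area 1.
area = trapz([0, 1, 2], [0, 1, 0])
print(area)
```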
In a previous article I discussed the situation where you have a sequence of (x,y) points and you want to find the area under the curve that is defined by those points. I pointed out that usually you need to use statistical modeling before it makes sense to compute the
As organizations confront the limits of forecasting, they finally realize the folly in a blind pursuit of unachievable levels of forecast accuracy. The best accuracy we can ever hope to achieve is limited by the nature -- the forecastability -- of what we are trying to forecast. Anything better than
The other day I was asked, "Given a set of points, what is the area under the curve defined by those points?" As stated, the problem is not well defined. The problem is that "the curve defined by those points" doesn't have a precise meaning. However, after gathering more information,
SAS Enterprise Guide has about 150 options that you can customize in the Tools->Options window. With each release, the development team adds a few more options that have been asked for by customers, and they rarely decommission any existing options. It's getting quite crowded on some of those options windows!