On his Junk Charts blog, Kaiser Fung showed a bar chart that was "published by Teach for America, touting its diversity." Kaiser objected to the chart because the bar lengths did not accurately depict the proportions of the Teach for America corps members. The chart bothers me for another reason:
Should you ever guess on the SAT® or PSAT standardized tests? My son is getting ready to take the preliminary SAT (PSAT), which is a practice test for the SAT. A teacher gave his class this advice regarding guessing: For multiple-choice questions, if you can eliminate one or two
In my last blog post I described how to implement a "runs test" in the SAS/IML language. The runs test determines whether a sequence of two values (for example, heads and tails) is likely to have been generated by random chance. This article describes two applications of the runs test.
While walking in the woods, a statistician named Goldilocks wanders into a cottage and discovers three bears. The bears, being hungry, threaten to eat the young lady, but Goldilocks begs them to give her a chance to win her freedom. The bears agree. While Mama Bear and Papa Bear block
What is the best way to share SAS/IML functions with your colleagues? Give them the source code? Create a function library that they can use? This article describes three techniques that make your SAS/IML functions accessible to others. As background, remember that you can define new functions and subroutines in
Massive open online courses (MOOCs) are all the rage today. Some people see free online courses as a convenient way to introduce statistical concepts to tens of thousands of students who would not otherwise have an opportunity to learn about data analysis. Whereas 2013 is the International Year of Statistics,
Last week I described how to generate permutations in SAS. A related concept is the "combination." In probability and statistics, a combination is a subset of k items chosen from a set that contains N items. Order does not matter, so although the ordered triplets (B, A, C) and (C,
This is the last post in my recent series of articles on computing contours in SAS. Last month a SAS customer asked how to compute the contours of the bivariate normal cumulative distribution function (CDF). Answering that question in a single blog post would have resulted in a long article,
I've written several articles that show how to generate permutations in SAS. In the SAS DATA step, you can use the ALLPERM subroutine to generate all permutations of a DATA step array that contains a small number (18 or fewer) of elements. In addition, the PLAN procedure enables you to generate
I'm spoiled by the internet. I've grown so accustomed to being able to instantly find an answer to any query—no matter how obscure—that I am surprised when I don't find what I am looking for. The other day I was trying to find a mathematical result: a formula for the
Like many other computer packages, SAS can produce a contour plot that shows the level sets of a function of two variables. For example, I've previously written blogs that use contour plots to visualize the bivariate normal density function and to visualize the cumulative normal distribution function. However, sometimes you
SAS 9 has supported calling R from the SAS/IML language since 2009. The interface to R is part of the SAS/IML language. However, there have been so many versions of SAS and R since 2009 that it is hard to remember which SAS release supports which versions of R. The
This week I read an interesting blog post that led to a discussion about specifying the frequencies of observations in a regression model. In SAS software, many of the analysis procedures contain a FREQ statement for specifying frequencies and a WEIGHT statement for specifying weights in a weighted regression. Theis
In a previous post, I showed how to solve differential equations in SAS by using the ODE subroutine in the SAS/IML language, which solves initial value problems. This article describes how to draw phase portraits for two classic differential equations: the equations of motion for the simple harmonic oscillator and
Differential equations arise in the modeling of many physical processes, including mechanical and chemical systems. You can solve systems of first-order ordinary differential equations (ODEs) by using the ODE subroutine in the SAS/IML language, which solves initial value problems. This article uses the equations of motion for the classic simple
Last week I presented two talks at the University of Wisconsin at Milwaukee, which has established a new Graduate Certificate in Applied Data Analysis Using SAS. While in Milwaukee, I ran into an old friend: the ODS LISTING destination. One of my presentations was a hands-on workshop titled Getting Started
Sometimes it is useful in the SAS/IML language to convert a character string into a vector of one-character values. For example, you might want to count the frequency distribution of characters, which is easy when each character is an element of a vector. The question of how to convert a
Finding the maximum value of a function is an important task in statistics. There are three approaches to finding a maximum: When the function is available as an analytic expression, you can use an optimization algorithm to find the maximum. For example, in the SAS/IML language, you can use any
Recently I wrote about how to determine the age of your SAS release. Experienced SAS programmers know that you can programmatically determine information about your SAS release by using certain automatic macro variables that SAS provides: SYSVER: contains the major and minor version of the SAS release SYSVLONG: contains the
A common visualization is to compare characteristics of two groups. This article emphasizes two tips that will help make the comparison clear. First, consider graphing the differences between the groups. Second, in any plot that has a categorical axis, sort the categories by a meaningful quantity. This article is motivated
Even the best programmers make mistakes. For most errors, SAS software displays the nature and location of the error, returns control to the programmer, and awaits further instructions. However, there are a handful of insidious errors that cause SAS to think that a statement or program is not finished. For
Earlier this week I posted a "guest blog" in which my 8th grade son described a visualization of data for the 2013 ASA Poster Competition. The purpose of today's blog post is to present a higher-level statistical analysis of the same data. I will use a t test and a
Editor's Note: My 8th grade son, David, created a poster that he submitted to the 2013 ASA Poster Competition. The competition encourages students to display "two or more related graphics that summarize a set of data, look at the data from different points of view, and answer specific questions about
My previous post described the multinomial distribution and showed how to generate random data from the multinomial distribution in SAS by using the RANDMULTINOMIAL function in SAS/IML software. The RANDMULTINOMIAL function is simple to use and implements an efficient algorithm called the sequential conditional marginal method (see Gentle (2003), p.
This article describes how to generate random samples from the multinomial distribution in SAS. The content is taken from Chapter 8 of my book Simulating Data with SAS. The multinomial distribution is a discrete multivariate distribution. Suppose there are k different types of items in a box, such as a
How old is your version of SAS software? The graph on the left shows the release dates for various releases of SAS software, beginning with SAS 8.0. The graph is based on a graph on Jiangtang Hu's blog that shows the major SAS releases. As this graph demonstrates, SAS software
Do you have dozens (or even hundreds) of SAS data sets that you want to read into SAS/IML matrices? In a previous blog post, I showed how to iterate over a series of data sets and analyze each one. Inside the loop, I read each data set into a matrix
One of my favorite features of SAS/IML 12.1 (released with 9.3m2) is that the USE and CLOSE statements support reading data set names that are specified in a SAS/IML matrix. The IMLPlus language in SAS/IML Studio has supported this syntax since the early 2000s, so I am pleased that this
The truncated normal distribution TN(μ, σ, a, b) is the distribution of a normal random variable with mean μ and standard deviation σ that is truncated on the interval [a, b]. I previously blogged about how to implement the truncated normal distribution in SAS. A friend wanted to simulate data
This article describes how to implement the truncated normal distribution in SAS. Although the implementation in this article uses the SAS/IML language, you can also implement the ideas and formulas by using the DATA step and PROC FCMP. For reference, I recommend the Wikipedia article on the truncated normal distribution.