In this International Year of Statistics, I'd like to describe the major role of statistics in public health advances. In our modern society, it is sometimes difficult to recall the huge advances in health and medicine in the 20th century. To name a few: penicillin was discovered in 1928, risk
Author
A SAS customer asks: How do I use SAS to generate multiple samples of size N from a multivariate normal distribution? Suppose that you want to simulate k samples (each with N observations) from a multivariate normal distribution with a given mean vector and covariance matrix. Because all of the
ODS statements are global SAS statements. As such, you can put them anywhere in your SAS program. For maximum readability, many SAS programmers agree that most ODS statements should appear outside procedures in "open" SAS code. For example, most programmers agree that the following statements should appear outside of procedures:
I was recently asked how to compute the difference between two density estimates in SAS. The person who asked the question sent me a link to a paper from The Review of Economics and Statistics that contains several examples of this technique (for example, see Figure 3 on p. 16
Did you know that your ODS style might result in changing the color ramp for contour plots and heat maps? For example, the default style in SAS 9.3 is HTMLBlue. Let's create a contour plot in the HTML destination by running an example adapted from the documentation for the RSREG
In statistics, distances between observations are used to form clusters, to identify outliers, and to estimate distributions. Distances are used in spatial statistics and in other application areas. There are many ways to define the distance between observations. I have previously written an article that explains Mahalanobis distance, which is
It is easy to use the SGPLOT procedure in SAS to plot the graph of a well-behaved continuous function: just create a data set of the (x,y) values on some domain and use the SERIES statement to connect the points. However, to plot the graph of a discontinuous function correctly
Someone recently asked a question on the SAS Support Communities about estimating parameters in ridge regression. I answered the question by pointing to a matrix formula in the SAS documentation. One of the advantages of the SAS/IML language is that you can implement matrix formulas in a natural way. The
One purpose of this International Year of Statistics is to spread the word that the field of statistics benefits society. As part of the International Year, many organizations, including SAS and the American Statistical Association (ASA), are turning to history to illustrate how statistics is vital to the health and
Argh! I've just spilled coffee on output that shows the least squares coefficients for a regression model that I was investigating. Now the parameter estimate for the intercept is completely obscured, although I can still see the parameter estimates for the coefficients of the continuous explanatory variable. What can I
Last week there was an interesting question posted to the "Stat-Math Statistics" group on LinkedIn. The original question was a little confusing, so I'll state it in a more general form: A population is normally distributed with a known mean and standard deviation. A sample of size N is drawn
There is something for everyone at SAS Global Forum 2013. I like to attend presentations in the Statistics and Data Analysis track and talk with SAS customers in the SAS Support and Demo Area. But one activity that I enjoy the most is to stroll through the poster area and
I am pleased to announce that the documentation for the IMLPlus language is now available online. Previously, this resource was available only from within the SAS/IML Studio application. This documentation can now be accessed by anyone, regardless of whether they have installed SAS/IML Studio. As I have described previously, IMLPlus
A SAS user asked an interesting question on the SAS/GRAPH and ODS Graphics Support Forum. The question is: Does PROC SGPLOT support a way to display the slope of the regression line that is computed by the REG statement? Recall that the REG statement in PROC SGPLOT fits and displays
The SAS/IML language has a curious syntax that enables you to specify a "repetition factor" when you initialize a vector of literal values. Essentially, the language enables you to specify the frequency of an element. For example, suppose you want to define the following vector: proc iml; x = {1
I have previously written about how to use the "table" distribution to generate random values from a discrete probability distribution. For example, if there are 50 black marbles, 20 red marbles, and 30 white marbles in a box, the following SAS/IML program simulates random draws (with replacement) of 1,000 marbles:
Suppose that you have a SAS/IML matrix and you want to set each element of a submatrix to zero (or any other value). There is a simple syntax that accomplishes this task. If you subscript a matrix and do not specify a row, it means "use all rows." So, for
If you are like me, you've experienced the following frustration. You are reading the SAS/STAT documentation, trying to understand some procedure or option, when you find an example that is very similar to what you need. "Great," you think, "this example will help me understand how the SAS procedure works!"
In linear algebra, the I symbol is used to denote an n x n identity matrix. The symbol J (or sometimes 1) is used to denote an n x p matrix of ones. When the SAS/IML language was implemented, the I function was defined to generate the identity matrix. The J function was defined
Last week the SAS Training Post blog posted a short article on an easy way to find variables in common to two data sets. The article used PROC CONTENTS (with the SHORT option) to print out the names of variables in SAS data sets so that you can visually determine
I wanted to write a blog post about the "Table distribution" in SAS. The Table distribution, which is supported by the RAND and the RANDGEN function, enables you to specify the probability of selecting each of k items. Therefore you can use the Table distribution to sample, with replacement, from
The SAS/IML language secretly creates temporary variables. Most of the time programmers aren't even aware that the language does this. However, there is one situation where if you don't think carefully about temporary variables, your program will silently produce an error. And as every programmer knows, silent wrong numbers are
SAS has several kinds of special data sets whose contents are organized according to certain conventions. These special data sets are marked with the TYPE= data set attribute. For example, the CORR procedure can create a data set with the TYPE=CORR attribute. You can decipher the structure of the data
I like to be efficient in my SAS/IML programs, but sometimes I get into bad habits. Recently I realized that I was reshaping a bunch of SAS/IML row vectors because I wanted to write them to a SAS data set. This is completely unnecessary! The SAS/IML language will create a
A SAS/IML user on a discussion forum was trying to read data into a SAS/IML matrix, but the data was so large that it would not fit into memory. (Recall that SAS/IML matrices are kept in RAM.) After a few questions, it turned out that the user was trying to
A while ago I saw a blog post on how to simulate Bernoulli outcomes when the probability of generating a 1 (success) varies from observation to observation. I've done this often in SAS, both in the DATA step and in the SAS/IML language. For example, when simulating data that satisfied
When a categorical variable has dozens or hundreds of categories, it is often impractical and undesirable to create a bar chart that shows the counts for all categories. Two alternatives are popular: Display only the Top 10 or Top 20 categories. As I showed last week, to do this in
Sometimes a categorical variable has many levels, but you are only interested in displaying the levels that occur most frequently. For example, if you are interested in the number of times that a song was purchased on iTunes during the past week, you probably don't want a bar chart with
I am pleased to announce that this year at SAS Global Forum 2013 (San Francisco, April 27 to May 1, 2013) I am giving a free hands-on workshop (HOW) entitled "Getting Started with the SAS/IML Language." If you are not familiar with the very popular Hands-On Workshop series at SAS
It's the start of a new year. Have you made a resolution to be a better data analyst? A better SAS statistical programmer? To learn more about multivariate statistics? What better way to start the New Year than to read (or re-read!) the top 12 articles for statistical programmers from