A matrix is an array of numbers or character strings. When I print a matrix, I usually want to see only the data. However, sometimes it is helpful to add row or column headings that indicate the names of variables or labels for rows. A simple example is count data
In a previous blog post, I showed how to use the LOGISTIC procedure to construct a receiver operating characteristic (ROC) curve in SAS. That same day, Charlie H. blogged about how to use the DATA step to construct an ROC curve from basic principles. It has been a long time
I've written about how to add a diagonal line to a scatter plot by using the SGPLOT procedure in SAS 9.2. The main idea (use the VECTOR statement) is easy enough, but writing a program that handles a line with any slope requires some additional effort. But now SAS 9.3
The other day I needed to compute the signum function for each element of a matrix. If x is a real number, then sgn(x) is -1 when x<0, 1 when x>0, and 0 when x=0. I wrote a SAS/IML module that contains a compact little expression: proc iml; start
I recently blogged about how many times, on average, you must roll a die until you see all six faces. This question is a special case of the coupon collector's problem. My son noted that the expected value (the mean number of rolls) is not necessarily the best statistic to
"Dad? How many times do I have to roll a die until all six sides appear?" I stopped what I was doing to consider my son's question. Although I could figure out the answer mathematically, sometimes experiments are more powerful than math equations for showing how probability works. "Why don't
Yesterday, Jiangtang Hu did a frequency analysis of my blog posts and noticed that there are some holidays on which I post to my blog and others on which I do not. The explanation is simple: I post on Mondays, Wednesdays, and Fridays, provided that SAS Institute (World Headquarters) is
Dates and times. As Wayne Finley states in his SUGI25 paper on SAS date and time handling, "The SAS system provides a plethora of methods to handle date and time values." Along with the plethora of methods is a plethora of papers on the topic. If you want to trick
I feel privileged to have been invited back to meet with SAS customers throughout New Zealand and Australia. I also feel lucky to escape the North Carolina summer (with temperatures trending in the 90s Fahrenheit) in exchange for the "winter" weather Down Under. For a good chunk of August, I'll
Tuesday's release of SAS 9.3 included the new SAS Forecast Server 4.1, which has several valuable enhancements: Combination (Ensemble) Models: A combination of forecasts using different forecasting techniques can outperform forecasts produced by using any single technique. Users can combine forecasts produced by many different models using several different combination
Welcome, SAS 9.3! I've already blogged about some interface and graphical changes that everyone should know about. Now I'll put on my statistical hat and mention a few 9.3 features that excite me, personally, as a data analyst and a statistical programmer: As a statistician, I am keen to try
Rick Wicklin created his own list of Five Interface and Graphics Features that Everyone Can Use. It's a very good summary of what you'll immediately notice when you use analytics procedures in SAS display manager: cool graphs turned on by default. For SAS Enterprise Guide users, you won't see such
Most of us grew up playing some type of sport and dreaming of becoming a collegiate or professional athlete. For me, it was a focus on dance and striving to be a professional ballerina. At some point we realized that in order to make this dream a reality, we’d have
Here are a few new interface and graphics changes that every SAS programmer should know about SAS 9.3: HTML is now the default output destination when you run the SAS windowing environment. This means that tables and graphs appear in an HTML document instead of the classic LISTING destination. Of
As I was reviewing notes for my course "Data Simulation for Evaluating Statistical Methods in SAS," I realized that I haven't blogged about simulating categorical data in SAS. This article corrects that oversight. An Easy Way and a Harder Way SAS software makes it easy to sample from discrete "named"
Alison posted the Top 10 Reasons you should care about SAS 9.3. It's a bit tongue-in-cheek, but it reflects just a sample of the thousands of features and tweaks that you'll see in this new release. Even with SAS 9.2, I was nowhere near exhausting my backlog of blog topics...but
It seems like such a simple problem: how can you reliably compute the age of someone or something? Susan lamented the subtle issues using the YRDIF function exactly 1.0356164384 years ago. Sure, you could write your own function for calculating such things, as I suggested 0.1753424658 years ago. Or you
Arnold Loewy, professor of criminal law at Texas Tech University, wrote an editorial about the Casey Anthony case that has statistical undertones. Prof. Loewy discusses the fact that there are two kinds of errors that can occur in a court trial: an innocent person can be sent to jail or
SAS Enterprise Guide sets values for several useful SAS macro variables when it connects to a SAS session, including one macro variable, &_CLIENTPROJECTPATH, that contains the name and path of the current SAS Enterprise Guide project file. (To learn about this and other macro variables that SAS Enterprise Guide assigns,
"Always clean up after yourself." My mother taught me this, and I apply it to SAS programming as regularly as I apply it at home. For SAS programming, I reinterpret Mom's saying as the following rule: Always delete temporary files and data sets when you are finished using them. How
As I write this, SAS 9.3 has not yet been "shipped", but its release is imminent. I've already heard many questions about how SAS Enterprise Guide works with the new version, so I decided to write this "Frequently-soon-to-be-asked questions" document to help sort it out. What version of SAS Enterprise
One of the joys of statistics is that you can often use different methods to estimate the same quantity. Last week I described how to compute a parametric density estimate for univariate data, and use the parameter estimates to compute the area under the probability density function (PDF). This article
The recent issue of InformationWeek features a Q&A session with Ken Thompson, one of the creators of the Unix operating system. (He collaborated with Dennis Ritchie, of C language fame. Since much of SAS is written in C, I daresay there are a few copies of K&R around here.) One
If you create a scatter plot of highly correlated data, you will see little more than a thin cloud of points. Small-scale relationships in the data might be masked by the correlation. For example, Luke Miller recently posted a scatter plot that compares the body temperature of snails when they
Here at SAS Press, we offer a strong, stable publishing team with over 55 years of combined experience. But as a potential author (or even current one) or fan of our press, you might want to get a better feel for the people behind the book. Thus a new feature,
In a previous article, I discussed random jittering as a technique to reduce overplotting in scatter plots. The example used data that are rounded to the nearest unit, although the idea applies equally well to ordinal data in general. The act of jittering (adding random noise to data) is a
Last week, I attended the International Center for Leadership in Education’s Model Schools Conference in Nashville, TN, where I learned about many forward-thinking education initiatives taking place across the country. My colleagues and I also had the privilege of facilitating a SAS® EVAAS for K-12 presentation from two principals at
Jittering. To a statistician, it is more than what happens when you drink too much coffee. Jittering is the act of adding random noise to data in order to prevent overplotting in statistical graphs. Overplotting can occur when a continuous measurement is rounded to some convenient unit. This has the
The area under a density estimate curve gives information about the probability that an event occurs. The simplest density estimate is a histogram, and last week I described a few ways to compute empirical estimates of probabilities from histograms and from the data themselves, including how to construct the empirical
In my statistical analysis of coupons article, I presented a scatter plot that includes the identity line, y=x. This post describes how to write a general program that uses the SGPLOT procedure in SAS 9.2. By a "general program," I mean that the program produces the result based on the