Sometimes a population of individuals is modeled as a combination of subpopulations. For example, if you want to model the heights of individuals, you might first model the heights of males and females separately. The height of the population can then be modeled as a combination of the male and

# Author

The other day I encountered a SAS Knowledge Base article that shows how to count the number of missing and nonmissing values for each variable in a data set. However, the code is a complicated macro that is difficult for a beginning SAS programmer to understand. (Well, it was hard

Last week I showed a graph of the number of US births for each day in 2002, which shows a strong day-of-the-week effect. The graph also shows that the number of births on a given day is affected by US holidays. This blog post looks closer at the holiday effect.

Polynomials are used often in data analysis. Low-order polynomials are used in regression to model the relationship between variables. Polynomials are used in numerical analysis for numerical integration and Taylor series approximations. It is therefore important to be able to evaluate polynomials in an efficient manner. My favorite evaluation technique

My friend Chris posted an analysis of the distribution of birthdays for 236 of his Facebook friends. He noted that more of his friends have birthdays in April than in September. The numbers were 28 for April, but only 25 for September. As I reported in my post on "the

You can extend the capability of the SAS/IML language by writing modules. A module is a user-defined function. You can define a module by using the START and FINISH statements. Many people, including myself, define modules at the top of the SAS/IML program in which they are used. You can

Do you know someone who has a birthday in mid-September? Odds are that you do: the middle of September is when most US babies are born, according to data obtained from the National Center for Health Statistics (NCHS) Web site (see Table 1-16). There's an easy way to remember this

Looping is essential to statistical programming. Whether you need to iterate over parameters in an algorithm or indices in an array, a loop is often one of the first programming constructs that a beginning programmer learns. Today is the first anniversary of this blog, which is named The DO Loop,

My elderly mother enjoys playing Scrabble®. The only problem is that my father and most of my siblings won't play with her because she beats them all the time! Consequently, my mother is always excited when I visit because I'll play a few Scrabble games with her. During a recent

I previously showed how to generate random numbers in SAS by using the RAND function in the DATA step or by using the RANDGEN subroutine in SAS/IML software. These functions generate a stream of random numbers. (In statistics, the random numbers are usually a sample from a distribution such as

One of the highly visible changes in SAS 9.3 is the fact that the old LISTING destination is no longer the default destination for ODS output. Instead, the HTML destination is the default. One positive consequence of this is that ODS graphics and tables are interlaced in the output. Another

Exploring correlation between variables is an important part of exploratory data analysis. Before you start to model data, it is a good idea to visualize how variables related to one another. Zach Mayer, on his Modern Toolmaking blog, posted code that shows how to display and visualize correlations in R.

You can generate a set of random numbers in SAS that are uniformly distributed by using the RAND function in the DATA step or by using the RANDGEN subroutine in SAS/IML software. (These same functions also generate random numbers from other common distributions such as binomial and normal.) The syntax

NOTE: SAS stopped shipping the SAS/IML Studio interface in 2018. It is no longer supported, so this article is no longer relevant. When I write SAS/IML programs, I usually do my development in the SAS/IML Studio environment. Why? There are many reasons, but the one that I will discuss today

Some people search the Internet for a set of topics and then use the number of search results ("hits") for each topic to rank the relative popularity of the topics. At the 2011 Joint Statistical Meetings (JSM), I had the opportunity to attend several talks by statisticians from Google and

I've previously described ways to solve systems of linear equations, A*b = c. While discussing the relative merits of the solving a system for a particular right hand side versus solving for the inverse matrix, I made the assertion that it is faster to solve a particular system than it

This article describes the SAS/IML CHOOSE function: how it works, how it doesn't work, and how to use it to make your SAS/IML programs more compact. In particular, the CHOOSE function has a potential "gotcha!" that you need to understand if you want your program to perform as expected. What

When I was at the Joint Statistical Meetings (JSM) last week, a SAS customer asked me whether it was possible to use the SGPLOT procedure to produce side-by-side bar charts. The answer is "yes" in SAS 9.3, thanks to the new GROUPDISPLAY= option on the VBAR and HBAR statements. For

The SAS/IML language provides two functions for solving a nonsingular nxn linear system A*x = c: The INV function numerically computes the inverse matrix, A-1. You can use this to solve for x: Ainv = inv(A); x = Ainv*c;. The SOLVE function numerically computes the particular solution, x, for a

In the SAS/IML language, the index creation operator (:) is used to construct a sequence of integer values. For example, the expression 1:7 creates a row vector with seven elements: 1, 2, ..., 7. It is important to know the precedence of matrix operators. When I was in grade school,

I've previously discussed how to find the root of a univariate function. This article describes how to find the root (zero) of a function of several variables by using Newton's method. There have been many papers, books, and dissertations written on the topic of root-finding, so why am I blogging

At the SAS/IML Support Community, a SAS/IML programmer recently asked how to find "the root of a complicated equation." That's a huge question, and many papers and books have been written on the topic of root-finding, also known as finding the zeros of a function. Everyone has favorite techniques for

A matrix is an array of numbers or character strings. When I print a matrix, I usually want to see only the data. However, sometimes it is helpful to add row or column headings that indicate the names of variables or labels for rows. A simple example is count data

In a previous blog post, I showed how to use the LOGISTIC procedure to construct a receiver operator characteristic (ROC) curve in SAS. That same day, Charlie H. blogged about how to use the DATA step to construct an ROC curve from basic principles. It has been a long time

I've written about how to add a diagonal line to a scatter plot by using the SGPLOT procedure in SAS 9.2. The main idea (use the VECTOR statement) is easy enough, but writing a program that handles a line with any slope requires some additional effort. But now SAS 9.3

The other day I needed to compute the signum function for each element of a matrix. If x is a real number, then the sgn(x) is -1 when x<0, 1 when x>0, and 0 when x=0. I wrote a SAS/IML module that contains a compact little expression: proc iml; start

I recently blogged about how many times, on average, you must roll a die until you see all six faces. This question is a special case of the coupon collector's problem. My son noted that the expected value (the mean number of rolls) is not necessarily the best statistic to

"Dad? How many times do I have to roll a die until all six sides appear?" I stopped what I was doing to consider my son's question. Although I could figure out the answer mathematically, sometimes experiments are more powerful than math equations for showing how probability works. "Why don't

Yesterday, Jiangtang Hu did a frequency analysis of my blog posts and noticed that there are some holidays on which I post to my blog and others on which I do not. The explanation is simple: I post on Mondays, Wednesdays, and Fridays, provided that SAS Institute (World Headquarters) is

Dates and times. As Wayne Finley states in his SUGI25 paper on SAS date and time handling, "The SAS system provides a plethora of methods to handle date and time values." Along with the plethora of methods is a plethora of papers on the topic. If you want to trick