This article is about how to use Git to share SAS programs, specifically how to share libraries of SAS IML functions. Some IML programmers might remember an earlier way to share libraries of functions: SAS/IML introduced "packages" in SAS 9.4M3 (2015), which enable you to create, document, share, and use
SAS supports the ColorBrewer system of color palettes from the ColorBrewer website (Brewer and Harrower, 2002). The ColorBrewer color ramps are available in SAS by using the PALETTE function in SAS IML software. The PALETTE function supports all ColorBrewer palettes, but some palettes are not interpretable by people with color
Did you know that about 8% of the world's men are colorblind? (More correctly, 8% of men are "color vision deficient," because most of them can see colors, just not all colors.) Because of the "birthday paradox," in a room that contains eight men, the probability is about 50% that at least one is
I previously discussed how to use the PUTLOG statement to write a message from the DATA step to the log in SAS. The PUTLOG statement is commonly used to write notes, warnings, and errors to the log. This article shows how to use the PRINTTOLOG subroutine in SAS IML software
Many experienced SAS programmers use the PUT statement to write messages to the log from a DATA step. But did you know that SAS supports the PUTLOG statement, which is another way to write a message to the log? I use the PUTLOG statement in the DATA step for the
A previous article shows that you can use the intercept parameter to control the ratio of events to nonevents in a simulation of data from a logistic regression model. If you decrease the intercept parameter, the probability of the event decreases; if you increase the intercept parameter, the probability of
This article shows that you can use the intercept parameter to control the probability of the event in a simulation study that involves a binary logistic regression model. For simplicity, I will simulate data from a logistic regression model that involves only one explanatory variable, but the main idea applies
In a previous article, I presented some of the most popular blog posts from 2022. In general, popular articles deal with elementary topics that have broad appeal. However, I also write articles about advanced topics. The following articles didn't make the Top 10 list, but they deserve a second look.
Since 2008, SAS has supported an interface for calling R from the SAS/IML matrix language. Many years ago, I wrote blog posts that describe how to call R from PROC IML. For SAS 9.4, the process of installing R and calling R from PROC IML is documented in the SAS/IML
Last year, I wrote almost 90 articles for The DO Loop blog. My most popular articles were about SAS programming, data visualization, statistics and data analysis, and matrix computations. If you missed these articles when I published them (or if you want to read them again!), here is the "Reader's Choice
A colleague posted a Christmas-themed code snippet that shows how to use the DATA step in SAS to output all the possible ways that Santa can hitch up a team of reindeer to pull his sled. The assumption is that Rudolph must lead the team, and the remaining reindeer are
A previous article describes how to use SAS IML software to construct common covariance structures that are encountered in mixed models. Each covariance matrix has several parameters, and you want to construct a matrix for any choice of the parameters. After you have constructed the covariance matrix, you can use
I always emphasize efficiency in statistical programming. I have previously written about why you should never multiply by a large diagonal matrix in the SAS IML language. The reason is that it is more efficient to use elementwise multiplication than matrix multiplication. Specifically, if d is a column vector, then
For Christmas 2021, I wrote an article about palettes of Christmas colors, chiefly shades of red, green, silver, and gold. One of my readers joked that she would like to use my custom palette to design her own Christmas wrapping paper! I remembered her jest when I saw some artwork
A probabilistic card trick is a trick that succeeds with high probability and does not require any skill from the person performing the trick. I have seen a certain trick mentioned several times on social media. I call it "ladders" or the "ladders game" because it reminds me of the
A SAS programmer was trying to simulate poker hands. He was having difficulty because the sampling scheme for simulating card games requires that you sample without replacement for each hand. In statistics, this is called "simple random sampling." If done properly, it is straightforward to simulate poker hands in SAS.
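The blog's own solution is written in SAS. As a language-neutral illustration of the sampling scheme, here is a minimal Python sketch. Each hand is a simple random sample (drawn without replacement) from a full 52-card deck, and separate hands are independent. The names `DECK` and `deal_hands` are hypothetical, chosen only for this example.

```python
import random

# Build a standard 52-card deck as rank+suit strings, e.g. "AS" = ace of spades
RANKS = "23456789TJQKA"
SUITS = "CDHS"
DECK = [r + s for r in RANKS for s in SUITS]

def deal_hands(num_hands, cards_per_hand=5, seed=None):
    """Simulate independent poker hands.

    Within a hand, cards are drawn WITHOUT replacement (simple random
    sampling), so no card can appear twice in the same hand. Each hand
    is dealt from a fresh, full deck.
    """
    rng = random.Random(seed)
    return [rng.sample(DECK, cards_per_hand) for _ in range(num_hands)]

# Usage: deal three reproducible hands
for hand in deal_hands(3, seed=1):
    print(hand)
```

The key design choice is `random.Random.sample`, which never repeats an element within one call. Drawing each card independently with `choice` would be sampling with replacement and could produce impossible hands.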
Recently, I needed to know "how much" of a piecewise linear curve is below the X axis. The coordinates of the curve were given as a set of ordered pairs (x1, y1), (x2, y2), ..., (xn, yn). The question is vague, so the first step is to define the question better. Should
A profile plot is a way to display multivariate values for many subjects. The optimal linear profile plot was introduced by John Hartigan in his book Clustering Algorithms (1975). In his book SAS System for Statistical Graphics (1991), Michael Friendly shows how to construct an optimal linear profile by using
A profile plot is a compact way to visualize many variables for a set of subjects. It enables you to investigate which subjects are similar to or different from other subjects. Visually, a profile plot can take many forms. This article shows several profile plots: a line plot of the
I recently blogged about how to compute the area of the convex hull of a set of planar points. This article discusses the expected value of the area of the convex hull for n random uniform points in the unit square. The article introduces an exact formula (due to Buchta,
The area of a convex hull enables you to estimate the area of a compact region from a set of discrete observations. For example, a biologist might have multiple sightings of a wolf pack and want to use the convex hull to estimate the area of the wolves' territory. A
Every year, I write a special article for Halloween in which I show a SAS programming TRICK that is a real TREAT! This year, the trick is to concatenate two strings into a single string in a way that guarantees you can always recover the original strings. I learned this
A SAS programmer asked how to create a graph that shows whether missing values in one variable are associated with certain values of another variable. For example, a patient who is supposed to monitor his blood glucose daily might have more missing measurements near holidays and in the summer months
I recently gave a presentation about the SAS/IML matrix language in which I emphasized that a matrix language enables you to write complex analyses by using only a few lines of code. In the presentation, I used least squares regression as an example. One participant asked how many additional lines
Recently, I needed to write a program that can provide a solution to a regression-type problem, even when the data are degenerate. Mathematically, the problem is an overdetermined linear system of equations X*b = y, where X is an n x p design matrix and y is an n x 1 vector. For most
On a SAS discussion forum, a statistical programmer asked how to understand the statistics that are displayed when you use the TEST statement in PROC REG (or other SAS regression procedures) to test for linear relationships between regression coefficients. The documentation for the TEST statement in PROC REG explains
One of the benefits of social media is the opportunity to learn new things. Recently, I saw a post on Twitter that intrigued me. The tweet said that the expected volume of a random tetrahedron in the unit cube (in 3-D) is E[Volume] = 0.0138427757.... This number seems surprisingly small!
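One way to convince yourself that such a small number is plausible is a Monte Carlo simulation. The following minimal sketch is written in Python rather than SAS IML, and the function names are hypothetical. It draws four independent uniform points in the unit cube and computes the volume of their tetrahedron as |det([b - a, c - a, d - a])| / 6, the absolute scalar triple product of the edge vectors divided by 6. It then averages the volumes over many random draws.

```python
import random

def tetra_volume(a, b, c, d):
    """Volume of the tetrahedron with vertices a, b, c, d (3-D points)."""
    # Edge vectors from vertex a
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    # Scalar triple product u . (v x w) = determinant of the matrix with rows u, v, w
    det = (u[0] * (v[1] * w[2] - v[2] * w[1])
           - u[1] * (v[0] * w[2] - v[2] * w[0])
           + u[2] * (v[0] * w[1] - v[1] * w[0]))
    return abs(det) / 6.0

def mc_expected_volume(n, seed=None):
    """Monte Carlo estimate of E[Volume] for 4 uniform random points in [0,1]^3."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        pts = [[rng.random() for _ in range(3)] for _ in range(4)]
        total += tetra_volume(*pts)
    return total / n

# Usage: the estimate should land near the quoted value 0.0138...
print(mc_expected_volume(100_000, seed=123))
```

Each simulated volume is a fraction of the cube's unit volume, and most random tetrahedra are thin slivers. That helps explain why the average is only about 1.4% of the cube.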
Have you ever typed your credit card number into an online order form and been told that you entered the wrong number? Perhaps you wondered, "How do they know that the numbers I typed do not make a valid credit card number?" The answer is that credit card numbers and other
A previous article discusses the definitions of three kinds of moments for a continuous probability distribution: raw moments, central moments, and standardized moments. These are defined in terms of integrals over the support of the distribution. Moments are connected to the familiar shape features of a distribution: the mean, variance,
The moments of a continuous probability distribution are often used to describe the shape of the probability density function (PDF). The first four moments (if they exist) are well known because they correspond to familiar descriptive statistics: The first raw moment is the mean of a distribution. For a random