Longitudinal data are measurements for a set of subjects at multiple points in time. Also called "panel data" or "repeated measures data," this kind of data is common in clinical trials in which patients are tracked over time. Recently, a SAS programmer asked how to visualize missing values in a


This article shows how to compute properties of a discrete probability distribution from basic definitions. You can use the definitions to compute the mean, variance, and median of a discrete probability distribution when there is no simple formula for those quantities. This article is motivated by two computational questions about
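
The definitions translate directly into code. Below is a minimal Python sketch, not the article's SAS program. The function name `discrete_summary` is hypothetical. The sketch assumes a finite support and uses one common convention for the median: the smallest value x with P(X ≤ x) ≥ 1/2.

```python
def discrete_summary(values, probs, tol=1e-9):
    """Mean, variance, and median of a discrete distribution, from the definitions.

    values: the support points; probs: their probabilities (must sum to 1).
    The median is taken to be the smallest value x with P(X <= x) >= 1/2.
    """
    if abs(sum(probs) - 1.0) > tol:
        raise ValueError("probabilities must sum to 1")
    pairs = sorted(zip(values, probs))                 # order the support points
    mean = sum(x * p for x, p in pairs)                # E[X] = sum of x * p(x)
    var = sum((x - mean) ** 2 * p for x, p in pairs)   # E[(X - mean)^2]
    cdf = 0.0
    for x, p in pairs:                                 # accumulate the CDF
        cdf += p
        if cdf >= 0.5 - tol:                           # tolerance guards against rounding
            return mean, var, x
    raise RuntimeError("unreachable when probabilities sum to 1")

# Example: a fair six-sided die
mean, var, median = discrete_summary([1, 2, 3, 4, 5, 6], [1 / 6] * 6)
print(mean, var, median)
```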

Statistical programmers need to access numerical constants that help us to write robust and accurate programs. Specifically, it is necessary to know when it is safe to perform numerical operations such as raising a number to a power without exceeding the largest number that is representable in finite-precision arithmetic. This
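
To illustrate the idea, here is a short Python sketch that reads the corresponding double-precision constants from `sys.float_info`. The SAS functions that the article uses are not shown. The sketch avoids overflow by comparing logarithms instead of forming the power: for x > 0, x^p exceeds the largest finite double when p·log(x) exceeds log(BIG), up to rounding near the boundary. The helper `safe_to_power` is hypothetical.

```python
import math
import sys

BIG = sys.float_info.max      # largest finite double-precision value
LOG_BIG = math.log(BIG)       # its natural logarithm
EPS = sys.float_info.epsilon  # machine epsilon: gap between 1.0 and the next double

def safe_to_power(x, p):
    """Return True if x**p (x > 0) should stay below BIG.

    Compares p*log(x) with log(BIG) rather than computing the power,
    so the test itself can never overflow.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    return p * math.log(x) < LOG_BIG

print(BIG, LOG_BIG, EPS)
print(safe_to_power(10.0, 308), safe_to_power(10.0, 309))
```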

A previous article showed how to use SAS to compute finite-difference derivatives of smooth vector-valued multivariate functions. The article uses the NLPFDD subroutine in SAS/IML to compute the finite-difference derivatives. The article states that the third output argument of the NLPFDD subroutine "contains the matrix product J`*J, where J is
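
As an illustration, here is a NumPy sketch of a forward-difference Jacobian and the cross-product J`*J. It assumes that J is the m × n Jacobian matrix of a function from R^n to R^m. The forward-difference scheme, the step size, and the example function `f` are illustrative choices. They are not necessarily what NLPFDD does internally.

```python
import numpy as np

def fd_jacobian(f, x, h=1e-6):
    """Forward-difference approximation of the m x n Jacobian of f: R^n -> R^m."""
    x = np.asarray(x, dtype=float)
    f0 = np.asarray(f(x), dtype=float)
    J = np.empty((f0.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h                              # perturb the j-th coordinate only
        J[:, j] = (np.asarray(f(x + step), dtype=float) - f0) / h
    return J

def f(x):
    """Hypothetical example function from R^2 to R^3."""
    return np.array([x[0] ** 2 + x[1], np.sin(x[0]) * x[1], x[0] * x[1]])

x0 = np.array([1.0, 2.0])
J = fd_jacobian(f, x0)
JtJ = J.T @ J          # the product written J`*J in SAS/IML notation
print(J)
print(JtJ)
```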

On this Pi Day, let's explore the "πth roots of unity." (Pi Day is celebrated in the US on 3/14 because π ≈ 3.14159....) It's okay if you've never heard of the πth roots of unity. This article starts by reviewing the better-known nth roots of unity. It then
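
For reference, the nth roots of unity are the n complex numbers exp(2πik/n) for k = 0, 1, ..., n−1. The following Python sketch only reviews these familiar roots. It does not attempt the πth-root construction from the article. The function name is hypothetical.

```python
import cmath

def roots_of_unity(n):
    """Return the n distinct solutions of z**n = 1, namely exp(2*pi*i*k/n) for k = 0..n-1."""
    return [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]

# The sixth roots of unity lie on the unit circle, 60 degrees apart
for z in roots_of_unity(6):
    print(f"{z.real:+.4f} {z.imag:+.4f}i")
```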

Did you know that you can use π to partition the positive integers into two disjoint groups? It's not hard. One group is generated by the integer portions of multiples of π. The FLOOR function gives the integer portion of a positive number, so you can write integers that are
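
One way to get the complementary group comes from Beatty's theorem (also called Rayleigh's theorem). The article may build the second group differently. The theorem says that if α > 1 is irrational and 1/α + 1/β = 1, then the sequences ⌊kα⌋ and ⌊kβ⌋ partition the positive integers. With α = π, this gives β = π/(π − 1). The Python sketch below checks the partition up to a hypothetical cutoff N.

```python
import math

def beatty(alpha, limit):
    """Return the set {floor(k*alpha) : k = 1, 2, ...} restricted to values <= limit."""
    out = set()
    k = 1
    while math.floor(k * alpha) <= limit:
        out.add(math.floor(k * alpha))
        k += 1
    return out

N = 1000                                   # hypothetical cutoff for the check
A = beatty(math.pi, N)                     # floor(k*pi): 3, 6, 9, 12, 15, ...
B = beatty(math.pi / (math.pi - 1), N)     # complementary Beatty sequence

print(sorted(A)[:8])
print(sorted(B)[:8])
print(A.isdisjoint(B), A | B == set(range(1, N + 1)))
```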

I previously showed how to use SAS to compute finite-difference derivatives for smooth scalar-valued functions of several variables. You can use the NLPFDD subroutine in SAS/IML software to approximate the gradient vector (first derivatives) and the Hessian matrix (second derivatives). The computation uses finite differences to approximate the derivatives. The
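
To show what such approximations look like, here is a NumPy sketch that uses central differences for both the gradient and the Hessian. Central differences are one common choice. This is not a description of the NLPFDD defaults. The step sizes and the example `objective` function are hypothetical.

```python
import numpy as np

def fd_gradient(f, x, h=1e-5):
    """Central-difference approximation of the gradient of f: R^n -> R."""
    x = np.asarray(x, dtype=float)
    g = np.empty_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def fd_hessian(f, x, h=1e-4):
    """Central-difference approximation of the Hessian of f: R^n -> R."""
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ei[i] = h
            ej = np.zeros(n)
            ej[j] = h
            # Mixed second difference; reduces to a second difference when i == j
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

def objective(x):
    """Hypothetical test function f(x, y) = x^2 * y + exp(y)."""
    return x[0] ** 2 * x[1] + np.exp(x[1])

g = fd_gradient(objective, [1.0, 0.0])   # exact gradient at (1, 0): [0, 2]
H = fd_hessian(objective, [1.0, 0.0])    # exact Hessian at (1, 0): [[0, 2], [2, 1]]
print(g)
print(H)
```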

Many applications in mathematics and statistics require the numerical computation of the derivatives of smooth multivariate functions. For simple algebraic and trigonometric functions, you often can write down expressions for the first and second partial derivatives. However, for complicated functions, the formulas can get unwieldy (and some applications do not

An experienced SAS programmer recently switched to SAS Viya and asked how to discover what products are available on his version of Viya. We discussed a few older SAS 9 procedures, and I showed him a new Viya-specific way to get information about his version of SAS and his licensed

It is important to be able to detect whether a numerical matrix is symmetric. Some operations in linear algebra require symmetric matrices. Sometimes, you can use special algorithms to factor a symmetric matrix. In both cases, you need to test a matrix for symmetry. A symmetric matrix must be square.
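
A practical test checks squareness first and then compares the matrix with its transpose. Rounding can spoil exact equality, so the sketch uses a tolerance relative to the largest entry. The following Python/NumPy sketch is not the SAS/IML code from the article. The function name and the default tolerance are assumptions.

```python
import numpy as np

def is_symmetric(A, rel_tol=1e-12):
    """Return True if A is square and equals its transpose up to a relative tolerance."""
    A = np.asarray(A, dtype=float)
    if A.ndim != 2 or A.shape[0] != A.shape[1]:
        return False                      # a symmetric matrix must be square
    scale = np.max(np.abs(A)) if A.size else 0.0
    return bool(np.max(np.abs(A - A.T), initial=0.0) <= rel_tol * scale)

X = np.arange(6.0).reshape(3, 2)
print(is_symmetric(X.T @ X))              # a cross-product matrix is symmetric
print(is_symmetric(X))                    # not square
```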

A SAS programmer asked an interesting question: If data in a time series has missing values, can you plot a dashed line to indicate that the response is missing at some times? A simple way to achieve this is by overlaying two lines. The first line (the "bottom" line in

This article implements Passing-Bablok regression in SAS. Passing-Bablok regression is a one-variable regression technique that is used to compare measurements from different instruments or medical devices. Both variables (X and Y) are measured with error. Consequently, you cannot use ordinary linear regression, which assumes that
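
The point estimates follow the usual description of the method. The slope is a "shifted median" of all pairwise slopes, in which slopes equal to −1 are discarded and the median position moves up by the number of slopes less than −1. The intercept is the median of Y − bX. The simplified Python sketch below is not the article's SAS implementation. It assumes distinct X values, positively related measurements, and no confidence intervals.

```python
import itertools
import statistics

def passing_bablok(x, y):
    """Simplified Passing-Bablok point estimates (slope b, intercept a).

    Assumes no tied x values, so every pairwise slope is finite, and
    positively related measurements, as in method-comparison studies.
    """
    slopes = []
    for (xi, yi), (xj, yj) in itertools.combinations(zip(x, y), 2):
        if xi == xj:
            raise ValueError("tied x values are not handled in this sketch")
        s = (yj - yi) / (xj - xi)
        if s != -1:                            # slopes of exactly -1 are discarded
            slopes.append(s)
    slopes.sort()
    m = len(slopes)
    k = sum(1 for s in slopes if s < -1)       # offset for the shifted median
    if m % 2 == 1:
        idx = [(m + 1) // 2 + k - 1]           # 0-based position of the shifted median
    else:
        idx = [m // 2 + k - 1, m // 2 + k]     # average the two middle positions
    if m == 0 or idx[-1] >= m:
        raise ValueError("shifted median is undefined for these data")
    b = sum(slopes[i] for i in idx) / len(idx)
    a = statistics.median(yi - b * xi for xi, yi in zip(x, y))
    return b, a

x = [1, 2, 3, 4, 5]
print(passing_bablok(x, [3, 5, 7, 9, 11]))     # exact line y = 2x + 1
print(passing_bablok(x, [3, 5, 7, 9, 40]))     # same line with one outlier
```

Because the estimates are medians, the single outlier in the second call does not pull the fitted line away from y = 2x + 1.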

For some reason, SAS programmers like to express their love by writing SAS programs. Since Valentine's Day is next week, I thought I would add another SAS graphic to the collection of ways to use SAS to express your love. Last week, I showed how to use vector operations and

I recently showed how to find the intersection between a line and a circle. While working on the problem, I was reminded of a fun mathematical game. Suppose you make a billiard table in the shape of a circle or an ellipse. What is the path for a ball at
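
For a line p + t·v and a circle with center c and radius r, substitute the line into |z − c|² = r². This gives a quadratic in t whose real roots give zero, one, or two intersection points. The following is a plain-Python sketch of that computation, not the SAS/IML code from the earlier article. The function name is hypothetical.

```python
import math

def line_circle_intersection(p, v, c, r):
    """Return the points where the line p + t*v meets the circle |z - c| = r.

    p, v, c are (x, y) tuples and v must be nonzero. Returns 0, 1, or 2 points.
    """
    # Substituting z = p + t*v into |z - c|^2 = r^2 gives a*t^2 + b*t + cc = 0
    dx, dy = p[0] - c[0], p[1] - c[1]
    a = v[0] ** 2 + v[1] ** 2
    b = 2 * (v[0] * dx + v[1] * dy)
    cc = dx ** 2 + dy ** 2 - r ** 2
    disc = b * b - 4 * a * cc
    if disc < 0:
        return []                              # the line misses the circle
    if disc == 0:
        ts = [-b / (2 * a)]                    # tangent line: one point
    else:
        sq = math.sqrt(disc)
        ts = [(-b - sq) / (2 * a), (-b + sq) / (2 * a)]
    return [(p[0] + t * v[0], p[1] + t * v[1]) for t in ts]

print(line_circle_intersection((0, 0), (1, 0), (0, 0), 1))   # horizontal diameter
print(line_circle_intersection((0, 1), (1, 0), (0, 0), 1))   # tangent at the top
```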

Recently, I needed to implement a line search algorithm in SAS. The line search is illustrated by the figure at the right. You start with a point, p, in d-dimensional space and a direction vector, v. (In the figure, d=2, but in general d > 1.) The goal is to

Recently, a SAS programmer commented about one of my blog posts. He said that he had found an alternative answer on another website. Whereas my answer was formulated in terms of the normal cumulative distribution function (CDF), the other answer used the ERF function. This article shows the relationship between
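
The standard identity is Φ(x) = (1 + erf(x/√2))/2. Equivalently, erf(z) = 2Φ(z√2) − 1. The following Python sketch checks both forms against `statistics.NormalDist` from the standard library. It is an illustration in Python, not the SAS code from the article.

```python
import math
from statistics import NormalDist

SQRT2 = math.sqrt(2.0)

def normal_cdf(x):
    """Standard normal CDF via the error function: Phi(x) = (1 + erf(x/sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(x / SQRT2))

def erf_via_cdf(z):
    """The same identity solved for erf: erf(z) = 2 * Phi(z * sqrt(2)) - 1."""
    return 2.0 * NormalDist().cdf(z * SQRT2) - 1.0

for x in (-1.96, 0.0, 1.0, 1.96):
    print(x, normal_cdf(x), NormalDist().cdf(x))
```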

Sometimes it is useful to know the extreme values in data. You might need to know the 5 or 10 smallest data values, or the 5 or 10 largest data values. There are many ways to do this in SAS, but this article shows examples
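
For comparison outside SAS, Python's standard `heapq` module provides `nsmallest` and `nlargest`, which find the k extreme values without fully sorting the data. In this sketch, the data values are hypothetical. Tied values, such as the two 3s, each appear in the result.

```python
import heapq

# Hypothetical data values
data = [12, 7, 3, 25, 8, 19, 3, 41, 16, 5, 30]

smallest5 = heapq.nsmallest(5, data)
largest5 = heapq.nlargest(5, data)

# To also report where each extreme value occurs, rank (index, value) pairs by value
largest5_with_pos = heapq.nlargest(5, enumerate(data), key=lambda pair: pair[1])

print(smallest5)
print(largest5)
print(largest5_with_pos)
```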

How can you estimate percentiles in SAS Viya? This article shows how to call the percentile action from PROC CAS to estimate percentiles of variables in a CAS data table. Percentiles and quantiles are essentially the same (the pth quantile is the (100p)th percentile for p in [0, 1]), so
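
The quantile/percentile correspondence is easy to see in NumPy, which offers both `quantile` (p in [0, 1]) and `percentile` (100p in [0, 100]). This Python sketch does not use the CAS percentile action. NumPy's default linear interpolation is only one of several estimation rules in use, so estimates from SAS may differ slightly. The data are hypothetical.

```python
import numpy as np

x = np.arange(1.0, 11.0)              # hypothetical data: 1, 2, ..., 10

for p in (0.1, 0.5, 0.9):
    q = np.quantile(x, p)             # the pth quantile, p in [0, 1]
    pct = np.percentile(x, 100 * p)   # the (100p)th percentile
    print(p, q, pct)
```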

Here's a fun problem to think about: Suppose that you have two different valid ways to test a statistical hypothesis. For a given sample, will both tests reject or fail to reject the hypothesis? Or might one test reject it whereas the other does not? The answer is that two

Several probability distributions model the outcomes of various trials when the probabilities of certain events are given. For some distributions, the definitions make sense even when a probability is 0. For other distributions, the definitions do not make sense unless all probabilities are strictly positive. This article examines how zero

Some colors have names, such as "Red," "Magenta," and "Dark Olive Green." But the most common way to specify a color is to use a hexadecimal value such as CX556B2F. It is not obvious that "Dark Olive Green" and CX556B2F represent the same color, but they do! I like to
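
A hexadecimal color code such as CX556B2F stores the red, green, and blue intensities as three two-digit hex numbers: 55, 6B, and 2F, which are 85, 107, and 47 in decimal. These are the RGB values of "Dark Olive Green." The following Python sketch converts in both directions. The function names are hypothetical, and the sketch also accepts the web-style `#rrggbb` form.

```python
def cx_to_rgb(code):
    """Convert a 'CXrrggbb' (or '#rrggbb') color code to an (R, G, B) tuple of 0-255 ints."""
    s = code.strip()
    if s[:2].upper() == "CX":
        s = s[2:]
    elif s.startswith("#"):
        s = s[1:]
    if len(s) != 6:
        raise ValueError(f"expected six hex digits, got {code!r}")
    return tuple(int(s[i:i + 2], 16) for i in (0, 2, 4))

def rgb_to_cx(rgb):
    """Convert an (R, G, B) tuple of 0-255 ints to a 'CXrrggbb' color code."""
    if any(not 0 <= v <= 255 for v in rgb):
        raise ValueError("RGB components must be in 0..255")
    return "CX" + "".join(f"{v:02X}" for v in rgb)

print(cx_to_rgb("CX556B2F"))
print(rgb_to_cx((85, 107, 47)))
```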

On this blog, I write about a diverse set of topics that are relevant to statistical programming and data visualization. In a previous article, I presented some of the most popular blog posts from 2021. The most popular articles often deal with elementary or familiar topics that are useful to

You can use the Cholesky decomposition of a covariance matrix to simulate data from a correlated multivariate normal distribution. This method is encapsulated in the RANDNORMAL function in SAS/IML software, but you can also perform the computations manually by calling the ROOT function to get the Cholesky root and then
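
The manual computation can be sketched in NumPy. Note that `numpy.linalg.cholesky` returns a lower-triangular L with Σ = L·Lᵀ. The SAS/IML ROOT function returns the upper-triangular factor U with Σ = UᵀU, so U corresponds to Lᵀ here. Each row of Z·U then has covariance Σ. The mean vector, covariance matrix, sample size, and seed in this sketch are hypothetical.

```python
import numpy as np

def sim_mvn(mean, cov, n, rng):
    """Simulate n draws from MVN(mean, cov) by using a Cholesky factor of cov."""
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    L = np.linalg.cholesky(cov)                 # lower triangular, cov = L @ L.T
    Z = rng.standard_normal((n, mean.size))     # uncorrelated N(0, 1) variates
    return mean + Z @ L.T                       # L.T plays the role of the upper root U

rng = np.random.default_rng(12345)
mean_vec = [1.0, -2.0]
cov = [[4.0, 1.2], [1.2, 1.0]]
X = sim_mvn(mean_vec, cov, 200_000, rng)
print(X.mean(axis=0))
print(np.cov(X, rowvar=False))
```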

*The DO Loop* in 2021

Last year, I wrote almost 100 posts for The DO Loop blog. My most popular articles were about data visualization, statistics and data analysis, and simulation and bootstrapping. If you missed any of these gems when they were first published, here are some of the most popular articles from 2021:

Suppose you are creating a craft project for the Christmas holidays, and you want to choose a palette of Christmas colors to give it a cheery holiday appearance. You could use one of the many online collections of color palettes to choose a Christmas-themed palette. However, I didn't want to

In a previous article, I visualized seven Christmas-themed palettes of colors, as shown to the right. You can see that the palettes include many red, green, and golden colors. Clearly, the colors in the Christmas palettes are not a random sample from the space of RGB colors. Rather, they represent

In data visualization, colors can represent the values of a variable in a choropleth map, a heatmap, or a scatter plot. But how do you visualize a palette of colors from the RGB or hexadecimal values of the colors? One way is to use the HEATMAPDISC subroutine in SAS/IML, which

While a colleague and I were discussing how to compute convex hulls in SAS, we wondered how the size of the convex hull compares to the size of the sample. For most distributions of points, I claimed that the size of the convex hull is much less than the size of the
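
You can explore the question with a quick simulation. The sketch below uses Andrew's monotone-chain algorithm in plain Python, which stands in for the SAS routine discussed in the article. It reports the number of hull vertices for uniform samples in the unit square. The sample sizes and the seed are arbitrary choices.

```python
import random

def convex_hull(points):
    """Andrew's monotone chain: return hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:                                   # build the lower hull
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):                         # build the upper hull
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]                  # drop duplicated endpoints

rng = random.Random(2021)
sizes = {}
for n in (100, 1000, 10000):
    sample = [(rng.random(), rng.random()) for _ in range(n)]
    sizes[n] = len(convex_hull(sample))
    print(n, sizes[n])
```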

Did you know that the loess regression algorithm is not well-defined when you have repeated values among the explanatory variables, and you request a very small smoothing parameter? This is because loess regression at the point x0 is based on using the k nearest neighbors to x0. If x0 has

When SAS 9 programmers transition to SAS Viya, there are inevitably questions about how new concepts in Cloud Analytic Services (CAS) relate to similar concepts in SAS. This article discusses the question, "What is the difference between a libref and a caslib?" Both are used to access data, but they