What is McNemar's test? How do you run the McNemar test in SAS? Why might other statistical software report a value for McNemar's test that is different from the SAS value? SAS supports an exact version of the McNemar test, but when should you use it? This article answers these
Tag: Statistical Programming
This article shows how to compute properties of a discrete probability distribution from basic definitions. You can use the definitions to compute the mean, variance, and median of a discrete probability distribution when there is no simple formula for those quantities. This article is motivated by two computational questions about
Statistical programmers need to access numerical constants that help us to write robust and accurate programs. Specifically, it is necessary to know when it is safe to perform numerical operations such as raising a number to a power without exceeding the largest number that is representable in finite-precision arithmetic. This
A previous article showed how to use SAS to compute finite-difference derivatives of smooth vector-valued multivariate functions. The article uses the NLPFDD subroutine in SAS/IML to compute the finite-difference derivatives. The article states that the third output argument of the NLPFDD subroutine "contains the matrix product J`*J, where J is
I previously showed how to use SAS to compute finite-difference derivatives for smooth scalar-valued functions of several variables. You can use the NLPFDD subroutine in SAS/IML software to approximate the gradient vector (first derivatives) and the Hessian matrix (second derivatives). The computation uses finite-difference derivatives to approximate the derivatives. The
Many applications in mathematics and statistics require the numerical computation of the derivatives of smooth multivariate functions. For simple algebraic and trigonometric functions, you often can write down expressions for the first and second partial derivatives. However, for complicated functions, the formulas can get unwieldy (and some applications do not
This article implements Passing-Bablok regression in SAS. Passing-Bablok regression is a one-variable regression technique that is used to compare measurements from different instruments or medical devices. The measurements of the two variables (X and Y) are both measured with errors. Consequently, you cannot use ordinary linear regression, which assumes that
For some reason, SAS programmers like to express their love by writing SAS programs. Since Valentine's Day is next week, I thought I would add another SAS graphic to the collection of ways to use SAS to express your love. Last week, I showed how to use vector operation and
I recently showed how to find the intersection between a line and a circle. While working on the problem, I was reminded of a fun mathematical game. Suppose you make a billiard table in the shape of a circle or an ellipse. What is the path for a ball at
Recently, I needed to implement a line search algorithm in SAS. The line search is illustrated by the figure at the right. You start with a point, p, in d-dimensional space and a direction vector, v. (In the figure, d=2, but in general d > 1.) The goal is to
Recently, a SAS programmer commented about one of my blog posts. He said that he had found an alternative answer on another website. Whereas my answer was formulated in terms of the normal cumulative distribution function (CDF), the other answer used the ERF function. This article shows the relationship between
On this blog, I write about a diverse set of topics that are relevant to statistical programming and data visualization. In a previous article, I presented some of the most popular blog posts from 2021. The most popular articles often deal with elementary or familiar topics that are useful to
Last year, I wrote almost 100 posts for The DO Loop blog. My most popular articles were about data visualization, statistics and data analysis, and simulation and bootstrapping. If you missed any of these gems when they were first published, here are some of the most popular articles from 2021:
In a previous article, I visualized seven Christmas-themed palettes of colors, as shown to the right. You can see that the palettes include many red, green, and golden colors. Clearly, the colors in the Christmas palettes are not a random sample from the space of RGB colors. Rather, they represent
While discussing how to compute convex hulls in SAS with a colleague, we wondered how the size of the convex hull compares to the size of the sample. For most distributions of points, I claimed that the size of the convex hull is much less than the size of the
I attended a seminar last week whose purpose was to inform SAS 9 programmers about SAS Viya. I could tell from the programmer's questions that some programmers were confused about three basic topics: What are the computing environments in Viya, and how should a programmer think about them? What procedures
In a previous article, I showed how to use theCVEXHULL function in SAS/IML to compute the convex hull of a finite set of planar points. The convex hull is a convex polygon, which is defined by its vertices. To visualize the polygon, you need to know the vertices in sequential
Given a cloud of points in the plane, it can be useful to identify the convex hull of the points. The convex hull is the smallest convex set that contains the observations. For a finite set of points, it is a convex polygon that has some of the points as
A previous article discusses how to use SAS regression procedures to fit a two-parameter Weibull distribution in SAS. The article shows how to convert the regression output into the more familiar scale and shape parameters for the Weibull probability distribution, which are fit by using PROC UNIVARIATE. Although PROC UNIVARIATE
This article uses an example to introduce to genetic algorithms (GAs) for optimization. It discusses two operators (mutation and crossover) that are important in implementing a genetic algorithm. It discusses choices that you must make when you implement these operations. Some programmers love using genetic algorithms. Genetic algorithms are heuristic
The partition problem has many variations, but recently I encountered it as an interactive puzzle on a computer. (Try a similar game yourself!) The player is presented with an old-fashioned pan-balance scale and a set of objects of different weights. The challenge is to divide (or partition) the objects into
You can use the bootstrap method to estimate confidence intervals. Unlike formulas, which assume that the data are drawn from a specified distribution (usually the normal distribution), the bootstrap method does not assume a distribution for the data. There are many articles about how to use SAS to bootstrap statistics
For graphing multivariate data, it is important to be able to convert the data between "wide form" (a separate column for each variable) and "long form" (which contains an indicator variable that assigns a group to each observation). If the data are numeric, the wide data can be represented as
One of the benefits of using the SWEEP operator is that it enables you to "sweep in" columns (add effects to a model) in any order. This article shows that if you use the SWEEP operator, you can compute a SSCP matrix and use it repeatedly to estimate any linear
Do you ever use a permutation matrix to change the order of rows or columns in a matrix? Did you know that there is a more efficient way in matrix-oriented languages such as SAS/IML, MATLAB, and R? Remember the following tip: Never multiply with a large permutation matrix! Instead, use
A SAS programmer recently asked why his SAS program and his colleague's R program display different estimates for the quantiles of a very small data set (less than 10 observations). I pointed the programmer to my article that compares the nine common definitions for sample quantiles. The article has a
To get better at something, you need to practice. That maxim applies to sports, music, and programming. If you want to be a better programmer, you need to write many programs. This article provides an example of forming the intersection of items in a SAS/IML list. It then provides several
After my recent articles on simulating data by using copulas, many readers commented about the power of copulas. Yes, they are powerful, and the geometry of copulas is beautiful. However, it is important to be aware of the limitations of copulas. This article creates a bizarre example of bivariate data,
Do you know what a copula is? It is a popular way to simulate multivariate correlated data. The literature for copulas is mathematically formidable, but this article provides an intuitive introduction to copulas by describing the geometry of the transformations that are involved in the simulation process. Although there are
A recent article about how to estimate a two-dimensional distribution function in SAS inspired me to think about a related computation: a 2-D cumulative sum. Suppose you have numbers in a matrix, X. A 2-D cumulative sum is a second matrix, C, such that the C[p,q] gives the sum of