M estimation is a robust regression technique that assigns a weight to each observation based on the magnitude of the residual for that observation. Large residuals are downweighted (assigned weights less than 1) whereas observations with small residuals are given weights close to 1. By iterating the reweighting and fitting
Author
An early method for robust regression was iteratively reweighted least-squares regression (Huber, 1964). This is an iterative procedure in which each observation is assigned a weight. Initially, all weights are 1. The method fits a least-squares model to the weighted data and uses the size of the residuals to determine
A common question on SAS discussion forums is how to randomly assign observations to groups. An application of this problem is assigning patients to cohorts in a clinical trial. For example, you might have 137 patients that you want to randomly assign to three groups: a control group, a group
Many modern statistical techniques incorporate randomness: simulation, bootstrapping, random forests, and so forth. To use the technique, you need to specify a seed value, which determines pseudorandom numbers that are used in the algorithm. Consequently, the seed value also determines the results of the algorithm. In theory, if you know
I have previously blogged about ways to perform balanced bootstrap resampling in SAS. I recently learned about an easier way: Since SAS/STAT 14.2 (SAS 9.4M4), the SURVEYSELECT procedure has supported balanced bootstrap sampling. This article reviews balanced bootstrap sampling and shows how to use the METHOD=BALBOOT option in PROC SURVEYSELECT
In categorical data analysis, it is common to analyze tables of counts. For example, a researcher might gather data for 18 boys and 12 girls who apply for a summer enrichment program. The researcher might be interested in whether the proportion of boys that are admitted is different from the
Did you know that there is a mathematical formula that simplifies finding the derivative of a determinant? You can compute the derivative of a determinant of an n x n matrix by using the sum of n other determinants. The n determinants are for matrices that are equal to the original matrix
In The Essential Guide to Bootstrapping in SAS, I note that there are many SAS procedures that support bootstrap estimates without requiring the analyst to write a program. I have previously written about using bootstrap options in the TTEST procedure. This article discusses the NLIN procedure, which can fit nonlinear
Recently, I wrote about Bartlett's test for sphericity. The purpose of this hypothesis test is to determine whether the variables in the data are uncorrelated. It works by testing whether the sample correlation matrix is close to the identity matrix. Often statistics textbooks or articles include a statement such as
When you have many correlated variables, principal component analysis (PCA) is a classical technique to reduce the dimensionality of the problem. The PCA finds a smaller dimensional linear subspace that explains most of the variability in the data. There are many statistical tools that help you decide how many principal
To a numerical analyst, numerical integration has two meanings. Numerical integration often means solving a definite integral such as $$int_{a}^b f(s) , ds$$. Numerical integration is also called quadrature because it computes areas. The other meaning applies to solving ordinary differential equations (ODEs). My article about new methods for solving
Recently, I showed how to use a heat map to visualize measurements over time for a set of patients in a longitudinal study. The visualization is sometimes called a lasagna plot because it presents an alternative to the usual spaghetti plot. A reader asked whether a similar visualization can be
What is McNemar's test? How do you run the McNemar test in SAS? Why might other statistical software report a value for McNemar's test that is different from the SAS value? SAS supports an exact version of the McNemar test, but when should you use it? This article answers these
Some matrices are so special that they have names. The identity matrix is the most famous, but many are named after a researcher who studied them such as the Hadamard, Hilbert, Sylvester, Toeplitz, and Vandermonde matrices. This article is about the Pascal matrix, which is formed by using elements from
Many discussions and articles about SAS Viya emphasize its ability to handle Big Data, perform parallel processing, and integrate open-source languages. These are important issues for some SAS customers. But for customers who program in SAS and do not have Big Data, SAS Viya is attractive because it is the
The graph to the right is the quantile function for the standard normal distribution, which is sometimes called the probit function. Given any probability, p, the quantile function gives the value, x, such that the area under the normal density curve to the left of x is exactly p. This
Oh, no! Your boss just told you to change the way that SAS displays certain features in graphs, such as missing values. But you have a library of hundreds of SAS programs! Do you need to modify all of your previous programs? Fortunately, the answer is no. SAS provides ODS
In an article about how to visualize missing data in a heat map, I noted that the SAS SG procedures (such as PROC SGPLOT) use the GraphMissing style element to color a bar or tile that represents a missing value. In the HTMLBlue ODS style, the color for missing values
Longitudinal data are measurements for a set of subjects at multiple points in time. Also called "panel data" or "repeated measures data," this kind of data is common in clinical trials in which patients are tracked over time. Recently, a SAS programmer asked how to visualize missing values in a
This article shows how to compute properties of a discrete probability distribution from basic definitions. You can use the definitions to compute the mean, variance, and median of a discrete probability distribution when there is no simple formula for those quantities. This article is motivated by two computational questions about
Statistical programmers need to access numerical constants that help us to write robust and accurate programs. Specifically, it is necessary to know when it is safe to perform numerical operations such as raising a number to a power without exceeding the largest number that is representable in finite-precision arithmetic. This
A previous article showed how to use SAS to compute finite-difference derivatives of smooth vector-valued multivariate functions. The article uses the NLPFDD subroutine in SAS/IML to compute the finite-difference derivatives. The article states that the third output argument of the NLPFDD subroutine "contains the matrix product J`*J, where J is
On this Pi Day, let's explore the "πth roots of unity." (Pi Day is celebrated in the US on 3/14 to celebrate π ≈ 3.14159....) It's okay if you've never heard of the πth roots of unity. This article starts by reviewing the better-known nth roots of unity. It then
Did you know that you can use π to partition the positive integers into two disjoint groups? It's not hard. One group is generated by the integer portions of multiples of π. The FLOOR function gives the integer portion of a positive number, so you can write integer that are
I previously showed how to use SAS to compute finite-difference derivatives for smooth scalar-valued functions of several variables. You can use the NLPFDD subroutine in SAS/IML software to approximate the gradient vector (first derivatives) and the Hessian matrix (second derivatives). The computation uses finite-difference derivatives to approximate the derivatives. The
Many applications in mathematics and statistics require the numerical computation of the derivatives of smooth multivariate functions. For simple algebraic and trigonometric functions, you often can write down expressions for the first and second partial derivatives. However, for complicated functions, the formulas can get unwieldy (and some applications do not
An experienced SAS programmer recently switched to SAS Viya and asked how to discover what products are available on his version of Viya. We discussed a few older SAS 9 procedures, and I showed him a new Viya-specific way to get information about his version of SAS and his licensed
It is important to be able to detect whether a numerical matrix is symmetric. Some operations in linear algebra require symmetric matrices. Sometimes, you can use special algorithms to factor a symmetric matrix. In both cases, you need to test a matrix for symmetry. A symmetric matrix must be square.
A SAS programmer asked an interesting question: If data in a time series has missing values, can you plot a dashed line to indicate that the response is missing at some times? A simple way to achieve this is by overlaying two lines. The first line (the "bottom" line in
This article implements Passing-Bablok regression in SAS. Passing-Bablok regression is a one-variable regression technique that is used to compare measurements from different instruments or medical devices. The measurements of the two variables (X and Y) are both measured with errors. Consequently, you cannot use ordinary linear regression, which assumes that