A previous article discusses how to generate a random covariance matrix with a specified set of (positive) eigenvalues. A SAS programmer asked whether it is possible to produce a correlation matrix that has a specified set of eigenvalues. After discussing the problem with a friend, I am happy to report
Tag: Statistical Programming
While researching the topic of Latin hypercube sampling (LHS), I read an article by Emily Gao (2019) that shows how to use PROC IML in SAS to perform the algorithm. It is possible to simplify Gao's implementation of Latin hypercube sampling in SAS while also making the computation more efficient.
The article "Order two-dimensional vectors by using angles" shows how to re-order a set of 2-D vectors by their angles. Because angles are on a circle, which has no beginning and no end, you must specify which vector will appear first in the list. The previous article finds the largest
Order matters. The order of variables in tables and rows of a correlation matrix can make a big difference in how easy it is to observed correlations between variables or groups of variables. There are many ways to order the variables, but this article shows how to display the variables
A previous article discusses the MakeString function, which you can use to convert an IML character vector into a string. This can be very useful. When I originally wrote the MakeString function, I was disappointed that I could not vectorize the computation. Recently, I learned about the COMBL function in
A common way to visualize the sample correlations between many numeric variables is to display a heat map that shows the Pearson correlation for each pair of variables, as shown in the image to the right. The correlation is a number in the range [-1, 1], where -1 indicated perfect
In SAS, DATA step programmers use the IN operator to determine whether a value is contained in a set of target values. Did you know that there is a similar functionality in the SAS IML language? The ELEMENT function in the SAS IML language is similar to the IN operator
A previous article shows how to implement recursive formulas in SAS. The article points out that you can often avoid recursion by using an iterative algorithm, which is more efficient. An example is the Fibonacci sequence, which is usually defined recursively as F(n) = F(n-1) + F(n-2) for n
Many well-known distributions become more and more "normal looking" for large values of a parameter. Famously, the binomial distribution, Binom(p, N), can be approximated by a normal distribution when N (the sample size) is large. Similarly, the Poisson(λ) distribution is well approximated by the normal distribution when λ is large.
There are two programming tools that I rarely use: the SAS macro language and recursion. The SAS macro language is a tool that enables you to generate SAS statements. I rarely use the SAS macro language because the SAS IML language supports all the functionality required to write complex programs,
In SAS, the easiest way to draw random sampling from data is to use PROC SURVEYSELECT or the SAMPLE function in SAS IML software. I have previously written about how to implement four common sampling schemes by using PROC SURVEYSELECT and the SAMPLE function. The DATA step in SAS is
A previous article shows that you can run a simple (one-variable) isotonic regression by using a quadratic programming (QP) formulation. While I was reading a book about computational geometry, I learned that there is a connection between isotonic regression and the convex hull of a certain set of points. Whaaaaat?
Since the pandemic began in 2020, the SAS IML developers have added about 50 new functions and enhancements to the SAS IML language in SAS Viya. Among these functions are new modern methods for optimization that have a simplified syntax as compared to the older 'NLP' functions that are available
One of the most exciting features of SAS Viya Workbench is that the code editor includes a generative AI component called SAS Viya Copilot. This feature was announced and demonstrated at SAS Innovate 2024. With the Copilot, you can specify a text prompt that generates SAS code. For example, you
In a recent article, I graphed the PDF of a few Beta distributions that had a variety of skewness and kurtosis values. I thought that I had chosen the parameter values to represent a wide variety of Beta shapes. However, I was surprised to see that the distributions were all
The moment-ratio diagram is a tool that is useful when choosing a distribution that models a sample of univariate data. As I show in my book (Simulating Data with SAS, Wicklin, 2013), you first plot the skewness and kurtosis of the sample on the moment-ratio diagram to see what common
A SAS programmer wanted to simulate samples from a family of Beta(a,b) distributions for a simulation study. (Recall that a Beta random variable is bounded with values in the range [0,1].) She wanted to choose the parameters such that the skewness and kurtosis of the distributions varied over range of
A recent article describes how to estimate coefficients in a simple linear regression model by using maximum likelihood estimation (MLE). One of the nice properties of an MLE formulation is that you can compare a large model with a nested submodel in a natural way. For example, if you can
A statistical analyst used the GENMOD procedure in SAS to fit a linear regression model. He noticed that the table of parameter estimates has an extra row (labeled "Scale") that is not a regression coefficient. The "scale parameter" is not part of the parameter estimates table produced by PROC REG
In a recent Monte Carlo project, I needed to simulate numbers on an interval by using a continuous linear probability density function (PDF). An example is shown to the right. In this example, the linear density function is decreasing on the interval, but the function could also be constant or
I read a journal article in which a researcher used a formula for the probability density function (PDF) of the sample correlation coefficient. The formula was rather complicated, and presented with no citation, so I was curious to learn more. I found the distribution for the correlation coefficient in the
This article looks at a geometric method for estimating the center of a multivariate point cloud. The method is known as convex-hull peeling. In two-dimensions, you can perform convex-hull peeling in SAS 9 by using the CVEXHULL function in SAS IML software. For higher dimensions, you can use the CONVEXHULL
There are two popular ways to express the steepness of a line or ray. The most-often used mathematical definition is from high-school math where the slope is defined as "rise over run." A second way is to report the angle of inclination to the horizontal, as introduced in basic trigonometry.
In a previous article, I presented some of the most popular blog posts from 2023. The popular articles tend to discuss elementary topics that have broad appeal. However, I also wrote many technical articles about advanced topics. The following articles didn't make the Top 10 list, but they deserve a
In 2023, I wrote 90 articles for The DO Loop blog. My most popular articles were about SAS programming, data visualization, and statistics. In addition, several "general interest" articles were popular, including my article for Pi Day and an article about AI chatbots. If you missed any of these articles,
Statistical software often includes supports for a weight variable. Many SAS procedures make a distinction between integer frequencies and more general "importance weights." Frequencies are supported by using the FREQ statement in SAS procedures; general weights are supported by using the WEIGHT statement. An exception is PROC FREQ, which contains
SAS provides many built-in routines for data analysis. A previous article discusses polychoric correlation, which is a measure of association between two ordinal variables. In SAS, you can use PROC FREQ or PROC CORR to estimate the polychoric correlation, its standard error, and confidence intervals. Although SAS provides a built-in
A previous article discussed how to compute probabilities for the bivariate standard normal distribution. The standard bivariate normal distribution with correlation ρ is denoted BVN(0,ρ). For any point (x,y), you can use the PROBBNRM function in SAS to compute the probability that the random variables (X,Y) ~ BVN(0,ρ) is observed
This article shows how to use SAS to compute the probabilities for two correlated normal variables. Specifically, this article shows how to compute the probabilities for rectangular regions in the plane. A second article discusses the computation over infinite regions such as quadrants. If (X,Y) are random variables that are
The collinearity problem is to determine whether three points in the plane lie along a straight line. You can solve this problem by using middle-school algebra. An algebraic solution requires three steps. First, name the points: p, q, and r. Second, find the parametric equation for the line that passes