A common question from statistical programmers is how to compute the rank of a matrix in SAS. Recall that the rank of a matrix is defined as the number of linearly independent columns in the matrix. (Equivalently, the number of linearly independent rows.) This article describes how to compute the
Tag: Numerical Analysis
In my previous post, I showed how to approximate a cumulative density function (CDF) by evaluating only the probability density function. The technique uses the trapezoidal rule of integration to approximate the CDF from the PDF. For common probability distributions, you can use the CDF function in Base SAS to
Evaluating a cumulative distribution function (CDF) can be an expensive operation. Each time you evaluate the CDF for a continuous probability distribution, the software has to perform a numerical integration. (Recall that the CDF at a point x is the integral under the probability density function (PDF) where x is
One of the things I enjoy about blogging is that I often learn something new. Last week I wrote about how to optimize a function that is defined in terms of an integral. While developing the program in the article, I made some mistakes that generated SAS/IML error messages. By
The SAS/IML language is used for many kinds of computations, but three important numerical tasks are integration, optimization, and root finding. Recently a SAS customer asked for help with a problem that involved all three tasks. The customer had an objective function that was defined in terms of an integral.
In SAS software, you can use the QUAD subroutine in the SAS/IML language to evaluate definite integrals on an interval [a, b]. The integral is properly defined only for a < b, but mathematicians define the following convention, which enables you to make sense of reversing the limits of integration:
Last week I described the Hilbert matrix of size n, which is a famous square matrix in numerical linear algebra. It is famous partially because its inverse and its determinant have explicit formulas (that is, we know them exactly), but mainly because the matrix is ill-conditioned for moderate values of
Did you know that SAS/IML 12.1 provides built-in functions that compute the norm of a vector or matrix? A vector norm enables you to compute the length of a vector or the distance between two vectors in SAS. Matrix norms are used in numerical linear algebra to estimate the condition
Last week I showed how to find parameters that maximize the integral of a certain probability density function (PDF). Because the function was a PDF, I could evaluate the integral by calling the CDF function in SAS. (Recall that the cumulative distribution function (CDF) is the integral of a PDF.)
SAS programmers use the SAS/IML language for many different tasks. One important task is computing an integral. Another is optimizing functions, such as maximizing a likelihood function to find parameters that best fit a set of data. Last week I saw an interesting problem that combines these two important tasks.
One of my favorite new features of SAS/IML 12.1 enables you to define functions that contain default values for parameters. This is extremely useful when you want to write a function that has optional arguments. Example: Centering a data vector It is simple to specify a SAS/IML module with a
Finding the root (or zero) of a function is an important computational task because it enables you to solve nonlinear equations. I have previously blogged about using Newton's method to find a root for a function of several variables. I have also blogged about how to use the bisection method
While sorting through an old pile of papers, I discovered notes from a 2012 SAS conference that I had attended. Next to the abstract for one presentation, I had scrawled a note to myself that read "BLOG about the incomplete beta function!" Okay, Rick, whatever you say! In statistics, the
This is the last post in my recent series of articles on computing contours in SAS. Last month a SAS customer asked how to compute the contours of the bivariate normal cumulative distribution function (CDF). Answering that question in a single blog post would have resulted in a long article,
I'm spoiled by the internet. I've grown so accustomed to being able to instantly find an answer to any query—no matter how obscure—that I am surprised when I don't find what I am looking for. The other day I was trying to find a mathematical result: a formula for the
Like many other computer packages, SAS can produce a contour plot that shows the level sets of a function of two variables. For example, I've previously written blogs that use contour plots to visualize the bivariate normal density function and to visualize the cumulative normal distribution function. However, sometimes you
In a previous post, I showed how to solve differential equations in SAS by using the ODE subroutine in the SAS/IML language, which solves initial value problems. This article describes how to draw phase portraits for two classic differential equations: the equations of motion for the simple harmonic oscillator and
Differential equations arise in the modeling of many physical processes, including mechanical and chemical systems. You can solve systems of first-order ordinary differential equations (ODEs) by using the ODE subroutine in the SAS/IML language, which solves initial value problems. This article uses the equations of motion for the classic simple
Finding the maximum value of a function is an important task in statistics. There are three approaches to finding a maxima: When the function is available as an analytic expression, you can use an optimization algorithm to find the maxima. For example, in the SAS/IML language, you can use any
The truncated normal distribution TN(μ, σ, a, b) is the distribution of a normal random variable with mean μ and standard deviation σ that is truncated on the interval [a, b]. I previously blogged about how to implement the truncated normal distribution in SAS. A friend wanted to simulate data
As I wrote in my previous post, a SAS customer noticed that he was getting some duplicate values when he used the RAND function to generate a large number of random uniform values on the interval [0,1]. He wanted to know if this result indicates a bug in the RAND
The determinant of a matrix arises in many statistical computations, such as in estimating parameters that fit a distribution to multivariate data. For example, if you are using a log-likelihood function to fit a multivariate normal distribution, the formula for the log-likelihood involves the expression log(det(Σ)), where Σ is the
John D. Cook posted a story about Hardy, Ramanujan, and Euler and discusses a conjecture in number theory from 1937. Cook says, Euler discovered 635,318,657 = 158^4 + 59^4 = 134^4 + 133^4 and that this was the smallest [integer]known to be the sum of two fourth powers in two
When you are working with probability distributions (normal, Poisson, exponential, and so forth), there are four essential functions that a statistical programmer needs. As I've written before, for common univariate distributions, SAS provides the following functions: the PDF function, which returns the probability density at a given point the CDF
A collegue who works with time series sent me the following code snippet. He said that the calculation was overflowing and wanted to know if this was a bug in SAS: data A(drop=m); call streaminit(12345); m = 2; x = 0; do i = 1 to 5000; x = m*x
I've been a fan of statistical simulation and other kinds of computer experimentation for many years. For me, simulation is a good way to understand how the world of statistics works, and to formulate and test conjectures. Last week, while investigating the efficiency of the power method for finding dominant
When I was at SAS Global Forum last week, a SAS user asked my advice regarding a SAS/IML program that he wrote. One step of the program was taking too long to run and he wondered if I could suggest a way to speed it up. The long-running step was
To a statistician, the DIF function (which was introduced in SAS/IML 9.22) is useful for time series analysis. To a numerical analyst and a statistical programmer, the function has many other uses, including computing finite differences. The DIF function computes the difference between the original vector and a shifted version
To a statistician, the LAG function (which was introduced in SAS/IML 9.22) is useful for time series analysis. To a numerical analyst and a statistical programmer, the function provides a convenient way to compute quantitites that involve adjacent values in any vector. The LAG function is essentially a "shift operator."
In a previous post I showed how to implement Stewart's (1980) algorithm for generating random orthogonal matrices in SAS/IML software. By using the algorithm, it is easy to generate a random matrix that contains a specified set of eigenvalues. If D = diag(λ1, ..., λp) is a diagonal matrix and