Data analysts often fit a probability distribution to data. When you have access to the data, a common technique is to use maximum likelihood estimation (MLE) to compute the parameters of a distribution that are "most likely" to have produced the observed data. However, how can you fit a distribution

## Tag: **Numerical Analysis**

This article shows how to use SAS to solve a system of nonlinear equations. When there are n unknowns and n equations, this problem is equivalent to finding a multivariate root of a vector-valued function F(x) = 0 because you can always write the system as f1(x1, x2, ..., xn)

Your statistical software probably provides a function that computes quantiles of common probability distributions such as the normal, exponential, and beta distributions. Because there are infinitely many probability distributions, you might encounter a distribution for which a built-in quantile function is not implemented. No problem! This article shows how to

This article shows how to simulate beta-binomial data in SAS and how to compute the density function (PDF). The beta-binomial distribution is a discrete compound distribution. The "binomial" part of the name means that the discrete random variable X follows a binomial distribution with parameters N (number of trials) and

Did you know that you can get SAS to compute symbolic (analytical) derivatives of simple functions, including applying the product rule, quotient rule, and chain rule? SAS can form the symbolic derivatives of single-variable functions and partial derivatives of multivariable functions. Furthermore, the derivatives are output in a form that

The singular value decomposition (SVD) could be called the "billion-dollar algorithm" since it provides the mathematical basis for many modern algorithms in data science, including text mining, recommender systems (think Netflix and Amazon), image processing, and classification problems. Although the SVD was mathematically discovered in the late 1800s, computers have

All statisticians are familiar with the classical arithmetic mean. Some statisticians are also familiar with the geometric mean. Whereas the arithmetic mean of n numbers is the sum divided by n, the geometric mean of n nonnegative numbers is the nth root of the product of the numbers. The geometric
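The two definitions in this excerpt can be sketched in a few lines. This is a Python illustration rather than the SAS code from the article; the function names and the sample data are made up for the example, and the log-based formula is one common way to avoid overflow when multiplying many values.

```python
import math

def arithmetic_mean(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)

def geometric_mean(values):
    """n-th root of the product of n nonnegative numbers."""
    if not values:
        raise ValueError("need at least one value")
    if any(v < 0 for v in values):
        raise ValueError("values must be nonnegative")
    if any(v == 0 for v in values):
        return 0.0  # a single zero makes the product zero
    # Average the logs and exponentiate; equivalent to the n-th root
    # of the product, but it does not overflow for long lists.
    return math.exp(sum(math.log(v) for v in values) / len(values))

data = [1, 2, 4, 8]  # hypothetical sample
print(arithmetic_mean(data))
print(geometric_mean(data))
```

For this sample the product is 64, so the geometric mean is the fourth root of 64, which is smaller than the arithmetic mean.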

A SAS customer asked, "I computed the eigenvectors of a matrix in SAS and in another software package. I got different answers? How do I know which answer is correct?" I've been asked variations of this question dozens of times. The answer is usually "both answers are correct." The mathematical

Monte Carlo techniques have many applications, but a primary application is to approximate the probability that some event occurs. The idea is to simulate data from the population and count the proportion of times that the event occurs in the simulated data. For continuous univariate distributions, the probability of an
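The simulate-and-count idea can be sketched as follows. This is a Python illustration, not the article's SAS program; the event (a standard normal variate exceeding 1.5), the sample size, the seed, and the helper name `mc_probability` are all assumptions chosen for the example.

```python
import math
import random

def mc_probability(event, draw, n_sim=100_000, seed=12345):
    """Estimate P(event) as the proportion of simulated draws for which event is true."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = sum(1 for _ in range(n_sim) if event(draw(rng)))
    return hits / n_sim

# Hypothetical event: X > 1.5 where X ~ N(0, 1)
estimate = mc_probability(lambda x: x > 1.5,
                          lambda rng: rng.gauss(0.0, 1.0))

# Exact upper-tail probability from the complementary error function
exact = 0.5 * math.erfc(1.5 / math.sqrt(2.0))
print(estimate, exact)
```

The estimate should agree with the exact tail probability to within a few standard errors, where the standard error shrinks like one over the square root of the number of simulated draws.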

At a conference last week, a presenter showed SAS statements that compute the logarithm of a probability density function (PDF). The log-PDF is a common computation because it occurs when maximizing the log-likelihood function. The presenter computed the expression in SAS by using an expression that looked like y

This article describes how you can evaluate the Lambert W function in SAS/IML software. The Lambert W function is defined implicitly: given a real value x, the function's value w = W(x) is the value of w that satisfies the equation w exp(w) = x. Thus W is the inverse
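The implicit definition suggests a root-finding approach: for a given x, solve w exp(w) - x = 0 for w. The sketch below uses Newton's method in Python; it is not the article's SAS/IML implementation. It is restricted to x >= 0 (the principal branch), and the starting guess log(1 + x), the tolerance, and the function name are assumptions made for the example.

```python
import math

def lambert_w(x, tol=1e-14, max_iter=100):
    """Solve w * exp(w) = x for w by Newton's method, assuming x >= 0."""
    if x < 0:
        raise ValueError("this sketch handles only x >= 0")
    w = math.log1p(x)  # starting guess; lies at or to the right of the root
    for _ in range(max_iter):
        ew = math.exp(w)
        f = w * ew - x           # residual of the defining equation
        fprime = ew * (w + 1.0)  # derivative of w * exp(w)
        step = f / fprime
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w

w = lambert_w(1.0)
print(w, w * math.exp(w))  # the second value should be close to 1
```

For x >= 0 the function w exp(w) is increasing and convex, so Newton's iteration from this starting guess moves steadily toward the root.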

Edmond Halley (1656-1742) is best known for computing the orbit and predicting the return of the short-period comet that bears his name. However, like many scientists of his era, he was involved in a variety of mathematical and scientific activities. One of his mathematical contributions is a numerical method for

I was eleven years old when I first saw Newton's method. No, I didn't go to a school for geniuses. I didn't even know it was Newton's method until decades later. However, in sixth grade I learned an iterative algorithm that taught me (almost) everything I need to know about

Statistical programmers often need to evaluate complicated expressions that contain square roots, logarithms, and other functions whose domain is restricted. Similarly, you might need to evaluate a rational expression in which the denominator of the expression can be zero. In these cases, it is important to avoid evaluating a function

In my article about finding an initial guess for root-finding algorithms, I stated that Newton's root-finding method "might not converge or might converge to a root that is far away from the root that you wanted to find." A reader wanted more information about that statement. I have previously shown

A SAS programmer asked an interesting question on a SAS Support Community. The programmer had a nonlinear function with 12 parameters. He also had a file that contained 4,000 lines, where each line contained values for the 12 parameters. In other words, the file specified 4,000 different functions. The programmer wanted

A common question from statistical programmers is how to compute the rank of a matrix in SAS. Recall that the rank of a matrix is defined as the number of linearly independent columns in the matrix. (Equivalently, the number of linearly independent rows.) This article describes how to compute the

In my previous post, I showed how to approximate a cumulative distribution function (CDF) by evaluating only the probability density function. The technique uses the trapezoidal rule of integration to approximate the CDF from the PDF. For common probability distributions, you can use the CDF function in Base SAS to
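The trapezoidal approach can be sketched as a running sum of trapezoid areas under the PDF. This is a Python illustration, not the SAS code from the post; the standard normal PDF, the grid limits of -8 and 8, and the number of steps are assumptions chosen for the example, and the sketch assumes that the probability below the lower limit is negligible.

```python
import math

def normal_pdf(x):
    """Standard normal density, used here only as a test case."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cdf_from_pdf(pdf, lower, upper, n_steps=2000):
    """Approximate the CDF on an evenly spaced grid by cumulative trapezoids.

    Assumes the probability mass below `lower` is negligible.
    """
    h = (upper - lower) / n_steps
    xs = [lower + i * h for i in range(n_steps + 1)]
    dens = [pdf(x) for x in xs]
    cdf = [0.0]
    for i in range(1, n_steps + 1):
        # add the area of the trapezoid between grid points i-1 and i
        cdf.append(cdf[-1] + 0.5 * h * (dens[i - 1] + dens[i]))
    return xs, cdf

xs, cdf = cdf_from_pdf(normal_pdf, -8.0, 8.0)
i = 1125  # grid point at (approximately) x = 1
exact = 0.5 * (1.0 + math.erf(xs[i] / math.sqrt(2.0)))
print(xs[i], cdf[i], exact)
```

Comparing the approximation with a closed-form CDF, as in the last lines, is a quick way to judge whether the grid is fine enough.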

Evaluating a cumulative distribution function (CDF) can be an expensive operation. Each time you evaluate the CDF for a continuous probability distribution, the software has to perform a numerical integration. (Recall that the CDF at a point x is the integral under the probability density function (PDF) where x is

One of the things I enjoy about blogging is that I often learn something new. Last week I wrote about how to optimize a function that is defined in terms of an integral. While developing the program in the article, I made some mistakes that generated SAS/IML error messages. By

The SAS/IML language is used for many kinds of computations, but three important numerical tasks are integration, optimization, and root finding. Recently a SAS customer asked for help with a problem that involved all three tasks. The customer had an objective function that was defined in terms of an integral.

In SAS software, you can use the QUAD subroutine in the SAS/IML language to evaluate definite integrals on an interval [a, b]. The integral is properly defined only for a < b, but mathematicians define the following convention, which enables you to make sense of reversing the limits of integration:

Last week I described the Hilbert matrix of size n, which is a famous square matrix in numerical linear algebra. It is famous partially because its inverse and its determinant have explicit formulas (that is, we know them exactly), but mainly because the matrix is ill-conditioned for moderate values of

Did you know that SAS/IML 12.1 provides built-in functions that compute the norm of a vector or matrix? A vector norm enables you to compute the length of a vector or the distance between two vectors in SAS. Matrix norms are used in numerical linear algebra to estimate the condition

Last week I showed how to find parameters that maximize the integral of a certain probability density function (PDF). Because the function was a PDF, I could evaluate the integral by calling the CDF function in SAS. (Recall that the cumulative distribution function (CDF) is the integral of a PDF.)

SAS programmers use the SAS/IML language for many different tasks. One important task is computing an integral. Another is optimizing functions, such as maximizing a likelihood function to find parameters that best fit a set of data. Last week I saw an interesting problem that combines these two important tasks.

One of my favorite new features of SAS/IML 12.1 enables you to define functions that contain default values for parameters. This is extremely useful when you want to write a function that has optional arguments. Example: Centering a data vector. It is simple to specify a SAS/IML module with a

Finding the root (or zero) of a function is an important computational task because it enables you to solve nonlinear equations. I have previously blogged about using Newton's method to find a root for a function of several variables. I have also blogged about how to use the bisection method

While sorting through an old pile of papers, I discovered notes from a 2012 SAS conference that I had attended. Next to the abstract for one presentation, I had scrawled a note to myself that read "BLOG about the incomplete beta function!" Okay, Rick, whatever you say! In statistics, the

This is the last post in my recent series of articles on computing contours in SAS. Last month a SAS customer asked how to compute the contours of the bivariate normal cumulative distribution function (CDF). Answering that question in a single blog post would have resulted in a long article,