SAS supports many ways to compute the rank of a numeric variable and to handle tied values. However, sometimes I need to rank the values in a character categorical variable. For example, the values {"Male", "Female", "Male"} have ranks {2, 1, 2} because, in alphabetical order, "Female" is the first-ranked
Tag: Statistical Programming
A previous article defines the silhouette statistic (Rousseeuw, 1987) and shows how to use it to identify observations in a cluster analysis that are potentially misclassified. The article provides many graphs, including the silhouette plot, which is a bar chart or histogram that displays the distribution of the silhouette statistic
A lot of programmers have been impressed by the ability of ChatGPT, GPT-4, and Bing Chat to write computer programs. Recently, I wrote an article that discusses an elementary programming assignment, called FizzBuzz, which is sometimes used as part of a hiring process to assess a candidate's basic knowledge of
The fundamental operations on sets are union, intersection, and set difference, all of which are supported directly in the SAS IML language. While studying another programming language, I noticed that the language supports an additional operation, namely the symmetric difference between two sets. The language also supports query functions to
I recently discussed introductory programming with a colleague who teaches Python at a university. He told me about the following introductory programming assignment: Let N be an integer parameter in the range [1, 9]. For each value of N, find all pairs of one-digit positive integers d1 and d2 that
In a previous article about Markov transition matrices, I mentioned that you can estimate a Markov transition matrix by using historical data that are collected over a certain length of time. A SAS programmer asked how you can estimate a transition matrix in SAS. The answer is that you can
Most homeowners know that large home improvement projects can take longer than you expect. Whether it's remodeling a kitchen, adding a deck, or landscaping a yard, big projects are expensive and subject to a lot of uncertainty. Factors such as weather, the availability of labor, and the supply of materials,
A previous article describes the metalog distribution (Keelin, 2016). The metalog distribution is a flexible family of distributions that can model a wide range of shapes for data distributions. The metalog system can model bounded, semibounded, and unbounded continuous distributions. This article shows how to use the metalog distribution in
Undergraduate textbooks on probability and statistics typically prove theorems that show how the variance of a sum of random variables is related to the variance of the original variables and the covariance between them. For example, the Wikipedia article on Variance contains an equation for the sum of two random
A SAS programmer wanted to compute the distribution of X-Y, where X and Y are two beta-distributed random variables. Pham-Gia and Turkkan (1993) derive a formula for the PDF of this distribution. Unfortunately, the PDF involves evaluating a two-dimensional generalized hypergeometric function, which is not available in all programming languages.