Do you want to create customized SAS graphs by using PROC SGPLOT and the other ODS graphics procedures? An essential skill that you need to learn is how to merge, join, append, and concatenate SAS data sets that come from different sources. The SAS statistical graphics procedures (SG procedures) enable
Author
In the classic textbook by Johnson and Wichern (Applied Multivariate Statistical Analysis, Third Edition, 1992, p. 164), it says: All measures of goodness-of-fit suffer the same serious drawback. When the sample size is small, only the most aberrant behaviors will be identified as lack of fit. On the other hand,
Somewhere in my past I encountered a panel of histograms for small random samples of normal data. I can't remember the source, but it might have been from John Tukey or William Cleveland. The point of the panel was to emphasize that (because of sampling variation) a small random sample
A SAS customer asked how to use background colors and a dashed line to emphasize the forecast region for a graph that shows a time series model. The task requires the following steps: Use the ATTRPRIORITY=NONE option on the ODS GRAPHICS statement to make sure that the current ODS style
At a conference last week, a presenter showed SAS statements that compute the logarithm of a probability density function (PDF). The log-PDF is a a common computation because it occurs when maximizing the log-likelihood function. The presenter computed the expression in SAS by using an expression that looked like y
Who was the oldest person elected president of the United States? How about the youngest? Who was the oldest when he left office? Let's look at some data. Wikipedia has a page that presents a table of the presidents of the US by age. It lists the dates for which
If you obtain data from web sites, social media, or other unstandardized data sources, you might not know the form of dates in the data. For example, the US Independence Day might be represented as "04JUL1776", "07/04/1776", "Jul 4, 1776", or "July 4, 1776." Fortunately, the ANYDTDTE informat makes it
This article uses graphical techniques to visualize one of my favorite geometric objects: the surface of a three-dimensional torus. Along the way, this article demonstrates techniques that are useful for visualizing more mundane 3-D point clouds that arise in statistical data analysis. Define points on a torus A torus is
Rotation matrices are used in computer graphics and in statistical analyses. A rotation matrix is especially easy to implement in a matrix language such as the SAS Interactive Matrix Language (SAS/IML). This article shows how to implement three-dimensional rotation matrices and use them to rotate a 3-D point cloud. Define
Occasionally on a discussion forum, a statistical programmer will ask a question like the following: I am trying to fit a parametric distribution to my data. The sample has a long tail, so I have tried the lognormal, Weibull, and gamma distributions, but nothing seems to fit. Please help!! In
Every year near Halloween I write an article in which I demonstrate a simple programming trick that is a real treat to use. This year's trick (which features the CMISS function and the crossproducts matrix in SAS/IML) enables you to count the number of observations that are missing for pairs
When simulating data or testing algorithms, it is useful to be able to generate patterns of missing data. This article shows how to generate random and systematic patterns of missing values. In other words, this article shows how to replace nonmissing data with missing data. Generate a random pattern of
I've written several articles about scatter plot smoothers: nonparametric regression curves that reveal small- and large-scale features of a response variable as a function of an explanatory variable. However, there is another kind of "smoothness" that you might care about, and that is the apparent smoothness of curves and markers
A previous post discusses how the loess regression algorithm is implemented in SAS. The LOESS procedure in SAS/STAT software provides the data analyst with options to control the loess algorithm and fit nonparametric smoothing curves through points in a scatter plot. Although PROC LOESS satisfies 99.99% of SAS users who
Loess regression is a nonparametric technique that uses local weighted regression to fit a smooth curve through points in a scatter plot. Loess curves are can reveal trends and cycles in data that might be difficult to model with a parametric curve. Loess regression is one of several algorithms in
How far away is the nearest hospital? How far is the nearest restaurant? The nearest gas station? These are commonly asked questions whose answers depend on the location of the person asking the question. Recently I showed an algorithm that enables you to find the distance between a set of
The WHERE clause in SAS is a powerful mechanism for selecting observations as you read or write a data set. The WHERE clause supports many operators, including the IN operator, which enables you to compactly specify multiple conditions for a categorical variable. A common use of the IN operator is
What is weighted regression? How does it differ from ordinary (unweighted) regression? This article describes how to compute and score weighted regression models. Visualize a weighted regression Technically, an "unweighted" regression should be called an "equally weighted " regression since each ordinary least squares (OLS) regression weights each observation equally.
The recent releases of SAS 9.4 have featured major enhancements to the ODS statistical graphics procedures such as PROC SGPLOT. In fact, PROC SGPLOT (and the underlying Graph Template Language (GTL)) are so versatile and powerful that you might forget to consider whether you can create a graph automatically by
Last week I showed how to find the nearest neighbors for a set of d-dimensional points. A SAS user wrote to ask whether something similar could be done when you have two distinct groups of points and you want to find the elements in the second group that are closest
My son is taking an AP Statistics course in high school this year. AP Statistics is one of the fastest-growing AP courses, so I welcome the chance to see topics and techniques in the course. Last week I was pleased to see that they teach data exploration techniques, such as
Although statisticians often assume normally distributed errors, there are important processes for which the error distribution has a heavy tail. A well-known heavy-tailed distribution is the t distribution, but the t distribution is unsuitable for some applications because it does not have finite moments (means, variance,...) for small parameter values.
Last week I showed how to compute nearest-neighbor distances for a set of numerical observations. Nearest-neighbor distances are used in many statistical computations, including the analysis of spatial point patterns. This article describes how the distribution of nearest-neighbor distances can help you determine whether spatial data are uniformly distributed or
One of the strengths of the SGPLOT procedure in SAS is the ease with which you can overlay multiple plots on the same graph. For example, you can easily combine the SCATTER and SERIES statements to add a curve to a scatter plot. However, if you try to overlay incompatible
The article uses the SAS DATA step and Base SAS procedures to estimate the coverage probability of the confidence interval for the mean of normally distributed data. This discussion is based on Section 5.2 (p. 74–77) of Simulating Data with SAS. What is a confidence interval? Recall that a confidence
Last week I wrote about how to compute sample quantiles and weighted quantiles in SAS. As part of that article, I needed to draw some step functions. Recall that a step function is a piecewise constant function that jumps by a certain amount at a finite number of points. Graph
This article describes how you can evaluate the Lambert W function in SAS/IML software. The Lambert W function is defined implicitly: given a real value x, the function's value w = W(x) is the value of w that satisfies the equation w exp(w) = x. Thus W is the inverse
Many univariate descriptive statistics are intuitive. However, weighted statistic are less intuitive. A weight variable changes the computation of a statistic by giving more weight to some observations than to others. This article shows how to compute and visualize weighted percentiles, also known as a weighted quantiles, as computed by
Edmond Halley (1656-1742) is best known for computing the orbit and predicting the return of the short-period comet that bears his name. However, like many scientists of his era, he was involved in a variety of mathematical and scientific activities. One of his mathematical contributions is a numerical method for
It is easy to use PROC SGPLOT and BY-group processing to create an animated graph in SAS 9.4. Sanjay Matange previously discussed how to create an animated plot in SAS 9.4, but he used a macro loop to call PROC SGPLOT many times. It is often easier to use the