Tag: Data Analysis

Rick Wicklin 0
Remove or keep: Which is faster?

In a recent article on efficient simulation from a truncated distribution, I wrote some SAS/IML code that used the LOC function to find and exclude observations that satisfy some criterion. Some readers came up with an alternative algorithm that uses the REMOVE function instead of subscripts. I remarked in a

Rick Wicklin 0
Visualizing US commute times and congestion

Robert Allison posted a map that shows the average commute times for major US cities, along with the proportion of the commute that is attributed to traffic jams and other congestion. The data are from a CEOs for Cities report (Driven Apart, 2010, p. 45). Robert use SAS/GRAPH software to

Rick Wicklin 0
A statistically beautiful Father's Day

To celebrate special occasions like Father's Day, I like to relax with a cup of coffee and read the newspaper. When I looked at the weather page, I was astonished by the seeming uniformity of temperatures across the contiguous US. The weather map in my newspaper was almost entirely yellow

Rick Wicklin 0
BY-group processing in SAS/IML

Because the SAS/IML language is a general purpose programming language, it doesn't have a BY statement like most other SAS procedures (such as PROC REG). However, there are several ways to loop over categorical variables and perform an analysis on the observations in each category. One way is to use

Rick Wicklin 0
The Poissonness plot: A goodness-of-fit diagnostic

Last week I discussed how to fit a Poisson distribution to data. The technique, which involves using the GENMOD procedure, produces a table of some goodness-of-fit statistics, but I find it useful to also produce a graph that indicates the goodness of fit. For continuous distributions, the quantile-quantile (Q-Q) plot

Rick Wicklin 0
Creating a periodic smoother

In yesterday's post, I discussed a "quick and dirty" method to smooth periodic data. However, after I smoothed the data I remarked that the smoother itself was not exactly periodic. At the end points of the periodic interval, the smoother did not have equal slopes and the method does not

Rick Wicklin 0
Smoothers for periodic data

Over at the SAS and R blog, Ken Kleinman discussed using polar coordinates to plot time series data for multiple years. The time series plot was reproduced in SAS by my colleague Robert Allison. The idea of plotting periodic data on a circle is not new. In fact it goes

Rick Wicklin 0
Count missing values in observations

Locating missing values is important in statistical data analysis. I've previously written about how to count the number of missing values for each variable in a data set. In Base SAS, I showed how to use the MEANS or FREQ procedures to count missing values. In the SAS/IML language, I

Rick Wicklin 0
Linear interpolation in SAS/IML

A recent discussion on the SAS-L discussion forum concerned how to implement linear interpolation in SAS. Some people suggested using PROC EXPAND in SAS/ETS software, whereas others proposed a DATA step solution. For me, the SAS/IML language provides a natural programming environment to implement an interpolation scheme. It also provides

Rick Wicklin 0
Compute sample quantiles by using the QNTL call

SAS provides several ways to compute sample quantiles of data. The UNIVARIATE procedure can compute quantiles (also called percentiles), but you can also compute them in the SAS/IML language. Prior to SAS/IML 9.22 (released in 2010) statistical programmers could call a SAS/IML module that computes sample quantiles. With the release

Rick Wicklin 0
Quantiles of discrete distributions

I work with continuous distributions more often than with discrete distributions. Consequently, I am used to thinking of the quantile function as being an inverse cumulative distribution function (CDF). (These functions are described in my article, "Four essential functions for statistical programmers.") For discrete distributions, they are not. To quote

Rick Wicklin 0
Testing data for multivariate normality

I've blogged several times about multivariate normality, including how to generate random values from a multivariate normal distribution. But given a set of multivariate data, how can you determine if it is likely to have come from a multivariate normal distribution? The answer, of course, is to run a goodness-of-fit

Advanced Analytics
Rick Wicklin 0
What is Mahalanobis distance?

I previously described how to use Mahalanobis distance to find outliers in multivariate data. This article takes a closer look at Mahalanobis distance. A subsequent article will describe how you can compute Mahalanobis distance. Distance in standard units In statistics, we sometimes measure "nearness" or "farness" in terms of the

1 11 12 13 14 15 16