I previously described how to use Mahalanobis distance to find outliers in multivariate data. This article takes a closer look at Mahalanobis distance. A subsequent article will describe how you can compute Mahalanobis distance. Distance in standard units In statistics, we sometimes measure "nearness" or "farness" in terms of the
Tag: Data Analysis
In two previous blog posts I worked through examples in the survey article, "Robust statistics for outlier detection," by Peter Rousseeuw and Mia Hubert. Robust estimates of location in a univariate setting are well-known, with the median statistic being the classical example. Robust estimates of scale are less well-known, with
In a previous blog post on robust estimation of location, I worked through some of the examples in the survey article, "Robust statistics for outlier detection," by Peter Rousseeuw and Mia Hubert. I showed that SAS/IML software and PROC UNIVARIATE both support the robust estimators of location that are mentioned
I encountered a wonderful survey article, "Robust statistics for outlier detection," by Peter Rousseeuw and Mia Hubert. Not only are the authors major contributors to the field of robust estimation, but the article is short and very readable. This blog post walks through the examples in the paper and shows
A recent question on a SAS Discussion Forum was "how can you overlay multiple kernel density estimates on a single plot?" There are three ways to do this, depending on your goals and objectives. Overlay different estimates of the same variable Sometimes you have a single variable and want to
It is "well known" that the pairwise deletion of missing values and the resulting computation of correlations can lead to problems in statistical computing. I have previously written about this phenomenon in my article "When is a correlation matrix not a correlation matrix." Specifically, consider the symmetric array whose elements
Yesterday, December 7, 1941, a date which will live in infamy... - Franklin D. Roosevelt Today is the 70th anniversary of the Japanese attack on Pearl Harbor. The very next day, America declared war. During a visit to the Smithsonian National Museum of American History, I discovered the results of
Recently the "SAS Sample of the Day" was a Knowledge Base article with an impressively long title: Sample 42165: Using a stored process to eliminate duplicate values caused by multiple group memberships when creating a group-based, identity-driven filter in SAS® Information Map Studio "Wow," I thought. "This is the longest
Halloween night was rainy, so many fewer kids knocked on the door than usual. Consequently, I'm left with a big bucket of undistributed candy. One evening as I treated myself to a mouthful of tooth decay, I chanced to open a package of Wonka® Bottle Caps. The package contained three
Being able to reshape data is a useful skill in data analysis. Most of the time you can use the TRANSPOSE procedure or the SAS DATA step to reshape your data. But the SAS/IML language can be handy, too. I only use PROC TRANSPOSE a few times per year, so