How to compute the distance between observations in SAS

In statistics, distances between observations are used to form clusters, to identify outliers, and to estimate distributions. Distances are used in spatial statistics and in other application areas. There are many ways to define the distance between observations. I have previously written an article that explains Mahalanobis distance, which is
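To make the idea concrete, here is a minimal sketch (in Python rather than SAS/IML) that builds the matrix of pairwise distances between observations, where each observation is a row of numeric values. It shows two common choices, Euclidean and city-block (Manhattan) distance. The function names and the three-row data set are hypothetical and chosen only for illustration. Mahalanobis distance, which also accounts for the covariance between variables, is not shown.

```python
import math

def euclidean(p, q):
    # Square root of the sum of squared coordinate differences
    return math.dist(p, q)

def cityblock(p, q):
    # Sum of absolute coordinate differences (Manhattan distance)
    return sum(abs(a - b) for a, b in zip(p, q))

def distance_matrix(obs, metric):
    """Return the symmetric n x n matrix of distances between the rows of obs."""
    n = len(obs)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Distance is symmetric, so fill both (i, j) and (j, i)
            d[i][j] = d[j][i] = metric(obs[i], obs[j])
    return d

# Hypothetical data: three observations measured on two variables
X = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
for row in distance_matrix(X, euclidean):
    print(row)
```

Passing the metric as a function keeps the loop the same for every definition of distance; only the small metric function changes.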

Understanding ridge regression in SAS

Someone recently asked a question on the SAS Support Communities about estimating parameters in ridge regression. I answered the question by pointing to a matrix formula in the SAS documentation. One of the advantages of the SAS/IML language is that you can implement matrix formulas in a natural way. The
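The textbook ridge estimator is b = (X'X + kI)^(-1) X'y, where k >= 0 is the ridge parameter. The following Python sketch (standard library only, not SAS/IML) evaluates that formula directly. It does not add an intercept column, so it assumes the data are centered when the model needs an intercept. The SAS documentation's formula for PROC REG may include a centering and scaling step that this sketch omits, so results need not match PROC REG exactly. The helper names and the small data set are hypothetical.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    cols = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # pivot row
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def ridge(X, y, k):
    """Textbook ridge estimate b = (X'X + k I)^(-1) X'y.
    No intercept column is added; center the data first if the model needs one."""
    Xt = transpose(X)
    A = matmul(Xt, X)
    for i in range(len(A)):
        A[i][i] += k  # add the ridge parameter to the diagonal
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    return solve(A, Xty)

# Hypothetical design matrix and response
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
print(ridge(X, y, 0.0))  # k = 0 gives ordinary least squares
print(ridge(X, y, 1.0))  # k > 0 shrinks the coefficient vector toward zero
```

Solving the linear system is preferable to forming the inverse explicitly, which mirrors the usual advice to use SOLVE rather than INV in SAS/IML.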

The case of spilled coffee and the regression intercept

Argh! I've just spilled coffee on output that shows the least squares coefficients for a regression model that I was investigating. Now the parameter estimate for the intercept is completely obscured, although I can still see the parameter estimates for the coefficients of the continuous explanatory variable. What can I
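One way to rebuild an obscured intercept: a least squares fit that includes an intercept passes through the point of means, so b0 = ybar - b1*xbar (with one such term per slope in a multiple regression). The Python sketch below assumes the explanatory variables are continuous and that you know the mean of the response and of each explanatory variable; the data and the function name are hypothetical. It fits a simple regression, pretends the intercept is lost, and reconstructs it from the slope and the means.

```python
from statistics import mean

def recover_intercept(slopes, x_means, y_mean):
    """Least squares with an intercept passes through the point of means,
    so b0 = ybar - sum_j b_j * xbar_j."""
    return y_mean - sum(b * m for b, m in zip(slopes, x_means))

# Hypothetical data for a simple linear regression
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 6.0, 9.0]

# Slope from the usual formula: Sxy / Sxx
xbar, ybar = mean(x), mean(y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)
slope = sxy / sxx  # 1.9 for these data

# Pretend the intercept was lost under the coffee; rebuild it from the means
b0 = recover_intercept([slope], [xbar], ybar)
print(slope, b0)
```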

Construct normal data from summary statistics

Last week there was an interesting question posted to the "Stat-Math Statistics" group on LinkedIn. The original question was a little confusing, so I'll state it in a more general form: A population is normally distributed with a known mean and standard deviation. A sample of size N is drawn
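If the goal is a sample whose sample mean and standard deviation exactly equal given values, one common approach is to simulate standard normal values, standardize them by their own sample mean and standard deviation, and then rescale. The Python sketch below assumes that is the goal; it is not necessarily the solution the full article arrives at, and the function name and parameters are hypothetical. Note that `statistics.stdev` uses the N-1 denominator.

```python
import random
from statistics import mean, stdev

def normal_sample_with_stats(n, target_mean, target_sd, seed=None):
    """Draw n >= 2 normal variates, then shift and rescale them so that the
    sample mean and sample standard deviation equal the targets (up to rounding)."""
    if n < 2:
        raise ValueError("need at least two observations")
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    m, s = mean(z), stdev(z)
    # Standardize by the sample's own statistics, then rescale to the targets
    return [target_mean + target_sd * (v - m) / s for v in z]

x = normal_sample_with_stats(50, 100.0, 15.0, seed=1)
print(round(mean(x), 6), round(stdev(x), 6))
```

Because the values are rescaled after they are drawn, they are no longer an independent random sample from the normal distribution; they are a normal-looking sample that is constrained to have exactly the stated statistics.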

Friday's Innovation Inspiration - Recycle, reuse

This technical case study by Faisal Dosani, Royal Bank of Canada; Lisa Eckler, Lisa Eckler Consulting Inc.; and Marje Fecht, Prowerk Consulting Ltd., discusses the steps to develop a hands-off process for creating flexible, extensible solutions that avoid maintainability issues and get results to market faster. Building reusable

Friday's Innovation Inspiration - Efficient credit scoring

The traditional methods of making credit decisions relied mostly on human judgment; those have been replaced by methods that use statistical models. Today, statistical models are used not only for deciding whether to accept an applicant (application scoring), but also to predict the likelihood of defaults among customers who have

Friday's Innovation Inspiration - Version control

Anything that you do manually leaves the door open for error; this is especially true for your file system. Automated processes are also usually faster. Magnus Mengelbier has applied this philosophy to provide version control for SAS data sets, programs, and outputs.

12 Tips for SAS Statistical Programmers

It's the start of a new year. Have you made a resolution to be a better data analyst? A better SAS statistical programmer? To learn more about multivariate statistics? What better way to start the New Year than to read (or re-read!) the top 12 articles for statistical programmers from

Friday's Innovation Inspiration - Hiring for keeps

I recently published a post based on an InformationWeek article about the need for more analytic talent and tips for finding the right talent. The InformationWeek article did not mention using SAS to uncover fraudulent responses in applications. This Post-It Note author uses SAS for that purpose, and for entertainment.

Friday's Innovation Inspiration - A %mockery?

Yao Huang says that you can use the %mock_table SAS macro to build the mock tables needed for Phase I clinical trials. "Instead of spending a lot [of] time to create or modify each table using a word processor, statisticians or programmers can quickly run this macro using a pre-specified Excel template

Computing the nearest correlation matrix

Frequently someone will post a question to the SAS Support Communities that says something like this: I am trying to do [statistical task] and SAS issues an error and reports that my correlation matrix is not positive definite. What is going on and how can I complete [the task]? The statistical
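A matrix can have a unit diagonal and off-diagonal entries between -1 and 1 yet still fail to be positive definite. A quick way to test a symmetric matrix is to attempt a Cholesky factorization, which succeeds only for positive definite matrices. The Python sketch below does that; it only detects the problem and does not compute the nearest correlation matrix. The function name, the `tol` parameter, and the example matrices are hypothetical.

```python
import math

def is_positive_definite(A, tol=0.0):
    """Attempt a Cholesky factorization A = L L^T of a symmetric matrix A.
    Return False as soon as a diagonal pivot is <= tol (not positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = A[i][i] - s
                if d <= tol:
                    return False
                L[i][i] = math.sqrt(d)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return True

# Hypothetical "correlation" matrix: valid-looking entries, negative determinant
bad = [[1.0, 0.9, 0.1],
       [0.9, 1.0, 0.9],
       [0.1, 0.9, 1.0]]
good = [[1.0, 0.5],
        [0.5, 1.0]]
print(is_positive_definite(bad), is_positive_definite(good))
```

For the matrix `bad`, variables 1 and 2 are highly correlated and so are variables 2 and 3, yet variables 1 and 3 are nearly uncorrelated; no real data set can produce that combination, which is why the factorization fails.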