Blogs

Tag: Data Analysis

Rick Wicklin, August 14, 2013

Dryer balls and drying time: A statistical analysis

Earlier this week I posted a "guest blog" in which my eighth-grade son described a visualization of data for the 2013 ASA Poster Competition. The purpose of today's blog post is to present a higher-level statistical analysis of the same data. I will use a t test and a

Rick Wicklin, August 12, 2013

Do dryer balls reduce drying time?

Editor's Note: My eighth-grade son, David, created a poster that he submitted to the 2013 ASA Poster Competition. The competition encourages students to display "two or more related graphics that summarize a set of data, look at the data from different points of view, and answer specific questions about

Rick Wicklin, July 31, 2013

Read hundreds of data sets into matrices

Do you have dozens (or even hundreds) of SAS data sets that you want to read into SAS/IML matrices? In a previous blog post, I showed how to iterate over a series of data sets and analyze each one. Inside the loop, I read each data set into a matrix

Rick Wicklin, July 17, 2013

A simple implementation of two-dimensional binning

In a previous article I discussed how to bin univariate observations by using the BIN function, which was added to the SAS/IML language in SAS/IML 9.3. You can generalize that example and bin bivariate or multivariate data. Over two years ago I wrote a blog post on 2D binning in

Learn SAS

Rick Wicklin, July 15, 2013

Bin observations by using custom cut points and unevenly spaced bins

It is often useful to partition observations for a continuous variable into a small number of intervals, called bins. This familiar process occurs every time that you create a histogram, such as the one on the left. In SAS you can create this histogram by calling the UNIVARIATE procedure. Optionally,

Rick Wicklin, June 26, 2013

How to color clusters in a dendrogram

The CLUSTER procedure in SAS/STAT software creates a dendrogram automatically. The black-and-white dendrogram is nice, but plain. A SAS customer wanted to know whether it is possible to add color to the dendrogram to emphasize certain clusters. For example, the plot at the left emphasizes a four-cluster scenario for clustering

Rick Wicklin, June 17, 2013

Repetition factors versus frequency variables

A regular reader noticed my post on initializing vectors by using repetition factors and asked whether that technique would be useful to expand data that are given in value-frequency pairs. The short answer is "no." Repetition factors are useful for defining (static) matrix literals. However, if you want to expand

Rick Wicklin, June 12, 2013

How to interpret a residual-fit spread plot

In a previous blog post, I described how to use a spread plot to compare the distributions of several variables. Each spread plot is a graph of centered data values plotted against the estimated cumulative probability. Thus, spread plots are similar to a (rotated) plot of the empirical cumulative distribution

Rick Wicklin, June 10, 2013

Visually comparing different data distributions: The spread plot

Suppose that you have several data distributions that you want to compare. Questions you might ask include "Which variable has the largest spread?" and "Which variables exhibit skewness?" More generally, you might be interested in visualizing how the distribution of one variable differs from the distribution of other variables. The

Rick Wicklin, May 28, 2013

New heat maps in the REG procedure

Has anyone noticed that the REG procedure in SAS/STAT 12.1 produces heat maps instead of scatter plots for fit plots and residual plots when the regression involves more than 5,000 observations? I wasn't aware of the change until a colleague informed me, although the change is discussed in the "Details"

Rick Wicklin, May 13, 2013

Use regression for a univariate analysis? Yes!

I've conducted a lot of univariate analyses in SAS, yet I'm always surprised when the best way to carry out the analysis uses a SAS regression procedure. I always think, "This is a univariate analysis! Why am I using a regression procedure? Doesn't a regression require at least two variables?"

Rick Wicklin, May 8, 2013

A three-panel visualization of a distribution

At a recent conference, I talked with a SAS customer who told me that he was using an R package to create a three-panel visualization of a distribution. Unfortunately, he couldn't remember the name of the package, and he has not returned my e-mails, so the purpose of today's article

Rick Wicklin, May 6, 2013

Compute confidence intervals for percentiles in SAS

PROC UNIVARIATE has provided confidence intervals for standard percentiles (quartiles) for eons. However, in SAS 9.3M2 (featuring the 12.1 analytical procedures) you can use a new feature in PROC UNIVARIATE to compute confidence intervals for a specified list of percentiles. To be clear, percentiles and quantiles are essentially the same

Rick Wicklin, April 17, 2013

Quantile regression: Better than connecting the sample quantiles of binned data

I often see variations of the following question posted on statistical discussion forums: I want to bin the X variable into a small number of values. For each bin, I want to draw the quartiles of the Y variable for that bin. Then I want to connect the corresponding quartile

Rick Wicklin, April 3, 2013

The difference of density estimates: When does it make sense?

I was recently asked how to compute the difference between two density estimates in SAS. The person who asked the question sent me a link to a paper from The Review of Economics and Statistics that contains several examples of this technique (for example, see Figure 3 on p. 16

Advanced Analytics

Rick Wicklin, March 27, 2013

How to compute the distance between observations in SAS

In statistics, distances between observations are used to form clusters, to identify outliers, and to estimate distributions. Distances are used in spatial statistics and in other application areas. There are many ways to define the distance between observations. I have previously written an article that explains Mahalanobis distance, which is

Analytics | Data Visualization

Robert Allison, March 21, 2013

Basketball tournaments, Moneyball, and sports analytics

A big part of "winning" these days (be it sports or a business) is performing analytics better than your competition. This is demonstrated in awe-inspiring fashion in the book (and movie) "Moneyball." And on that topic, I'd like to show you a few ways SAS can be used to analyze sports data

Sports & Entertainment

Advanced Analytics

Rick Wicklin, March 20, 2013

Understanding ridge regression in SAS

Someone recently asked a question on the SAS Support Communities about estimating parameters in ridge regression. I answered the question by pointing to a matrix formula in the SAS documentation. One of the advantages of the SAS/IML language is that you can implement matrix formulas in a natural way. The

Advanced Analytics

Rick Wicklin, March 13, 2013

The case of spilled coffee and the regression intercept

Argh! I've just spilled coffee on output that shows the least squares coefficients for a regression model that I was investigating. Now the parameter estimate for the intercept is completely obscured, although I can still see the parameter estimates for the coefficients of the continuous explanatory variable. What can I

Rick Wicklin, March 6, 2013

SAS/IML Posters and Presentations at SAS Global Forum 2013

There is something for everyone at SAS Global Forum 2013. I like to attend presentations in the Statistics and Data Analysis track and talk with SAS customers in the SAS Support and Demo Area. But one activity that I enjoy the most is to stroll through the poster area and

Learn SAS

Rick Wicklin, February 27, 2013

How to use PROC SGPLOT to display the slope and intercept of a regression line

A SAS user asked an interesting question on the SAS/GRAPH and ODS Graphics Support Forum. The question is: Does PROC SGPLOT support a way to display the slope of the regression line that is computed by the REG statement? Recall that the REG statement in PROC SGPLOT fits and displays

Advanced Analytics

Rick Wicklin, February 6, 2013

Find variables common to multiple data sets

Last week the SAS Training Post blog posted a short article on an easy way to find variables common to two data sets. The article used PROC CONTENTS (with the SHORT option) to print out the names of variables in SAS data sets so that you can visually determine

Rick Wicklin, January 14, 2013

Create a bar chart with an "Others" category

When a categorical variable has dozens or hundreds of categories, it is often impractical and undesirable to create a bar chart that shows the counts for all categories. Two alternatives are popular: Display only the Top 10 or Top 20 categories. As I showed last week, to do this in

Rick Wicklin, January 9, 2013

Create a bar chart with only a few categories

Sometimes a categorical variable has many levels, but you are only interested in displaying the levels that occur most frequently. For example, if you are interested in the number of times that a song was purchased on iTunes during the past week, you probably don't want a bar chart with

Advanced Analytics | Learn SAS

Rick Wicklin, January 3, 2013

12 Tips for SAS Statistical Programmers

It's the start of a new year. Have you made a resolution to be a better data analyst? A better SAS statistical programmer? To learn more about multivariate statistics? What better way to start the New Year than to read (or re-read!) the top 12 articles for statistical programmers from

Rick Wicklin, December 5, 2012

Remove or keep: Which is faster?

In a recent article on efficient simulation from a truncated distribution, I wrote some SAS/IML code that used the LOC function to find and exclude observations that satisfy some criterion. Some readers came up with an alternative algorithm that uses the REMOVE function instead of subscripts. I remarked in a

Rick Wicklin, October 17, 2012

Specify the colors of groups in SAS statistical graphics

Sometimes a graph is more interpretable if you assign specific colors to categories. For example, if you are graphing the number of Olympic medals won by various countries at the 2012 London Olympics, you might want to assign the colors gold, silver, and bronze to represent first-, second-, and third-place

Rick Wicklin, October 15, 2012

Women and jobs: Redesigning a New York Times graphic

The New York Times has an excellent staff that produces visually interesting graphics for the general public. However, because their graphs need to be understood by all Times readers, the staff sometimes creates a complicated infographic when a simpler statistical graph would show the data in a clearer manner. A

Advanced Analytics

Rick Wicklin, September 24, 2012

Grouping observations based on quantiles

Sometimes it is useful to group observations based on the values of some variable. Common schemes for grouping include binning and using quantiles. In the binning approach, a variable is divided into k equal intervals, called bins, and each observation is assigned to a bin. In this scheme, the size

Rick Wicklin, September 19, 2012

Visualizing congressional representation by state and time

With the US presidential election looming, all eyes are on the Electoral College. In the presidential election, each state gets as many votes in the Electoral College as it has representatives in both congressional houses. (The District of Columbia also gets three electors.) Because every state has two senators, it
