A common question on SAS discussion forums is how to compute a moving average in SAS. This article shows how to use PROC EXPAND and contains links to articles that use the DATA step or macros to compute moving averages in SAS. In a previous post, I explained how to
Tag: Data Analysis
A moving average (also called a rolling average) is a statistical technique that is used to smooth a time series. Moving averages are used in finance, economics, and quality control. You can overlay a moving average curve on a time series to visualize how each value compares to a rolling
Weighted averages are all around us. Teachers use weighted averages to assign a test more weight than a quiz. Schools use weighted averages to compute grade-point averages. Financial companies compute the return on a portfolio as a weighted average of the component assets. Financial charts show (linearly) weighted moving averages
I wrote 114 posts for The DO Loop blog in 2015. Which were the most popular with readers? In general, highly technical articles appeal to only a small group of readers, whereas less technical articles appeal to a larger audience. Consequently, many of my popular articles were related to data
A SAS customer asked: Why isn't the chi-square distribution supported in PROC UNIVARIATE? That is an excellent question. I remember asking a similar question when I first started learning SAS. In addition to the chi-square distribution, I wondered why the UNIVARIATE procedure does not support the F distribution. These are
Did you know that the FREQ procedure in SAS can compute exact p-values for more than 20 statistical tests and statistics that are associated with contingency table? Mamma mia! That's a veritable smorgasbord of options! Some of the tests are specifically for one-way tables or 2 x 2 tables, but many apply
Suppose that you are tabulating the eye colors of students in a small class (following Friendly, 1992). Depending upon the ethnic groups of these students, you might not observe any green-eyed students. How do you put a 0 into the table that summarizes the number of students who have each
Last week I discussed ordinary least squares (OLS) regression models and showed how to illustrate the assumptions about the conditional distribution of the response variable. For a single continuous explanatory variable, the illustration is a scatter plot with a regression line and several normal probability distributions along the line. The
I wanna be an airborne ranger, Live the life of guts and danger.* If you are an 80's movie buff, you might remember the scene in The Breakfast Club where Bender, the juvenile delinquent played by Judd Nelson, distracts the principal by running through the school singing this song. Recently,
Typically a correlation analysis reports the correlations between all pairs of variables, including the variables with themselves. The resulting correlation matrix is square, symmetric, and has 1s on the main diagonal. But suppose you are interested in only specific combinations of variables. Perhaps you want the pairwise correlations between one
Last week's post about odds ratio plots in SAS made me think about a similar plot that visualizes the parameter estimates for a regression analysis. The so-called regression coefficient plot is a scatter plot of the estimates for each effect in the model, with lines that indicate the width of
I recently read an argument by Andrew Wheeler for using a logarithmic axis for plotting odds ratios. I found his argument convincing. Accordingly, this blog post shows how to create an odds ratio plot in SAS where the ratio axis is displayed on a log scale. Thanks to Bob Derr
The Raleigh News & Observer published a front-page article about the effect of wealth and poverty on high school athletics in North Carolina. In particular, the article concluded that "high schools with a high percentage of poor students rarely win titles in the so-called country club sports—tennis, golf and swimming—and
My colleague Robert Allison finds the most interesting data sets to visualize! Yesterday he posted a visualization of toothless seniors in the US. More precisely, he created graphs that show the estimated prevalence of adults (65 years or older) who have had all their natural teeth extracted. The dental profession
My son is in high school and plans to take the ACT, a standardized test to assess college aptitude and readiness. My wife asked, "What is a good score for the ACT?" I didn't know, but I did a quick internet search and discovered a tabulation of scores for the
Recently a SAS customer asked how to Winsorize data in SAS. Winsorization is best known as a way to construct robust univariate statistics. The Winsorized mean is a robust estimate of location. The Winsorized mean is similar to the trimmed mean, and both are described in the documentation for PROC
When you count the outcomes of an experiment, you do not always observe all of the possible outcomes. For example, if you roll a six-sided die 10 times, it might be that the "1" face does not appear in those 10 rolls. Obviously, this situation occurs more frequently with small
"Daddy, help! Help me! Come quick!" I heard my daughter's screams from the upstairs bathroom and bounded up the stairs two at a time. Was she hurt? Bleeding? Was the toilet overflowing? When I arrived in the doorway, she pointed at the wall and at the floor. The wall was
I was reading a statistics book when I encountered a histogram that caught my eye. The histogram looked similar to the one at the left. It contained a normal density estimate overlaid on a histogram, but the height of the density curve seemed too short when compared to the heights
A SAS programmer asked for a list of SAS/IML functions that operate on the columns of an n x p matrix and return a 1 x p row vector of results. The functions that behave this way tend to compute univariate descriptive statistics such as the mean, median, standard deviation, and quantiles. The following
Perhaps you saw the headlines earlier this week about the fact that it has been nine years since the last major hurricane (category 3, 4, or 5) hit the US coast. According to a post on the GeoSpace blog, which is published by the American Geophysical Union (AGU), researchers ran
Suppose that you compute the correlation matrix (call it R1) for a set of variables x1, x2, ..., x8. For some reason, you later want to compute the correlation matrix for the variables in a different order, maybe x2, x1, x7,..., x6. Do you need to go back to the
Imagine that you have one million rows of numerical data and you want to determine if a particular "target" value occurs. How might you find where the value occurs? For univariate data, this is an easy problem. In the SAS DATA step you can use a WHERE clause or a
Saturday, March 14, 2015, is Pi Day, and this year is a super-special Pi Day! This is your once-in-a-lifetime chance to celebrate the first 10 digits of pi (π) by doing something special on 3/14/15 at 9:26:53. Apologies to my European friends, but Pi Day requires that you represent dates
I recently wrote about how to overlay multiple curves on a single graph by reshaping wide data (with many variables) into long data (with a grouping variable). The implementation used PROC TRANSPOSE, which is a procedure in Base SAS. When you program in the SAS/IML language, you might encounter data
Data. To a statistician, data are the observed values. To a SAS programmer, analyzing data requires knowledge of the values and how the data are arranged in a data set. Sometimes the data are in a "wide form" in which there are many variables. However, to perform a certain analysis
I published 118 blog posts in 2014. This article presents my most popular posts from 2014 and late 2013. 2014 will always be a special year for me because it was the year that the SAS University Edition was launched. The University Edition means that SAS/IML is available to all
My colleagues at the SAS & R blog recently posted an example of how to program a permutation test in SAS and R. Their SAS implementation used Base SAS and was "relatively cumbersome" (their words) when compared with the R code. In today's post I implement the permutation test in
My colleague Robert Allison has a knack for finding fascinating data. Last week he did it again by locating data about how blood types and Rh factors vary among countries. He produced a series of eight world maps, each showing the prevalence of a blood type (A+, A-, B+, B-,
In my article about how to create a quantile plot, I chose not to discuss a theoretical issue that occasionally occurs. The issue is that for discrete data (which includes rounded values), it might be impossible to use quantile values to split the data into k groups where each group