My previous blog post shows how to use PROC LOGISTIC and spline effects to predict the probability that an NBA player scores from various locations on a court. The LOGISTIC procedure fits parametric models, which means that the procedure estimates parameters for every explanatory effect in the model. Spline bases
Author
Last week Robert Allison showed how to download NBA data into SAS and create graphs such as the location where Stephen Curry took shots in the 2015-16 season to date. The graph at left shows the kind of graphs that Robert created. I've reversed the colors from Robert's version, so
There are several ways to simulate multinomial data in SAS. In the SAS/IML matrix language, you can use the RANDMULTINOMIAL function to generate samples from the multinomial distribution. If you don't have a SAS/IML license, I have previously written about how to use the SAS DATA step or PROC SURVEYSELECT
Today is March 14th, which is annually celebrated as Pi Day. Today's date, written as 3/14/16, represents the best five-digit approximation of pi. On Pi Day, many people blog about how to approximate pi. This article uses a Monte Carlo simulation to estimate pi, in spite of the fact that
You can use histograms to visualize the distribution of data. A comparative histogram enables you to compare two or more distributions, which usually represent subpopulations in the data. Common subpopulations include males versus females or a control group versus an experimental group. There are two common ways to construct a
Most SAS regression procedures support the "stars and bars" operators, which enable you to create models that include main effects and all higher-order interaction effects. You can also easily create models that include all n-way interactions up to a specified value of n. However, it can be a challenge to
Last week I showed how to create dummy variables in SAS by using the GLMMOD procedure. The procedure enables you to create design matrices that encode continuous variables, categorical variables, and their interactions. You can use dummy variables to replace categorical variables in procedures that do not support a CLASS
One of the first things SAS programmers learn is that SAS data sets can be specified in two ways. You can use a two-level name such as "sashelp.class" which uses a SAS libref (SASHELP) and a member name (CLASS) to specify the location of the data set. Alternatively, you can
SAS programmers sometimes ask, "How do I create a design matrix in SAS?" A design matrix is a numerical matrix that represents the explanatory variables in regression models. In simple models, the design matrix contains one column for each continuous variable and multiple columns (called dummy variables) for each classification
A dummy variable (also known as indicator variable) is a numeric variable that indicates the presence or absence of some level of a categorical variable. The word "dummy" does not imply that these variables are not smart. Rather, dummy variables serve as a substitute or a proxy for a categorical
Last week Sanjay Matange wrote about a new SAS 9.4m3 option that enables you to show all categories in a graph legend, even when the data do not contain all the categories. Sanjay's example was a chart that showed medical conditions classified according to the scale "Mild," "Moderate," and "Severe."
Many simulation and resampling tasks use one of four sampling methods. When you draw a random sample from a population, you can sample with or without replacement. At the same time, all individuals in the population might have equal probability of being selected, or some individuals might be more likely
How do you sample with replacement in SAS when the probability of choosing each observation varies? I was asked this question recently. The programmer thought he could use PROC SURVEYSELECT to generate the samples, but he wasn't sure which sampling technique he should use to sample with unequal probability. This
In the SAS/IML language, you can read data from a SAS data set into a set of vectors (each with their own name) or into a single matrix. Beginning programmers might wonder about the advantages of each approach. When should you read data into vectors? When should you read data
Last week I showed how to use PROC EXPAND to compute moving averages and other rolling statistics in SAS. Unfortunately, PROC EXPAND is part of SAS/ETS software and not every SAS site has a license for SAS/ETS. For simple moving averages, you can write a DATA step program, as discussed
Novice SAS programmers quickly learn the advantages of using PROC SORT to sort data, followed by a BY-group analysis of the sorted data. A typical example is to analyze demographic data by state or by ZIP code. A BY statement enables you to produce multiple analyses from a single procedure
A common question on SAS discussion forums is how to compute a moving average in SAS. This article shows how to use PROC EXPAND and contains links to articles that use the DATA step or macros to compute moving averages in SAS. In a previous post, I explained how to
A moving average (also called a rolling average) is a statistical technique that is used to smooth a time series. Moving averages are used in finance, economics, and quality control. You can overlay a moving average curve on a time series to visualize how each value compares to a rolling
In SAS, the aspect ratio of a graph is the physical height of the graph divided by the physical width. Recently I demonstrated how to set the aspect ratio of graphs in SAS by using the ASPECT= option in PROC SGPLOT or by using the OVERLAYEQUATED statement in the Graph
Parameters in SAS procedures are specified a list of values that you manually type into the procedure syntax. For example, if you want to specify a list of percentile values in PROC UNIVARIATE, you need to type the values into the PCTLPTS= option as follows: proc univariate data=sashelp.cars noprint; var
Recently I blogged about how to compute a weighted mean and showed that you can use a weighted mean to compute the center of mass for a system of N point masses in the plane. That led me to think about a related problem: computing the center of mass (called
I began 2016 by compiling a list of popular articles from my blog in 2015. This "People's Choice" list contains many interesting articles, but some of my personal favorites did not make the list. Today I present the "Editor's Choice" list of articles that deserve a second look. I've grouped
Weighted averages are all around us. Teachers use weighted averages to assign a test more weight than a quiz. Schools use weighted averages to compute grade-point averages. Financial companies compute the return on a portfolio as a weighted average of the component assets. Financial charts show (linearly) weighted moving averages
I wrote 114 posts for The DO Loop blog in 2015. Which were the most popular with readers? In general, highly technical articles appeal to only a small group of readers, whereas less technical articles appeal to a larger audience. Consequently, many of my popular articles were related to data
Lo how a rose e'er blooming From tender stem hath sprung As I write this blog post, a radio station is playing Chrismas music. One of my favorite Christmas songs is the old German hymn that many of us know as "Lo, How a Rose E're Blooming." I was humming
The most recent development environment for SAS programmers is SAS Studio, which is a browser-based application. The free SAS University Edition, which includes SAS/IML software, also uses SAS Studio as a development environment. SAS Studio has a special mode for programmers who use interactive procedures such as PROC IML. (Recall
Recently Sanjay Matange blogged about how to color the bars of a histogram according to a gradient color ramp. Using the fact that bar charts and histograms look similar, he showed how to use PROC SGPLOT in SAS to plot a bar chart in which each bar is colored according
A SAS customer asked: Why isn't the chi-square distribution supported in PROC UNIVARIATE? That is an excellent question. I remember asking a similar question when I first started learning SAS. In addition to the chi-square distribution, I wondered why the UNIVARIATE procedure does not support the F distribution. These are
Last week my colleague Chris Hemedinger published a blog post that described how to use the ODS LAYOUT GRIDDED statement to arrange tables and graphs in a panel. The statement was introduced in SAS 9.4m1 (December 2013). Gridded layout is supported for HTML, POWERPOINT, and the PRINTER family of destinations
When creating a statistical graphic such as a line plot or a scatter plot, it is sometimes important to preserve the aspect ratio of the data. For example, if the ranges of the X and Y variables are equal, it can be useful to display the data in a square