Last week I showed how to create dummy variables in SAS by using the GLMMOD procedure. The procedure enables you to create design matrices that encode continuous variables, categorical variables, and their interactions. You can use dummy variables to replace categorical variables in procedures that do not support a CLASS
One of the first things SAS programmers learn is that SAS data sets can be specified in two ways. You can use a two-level name such as "sashelp.class", which uses a SAS libref (SASHELP) and a member name (CLASS) to specify the location of the data set. Alternatively, you can
SAS programmers sometimes ask, "How do I create a design matrix in SAS?" A design matrix is a numerical matrix that represents the explanatory variables in regression models. In simple models, the design matrix contains one column for each continuous variable and multiple columns (called dummy variables) for each classification
A dummy variable (also known as an indicator variable) is a numeric variable that indicates the presence or absence of some level of a categorical variable. The word "dummy" does not imply that these variables are not smart. Rather, dummy variables serve as a substitute or a proxy for a categorical
Last week Sanjay Matange wrote about a new SAS 9.4m3 option that enables you to show all categories in a graph legend, even when the data do not contain all the categories. Sanjay's example was a chart that showed medical conditions classified according to the scale "Mild," "Moderate," and "Severe."
Many simulation and resampling tasks use one of four sampling methods. When you draw a random sample from a population, you can sample with or without replacement. At the same time, all individuals in the population might have equal probability of being selected, or some individuals might be more likely
How do you sample with replacement in SAS when the probability of choosing each observation varies? I was asked this question recently. The programmer thought he could use PROC SURVEYSELECT to generate the samples, but he wasn't sure which sampling technique he should use to sample with unequal probability. This
In the SAS/IML language, you can read data from a SAS data set into a set of vectors (each with its own name) or into a single matrix. Beginning programmers might wonder about the advantages of each approach. When should you read data into vectors? When should you read data
Last week I showed how to use PROC EXPAND to compute moving averages and other rolling statistics in SAS. Unfortunately, PROC EXPAND is part of SAS/ETS software and not every SAS site has a license for SAS/ETS. For simple moving averages, you can write a DATA step program, as discussed
Novice SAS programmers quickly learn the advantages of using PROC SORT to sort data, followed by a BY-group analysis of the sorted data. A typical example is to analyze demographic data by state or by ZIP code. A BY statement enables you to produce multiple analyses from a single procedure
A common question on SAS discussion forums is how to compute a moving average in SAS. This article shows how to use PROC EXPAND and contains links to articles that use the DATA step or macros to compute moving averages in SAS. In a previous post, I explained how to
A moving average (also called a rolling average) is a statistical technique that is used to smooth a time series. Moving averages are used in finance, economics, and quality control. You can overlay a moving average curve on a time series to visualize how each value compares to a rolling
In SAS, the aspect ratio of a graph is the physical height of the graph divided by the physical width. Recently I demonstrated how to set the aspect ratio of graphs in SAS by using the ASPECT= option in PROC SGPLOT or by using the OVERLAYEQUATED statement in the Graph
Parameters in SAS procedures are specified as a list of values that you manually type into the procedure syntax. For example, if you want to specify a list of percentile values in PROC UNIVARIATE, you need to type the values into the PCTLPTS= option as follows: proc univariate data=sashelp.cars noprint; var
Recently I blogged about how to compute a weighted mean and showed that you can use a weighted mean to compute the center of mass for a system of N point masses in the plane. That led me to think about a related problem: computing the center of mass (called
I began 2016 by compiling a list of popular articles from my blog in 2015. This "People's Choice" list contains many interesting articles, but some of my personal favorites did not make the list. Today I present the "Editor's Choice" list of articles that deserve a second look. I've grouped
Weighted averages are all around us. Teachers use weighted averages to assign a test more weight than a quiz. Schools use weighted averages to compute grade-point averages. Financial companies compute the return on a portfolio as a weighted average of the component assets. Financial charts show (linearly) weighted moving averages
I wrote 114 posts for The DO Loop blog in 2015. Which were the most popular with readers? In general, highly technical articles appeal to only a small group of readers, whereas less technical articles appeal to a larger audience. Consequently, many of my popular articles were related to data
Lo how a rose e'er blooming
From tender stem hath sprung
As I write this blog post, a radio station is playing Christmas music. One of my favorite Christmas songs is the old German hymn that many of us know as "Lo, How a Rose E'er Blooming." I was humming
The most recent development environment for SAS programmers is SAS Studio, which is a browser-based application. The free SAS University Edition, which includes SAS/IML software, also uses SAS Studio as a development environment. SAS Studio has a special mode for programmers who use interactive procedures such as PROC IML. (Recall
Recently Sanjay Matange blogged about how to color the bars of a histogram according to a gradient color ramp. Using the fact that bar charts and histograms look similar, he showed how to use PROC SGPLOT in SAS to plot a bar chart in which each bar is colored according
A SAS customer asked: Why isn't the chi-square distribution supported in PROC UNIVARIATE? That is an excellent question. I remember asking a similar question when I first started learning SAS. In addition to the chi-square distribution, I wondered why the UNIVARIATE procedure does not support the F distribution. These are
Last week my colleague Chris Hemedinger published a blog post that described how to use the ODS LAYOUT GRIDDED statement to arrange tables and graphs in a panel. The statement was introduced in SAS 9.4m1 (December 2013). Gridded layout is supported for HTML, POWERPOINT, and the PRINTER family of destinations
When creating a statistical graphic such as a line plot or a scatter plot, it is sometimes important to preserve the aspect ratio of the data. For example, if the ranges of the X and Y variables are equal, it can be useful to display the data in a square
A matrix is a convenient way to store an array of numbers. However, often you need to extract certain elements from a matrix. The SAS/IML language supports two ways to extract elements: by using subscripts or by using indices. Use subscripts when you are extracting a rectangular portion of a
Sometimes you are writing a program that needs to find out whether a particular SAS product (like SAS/ETS, SAS/QC, or SAS/OR) is licensed. I was reminded of this fact when I wrote last week's blog post about how to create a map with PROC SGPLOT. Although the SGPLOT procedure is
Did you know that you can use the POLYGON statement in PROC SGPLOT to draw a map? The graph at the left shows the 48 contiguous states of the US, overlaid with markers that indicate the locations of major cities. The plot was created by using the POLYGON statement, which
In many procedures, the ID statement is used to identify observations by specifying an identifying variable, such as a name or a patient ID. In many regression procedures, you can specify multiple ID variables, and all variables are copied into output data sets that contain observation-wise statistics such as predicted
How much does this big pumpkin weigh? One of the cafeterias at SAS invited patrons to post their guesses on an internal social network at SAS. There was no prize for the correct guess; it was just a fun Halloween-week activity. I recognized this as an opportunity to apply the
In SAS, the DATA step and PROC SQL support mnemonic logical operators. The Boolean operators AND, OR, and NOT are used for evaluating logical expressions. The comparison operators are EQ (equal), NE (not equal), GT (greater than), LT (less than), GE (greater than or equal), and LE (less than or