Weighted averages are all around us. Teachers use weighted averages to assign a test more weight than a quiz. Schools use weighted averages to compute grade-point averages. Financial companies compute the return on a portfolio as a weighted average of the component assets. Financial charts show (linearly) weighted moving averages
Author
I wrote 114 posts for The DO Loop blog in 2015. Which were the most popular with readers? In general, highly technical articles appeal to only a small group of readers, whereas less technical articles appeal to a larger audience. Consequently, many of my popular articles were related to data
Lo how a rose e'er blooming From tender stem hath sprung As I write this blog post, a radio station is playing Chrismas music. One of my favorite Christmas songs is the old German hymn that many of us know as "Lo, How a Rose E're Blooming." I was humming
The most recent development environment for SAS programmers is SAS Studio, which is a browser-based application. The free SAS University Edition, which includes SAS/IML software, also uses SAS Studio as a development environment. SAS Studio has a special mode for programmers who use interactive procedures such as PROC IML. (Recall
Recently Sanjay Matange blogged about how to color the bars of a histogram according to a gradient color ramp. Using the fact that bar charts and histograms look similar, he showed how to use PROC SGPLOT in SAS to plot a bar chart in which each bar is colored according
A SAS customer asked: Why isn't the chi-square distribution supported in PROC UNIVARIATE? That is an excellent question. I remember asking a similar question when I first started learning SAS. In addition to the chi-square distribution, I wondered why the UNIVARIATE procedure does not support the F distribution. These are
Last week my colleague Chris Hemedinger published a blog post that described how to use the ODS LAYOUT GRIDDED statement to arrange tables and graphs in a panel. The statement was introduced in SAS 9.4m1 (December 2013). Gridded layout is supported for HTML, POWERPOINT, and the PRINTER family of destinations
When creating a statistical graphic such as a line plot or a scatter plot, it is sometimes important to preserve the aspect ratio of the data. For example, if the ranges of the X and Y variables are equal, it can be useful to display the data in a square
A matrix is a convenient way to store an array of numbers. However, often you need to extract certain elements from a matrix. The SAS/IML language supports two ways to extract elements: by using subscripts or by using indices. Use subscripts when you are extracting a rectangular portion of a
Sometimes you are writing a program that needs to find out whether a particular SAS product (like SAS/ETS, SAS/QC, or SAS/OR) is licensed. I was reminded of this fact when I wrote last week's blog post about how to create a map with PROC SGPLOT. Although the SGPLOT procedure is
Did you know that you can use the POLYGON statement in PROC SGPLOT to draw a map? The graph at the left shows the 48 contiguous states of the US, overlaid with markers that indicate the locations of major cities. The plot was created by using the POLYGON statement, which
In many procedures, the ID statement is used to identify observations by specifying an identifying variable, such as a name or a patient ID. In many regression procedures, you can specify multiple ID variables, and all variables are copied into output data sets that contain observation-wise statistics such as predicted
How much does this big pumpkin weigh? One of the cafeterias at SAS invited patrons to post their guesses on an internal social network at SAS. There was no prize for the correct guess; it was just a fun Halloween-week activity. I recognized this as an opportunity to apply the
In SAS, the DATA step and PROC SQL support mnemonic logical operators. The Boolean operators AND, OR, and NOT are used for evaluating logical expressions. The comparison operators are EQ (equal), NE (not equal), GT (greater than), LT (less than), GE (greater than or equal), and LE (less than or
Statistical programmers often need to evaluate complicated expressions that contain square roots, logarithms, and other functions whose domain is restricted. Similarly, you might need to evaluate a rational expression in which the denominator of the expression can be zero. In these cases, it is important to avoid evaluating a function
Sometimes I can't remember where I put things. If I lose my glasses or garden tools, I am out of luck. But when I can't remember where I put some data, I have SAS to help me find it. When I can remember the name of the data set, my
Every year near Halloween I write a trick-and-treat article in which I demonstrate a simple programming trick that is a real treat to use. This year's trick features two of my favorite functions, the CUSUM function and the LAG function. By using these function, you can compute the rows of
The FREQ procedure in SAS supports computing exact p-values for many statistical tests. For small and mid-sized problems, the procedure runs very quickly. However, even though PROC FREQ uses efficient methods to avoid unnecessary computations, the computational time required by exact tests might be prohibitively expensive for certain tables. If
Did you know that the FREQ procedure in SAS can compute exact p-values for more than 20 statistical tests and statistics that are associated with contingency table? Mamma mia! That's a veritable smorgasbord of options! Some of the tests are specifically for one-way tables or 2 x 2 tables, but many apply
How do you simulate a contingency table that has a specified row and column sum? Last week I showed how to simulate a random 2 x 2 contingency table when the marginal frequencies are specified. This article generalizes to random r x c frequency tables (also called cross-tabulations) that have the same marginal row
A recent question posted on a discussion forum discussed storing the strictly upper-triangular portion of a correlation matrix. Suppose that you have a correlation matrix like the following: proc iml; corr = {1.0 0.6 0.5 0.4, 0.6 1.0 0.3 0.2, 0.5 0.3 1.0 0.1, 0.4 0.2 0.1 1.0}; Every correlation
When modeling and simulating data, it is important to be able to articulate the real-life statistical process that generates the data. Suppose a friend says to you, "I want to simulate two random correlated variables, X and Y." Usually this means that he wants data generated from a multivariate distribution,
This article shows how to visualize a surface in SAS. You can use the SURFACEPLOTPARM statement in the Graph Template Language (GTL) to create a surface plot. But don't worry, you don't need to know anything about GTL: just copy the code in this article and replace the names of
Suppose that you are tabulating the eye colors of students in a small class (following Friendly, 1992). Depending upon the ethnic groups of these students, you might not observe any green-eyed students. How do you put a 0 into the table that summarizes the number of students who have each
You can use SAS to generate random integers between 1–10 or in the range 1–100. This article shows how to generate random integers as easily as Excel does. I was recently talking with some SAS customers and I was asked "Why can't SAS create an easy way to generate random
In a previous post I described how to simulate random samples from an urn that contains colored balls. The previous article described the case where the balls can be either of two colors. In that csae, all the distributions are univariate. In this article I examine the case where the
If not for probability theory, urns would appear only in funeral homes and anthologies of British poetry. But in probability and statistics, urns are ever present and contain colored balls. The removal and inspection of colored balls from an urn is a classic way to demonstrate probability, sampling, variation, and
You've had a long day. You've implemented a custom algorithm in the SAS/IML language. But before you go home, you want to generate some matrices and test your program. If you are like me, you prefer a short statement—one line would be best. However, you also want the flexibility to
Occasionally a SAS statistical programmer will ask me, "How can I construct a large correlation matrix?" Often they are simulating data with SAS or developing a matrix algorithm that involves a correlation matrix. Typically they want a correlation matrix that is too large to input by hand, such as a
Dear Rick, I have a data set with 1,001 numerical variables. One variable is the response, the others are explanatory variable. How can I read the 1,000 explanatory variables into an IML matrix without typing every name? That's a good question. You need to be able to perform two sub-tasks: