In a previous blog, I showed how to use SAS/IML subscript reduction operators to compute the location of the maximum values for each row of a matrix. The subscript reduction operators are useful for computing simple statistics for each row (or column) of a numerical matrix. If x is a
Author
The other day I encountered an article in the SAS Knowledge Base that shows how to write a macro that "returns the variable name that contains the maximum or minimum value across an observation." Some people might say that the macro is "clever." I say it is complicated. This is
I've been a fan of statistical simulation and other kinds of computer experimentation for many years. For me, simulation is a good way to understand how the world of statistics works, and to formulate and test conjectures. Last week, while investigating the efficiency of the power method for finding dominant
One of the first skills that a beginning SAS/IML programmer learns is how to read data from a SAS data set into SAS/IML vectors. (Alternatively, you can read data into a matrix). The beginner is sometimes confused about the syntax of the READ statement: do you specify the names of
When I was at SAS Global Forum last week, a SAS user asked my advice regarding a SAS/IML program that he wrote. One step of the program was taking too long to run and he wondered if I could suggest a way to speed it up. The long-running step was
In statistical programming, I often test a program by running it on a problem for which I know the correct answer. I often use a single expression to compute the maximum value of the absolute difference between the vectors: maxDiff = max( abs( z-correct ) ); /* largest absolute difference
A reader asked: I want to create a vector as follows. Suppose there are two given vectors x=[A B C] and f=[1 2 3]. Here f indicates the frequency vector. I hope to generate a vector c=[A B B C C C]. I am trying to use the REPEAT function
To a statistician, the DIF function (which was introduced in SAS/IML 9.22) is useful for time series analysis. To a numerical analyst and a statistical programmer, the function has many other uses, including computing finite differences. The DIF function computes the difference between the original vector and a shifted version
To a statistician, the LAG function (which was introduced in SAS/IML 9.22) is useful for time series analysis. To a numerical analyst and a statistical programmer, the function provides a convenient way to compute quantitites that involve adjacent values in any vector. The LAG function is essentially a "shift operator."
I blog about a lot of topics, but the following five categories represent some of my favorite subjects. Judging by the number of readers and comments, these articles have struck a chord with SAS users. If you haven't read them, check them out. (If you HAVE read them, some are
SAS software provides many run-time functions that you can call from your SAS/IML or DATA step programs. The SAS/IML language has several hundred built-in statistical functions, and Base SAS software contains hundreds more. However, it is common for statistical programmers to extend the run-time library to include special user-defined functions.
Because the SAS/IML language is a general purpose programming language, it doesn't have a BY statement like most other SAS procedures (such as PROC REG). However, there are several ways to loop over categorical variables and perform an analysis on the observations in each category. One way is to use
Last week I discussed how to fit a Poisson distribution to data. The technique, which involves using the GENMOD procedure, produces a table of some goodness-of-fit statistics, but I find it useful to also produce a graph that indicates the goodness of fit. For continuous distributions, the quantile-quantile (Q-Q) plot
Last week I blogged about how to construct a smoother for a time series for the temperature in Albany, NY from 1995 to March, 2012. I smoothed the data by "folding" the time series into a single "year" that contains repeated measurements for each day of the year. Experts in
The birthday matching problem is a classic problem in probability theory. The part of it that people tend to remember is that in a room of 23 people, there is greater than 50% chance that two people in the room share a birthday. But the birthday matching problem is also
In yesterday's post, I discussed a "quick and dirty" method to smooth periodic data. However, after I smoothed the data I remarked that the smoother itself was not exactly periodic. At the end points of the periodic interval, the smoother did not have equal slopes and the method does not
Over at the SAS and R blog, Ken Kleinman discussed using polar coordinates to plot time series data for multiple years. The time series plot was reproduced in SAS by my colleague Robert Allison. The idea of plotting periodic data on a circle is not new. In fact it goes
Over at the SAS Discussion Forums, someone asked how to use SAS to fit a Poisson distribution to data. The questioner asked how to fit the distribution but also how to overlay the fitted density on the data and to create a quantile-quantile (Q-Q) plot. The questioner mentioned that the
Locating missing values is important in statistical data analysis. I've previously written about how to count the number of missing values for each variable in a data set. In Base SAS, I showed how to use the MEANS or FREQ procedures to count missing values. In the SAS/IML language, I
In a previous post I showed how to implement Stewart's (1980) algorithm for generating random orthogonal matrices in SAS/IML software. By using the algorithm, it is easy to generate a random matrix that contains a specified set of eigenvalues. If D = diag(λ1, ..., λp) is a diagonal matrix and
Because I am writing a new book about simulating data in SAS, I have been doing a lot of reading and research about how to simulate various quantities. Random integers? Check! Random univariate samples? Check! Random multivariate samples? Check! Recently I've been researching how to generate random matrices. I've blogged
SAS Global Forum 2012 is right around the corner. If you will be in Orlando, too, be sure to say hello! If you have ideas for improving SAS/IML software or you would like to discuss my blog, please visit me during my hours at the SAS/IML booth in the Demo
The fundamental units in the SAS/IML language are matrices and vectors. Consequently, you might wonder about conditional expression such as if v>0 then.... What does this expression mean when v contains more than a single element? Evaluating vector expressions When you test a vector for some condition, expressions like v>0
After my post on detecting outliers in multivariate data in SAS by using the MCD method, Peter Flom commented "when there are a bunch of dimensions, every data point is an outlier" and remarked on the curse of dimensionality. What he meant is that most points in a high-dimensional cloud
Covariance, correlation, and distance matrices are a few examples of symmetric matrices that are frequently encountered in statistics. When you create a symmetric matrix, you only need to specify the lower triangular portion of the matrix. The VECH and SQRVECH functions, which were introduced in SAS/IML 9.3, are two functions
The SAS/IML language supports both row vectors and column vectors. This is useful for performing linear algebra, but it can cause headaches when you are writing a SAS/IML module. I want my modules to be able to handle both row vectors and column vectors. I don't want the user to
A recent discussion on the SAS-L discussion forum concerned how to implement linear interpolation in SAS. Some people suggested using PROC EXPAND in SAS/ETS software, whereas others proposed a DATA step solution. For me, the SAS/IML language provides a natural programming environment to implement an interpolation scheme. It also provides
Most statistical programmers have seen a graph of a normal distribution that approximates a binomial distribution. The figure is often accompanied by a statement that gives guidelines for when the approximation is valid. For example, if the binomial distribution describes an experiment with n trials and the probability of success
SAS provides several ways to compute sample quantiles of data. The UNIVARIATE procedure can compute quantiles (also called percentiles), but you can also compute them in the SAS/IML language. Prior to SAS/IML 9.22 (released in 2010) statistical programmers could call a SAS/IML module that computes sample quantiles. With the release
In the United States, this upcoming weekend is when we turn our clocks forward one hour as we adopt daylight saving time. (Some people will also flip their mattresses this weekend!) Daylight saving time (DST) in the US begins on the second Sunday in March and ends on the first