My presentation at SAS Global Forum 2016 was titled "Writing Packages: A New Way to Distribute and Use SAS/IML Programs." The paper was published in the conference proceedings several months ago, but I recently recorded a short video that gives an overview of using and creating packages in SAS/IML 14.1:
Author
In a scatter plot, the regions where observations are packed tightly are areas of high density. A contour plot or heat map of a bivariate kernel density estimate (KDE) is one way to visualize regions of high density. A SAS customer asked whether it is possible to use SAS to
This week Hillary Clinton became the first woman to be nominated for president of the US by a major political party. Although this is a first for the US, many other countries have already passed this milestone. In fact, 60 countries have already elected women as presidents and prime ministers.
A kernel density estimate (KDE) is a nonparametric estimate for the density of a data sample. A KDE can help an analyst determine how to model the data: Does the KDE look like a normal curve? Like a mixture of normals? Is there evidence of outliers in the data? In
Last week I read an interesting paper by Bob Rodriguez: "Statistical Model Building for Large, Complex Data: Five New Directions in SAS/STAT Software." In it, Rodriguez summarizes five modern techniques for building predictive models and highlights recent SAS/STAT procedures that implement those techniques. The paper discusses the following high-performance (HP)
I'm addicted to you. You're a hard habit to break. Such a hard habit to break. — Chicago, "Hard Habit To Break" Habits are hard to break. For more than 20 years I've been putting semicolons at the end of programming statements in SAS, C/C++, and Java/Javascript. But lately I've been
One of my favorite new features in PROC SGPLOT in SAS 9.4m2 is addition of the COLORRESPONSE= and COLORMODEL= options to the SCATTER statement. By using these options, it is easy to color markers in a scatter plot so that the colors indicate the values of a continuous third variable.
Last week I showed how to represent a Markov transition matrix in the SAS/IML matrix language. I also showed how to use matrix multiplication to iterate a state vector, thereby producing a discrete-time forecast of the state of the Markov chain system. This article shows that the expected behavior of
Two of my favorite string-manipulation functions in the SAS DATA step are the COUNTW function and the SCAN function. The COUNTW function counts the number of words in a long string of text. Here "word" means a substring that is delimited by special characters, such as a space character, a
Many computations in elementary probability assume that the probability of an event is independent of previous trials. For example, if you toss a coin twice, the probability of observing "heads" on the second toss does not depend on the result of the first toss. However, there are situations in which
Last week I blogged about how to draw the Cantor function in SAS. The Cantor function is used in mathematics as a pathological example of a function that is constant almost everywhere yet somehow manages to "climb upwards," thus earning the nickname "the devil's staircase." The Cantor function has three
I was a freshman in college the first time I saw the Cantor middle-thirds set and the related Cantor "Devil's staircase" function. (Shown at left.) These constructions expanded my mind and led me to study fractals, real analysis, topology, and other mathematical areas. The Cantor function and the Cantor middle-thirds
'Tis a gift to be simple. -- Shaker hymn In June 2015 I published a short article for Significance, a magazine that features statistical and data-related articles that are of general interest to a wide a range of scientists. The title of my article is "In Praise of Simple Graphics."
Graphs enable you to visualize how the predicted values for a regression model depend on the model effects. You can gain an intuitive understanding of a model by using the EFFECTPLOT statement in SAS to create graphs like the one shown at the top of this article. Many SAS regression
Every beginning SAS programmer learns the simple IF-THEN/ELSE statement for conditional processing in the SAS DATA step. The basic If-THEN statement handles two cases: if a condition is true, the program does one thing, otherwise the program does something else. Of course, you can handle more cases by using multiple
I have previously shown how to overlay basic plots on box plots when all plots share a common discrete X axis. It is interesting to note that box plots can also be overlaid on a continuous (interval) axis. You often need to bin the data before you create the plot.
Box plots summarize the distribution of a continuous variable. You can display multiple box plots in a single graph by specifying a categorical variable. The resulting graph shows the distribution of subpopulations, such as different experimental groups. In the SGPLOT procedure, you can use the CATEGORY= option on the VBOX
Last week I discussed how to create spaghetti plots in SAS. A spaghetti plot is a type of line plot that contains many lines. Spaghetti plots are used in longitudinal studies to show trends among individual subjects, which can be patients, hospitals, companies, states, or countries. I showed ways to
I got several positive comments about a recent tip, "How to fit a variety of logistic regression models in SAS." A reader asked if I knew any other similar resources about statistical analysis in SAS. Absolutely! One gem that comes to mind is "Examples of writing CONTRAST and ESTIMATE statements."
What is a spaghetti plot? Spaghetti plots are line plots that involve many overlapping lines. Like spaghetti on your plate, they can be hard to unravel, yet for many analysts they are a delicious staple of data visualization. This article presents the good, the bad, and the messy about spaghetti
A grid is a set of evenly spaced points. You can use SAS to create a grid of points on an interval, in a rectangular region in the plane, or even in higher-dimensional regions like the parallelepiped shown at the left, which is generated by three vectors. You can use
Children in primary school learn that every positive number has a real square root. The number x is a square root of s, if x2 = s. Did you know that matrices can also have square roots? For certain matrices S, you can find another matrix X such that X*X
SAS software can fit many different kinds of regression models. In fact a common question on the SAS Support Communities is "how do I fit a <name> regression model in SAS?" And within that category, the most frequent questions involve how to fit various logistic regression models in SAS. There
I was eleven years old when I first saw Newton's method. No, I didn't go to a school for geniuses. I didn't even know it was Newton's method until decades later. However, in sixth grade I learned an iterative algorithm that taught me (almost) everything I need to know about
When I was in the sixth grade, I learned an iterative procedure for computing square roots by hand. Yes, I said by hand. Scientific calculators with a square root key were not yet widely available, so I and previous generations of children suffered through learning to calculate square roots by
Optimization is a primary tool of computational statistics. SAS/IML software provides a suite of nonlinear optimizers that makes it easy to find an optimum for a user-defined objective function. You can perform unconstrained optimization, or define linear or nonlinear constraints for constrained optimization. Over the years I have seen many
Last week I analyzed 12 million records of taxi cab transactions in New York City. As part of that analysis, I used a DATA step view to create a new variable, which was the ratio of the tip amount to the fare amount. A novice SAS programmer told me that
In a previous post I showed how to download, install, and use packages in SAS/IML 14.1. SAS/IML packages incorporate source files, documentation, data sets, and sample programs into a ZIP file. The PACKAGE statement enables you to install, uninstall, and manage packages. You can load functions and data into your
When I read Robert Allison's article about the cost of a taxi ride in New York City, I was struck by the scatter plot (shown at right; click to enlarge) that plots the tip amount against the total bill for 12 million taxi rides. The graph clearly reveals diagonal and
My previous post highlighted presentations at SAS Global Forum 2016 that heavily used SAS/IML software. Several of the authors clearly want to share their work with the wider SAS analytical community. They include their SAS/IML program in an appendix or mention a web site or email address from which the