The use of SAS/IML software in published research

SAS/IML software is used by many SAS programmers, primarily for creating custom algorithms and macros that implement statistical analyses that are not built into any SAS procedure. I know that PROC IML is used regularly by pharmaceutical companies, by the financial and insurance industries, and by researchers in medical colleges and business schools, among others.

However, when I was recently asked how many research papers are published each year that use SAS/IML, I had no idea. I conjectured "a few dozen," figuring that most SAS/IML programmers work in a corporate setting or in government, and that these researchers are less likely than academics to publish their results in journals.

Over the weekend I used Google Scholar to try to get a better estimate. I constructed the following query for Google Scholar:

   SAS +IML OR "proc iml" -site:sas.com -author:wicklin

The query omits any papers written by me or that appear on the sas.com domain. The idea was to exclude any white papers, conference proceedings, or marketing material that is created or hosted at SAS. The results also exclude articles posted to the SAS/IML File Exchange.

The results surprised me: I was wrong by an order of magnitude! I clicked on the links for a few dozen publications to ascertain how many of the hits were false positives. For example, an article that says "we decided not to use SAS/IML in this study" would appear on the list, even though the authors did not actually use the programming language. There were a few false positives, and a few SAS manuals appeared in the list. However, the vast majority of the Google list consisted of scholarly journal articles, books, or conference proceedings that used SAS/IML in a nontrivial way.

I encourage you to submit the query yourself and to look at the variety of applications and the wide range of journals. More than 2,500 entries were published prior to 1995. In addition to those papers, the following SAS DATA step gives the number of Google Scholar entries for the SAS/IML query for the past 20 years:

/* Results from Google Scholar. Downloaded 6/28/2015 
   "sas" +IML OR "proc iml" -site:sas.com -author:wicklin
   There were 2510 results when year <= 1994 
*/
data IMLPub;
input Year Publications @@;   /* @@ holds the record so all four (Year, Publications) pairs on each line are read */
datalines;
1995 231  1996 255  1997 311  1998 301
1999 299  2000 355  2001 361  2002 424
2003 465  2004 493  2005 543  2006 568
2007 526  2008 611  2009 633  2010 580
2011 589  2012 512  2013 603  2014 552
;
 
title "Number of Publications that Mention SAS/IML Software";
title2 "Data from Google Scholar";
proc sgplot data=IMLPub;
   series x=Year y=Publications / markers;
   yaxis min=0 grid values=(0 to 600 by 100) valueshint;
   xaxis grid;
run;
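
[Figure: line plot of the number of publications per year that mention SAS/IML software, 1995-2014. Data from Google Scholar.]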

The number of publications appears to increase until about 2005 and has been approximately constant since then. In the past 10 years, Google Scholar reports between about 510 and 630 publications per year, with an average near 570. I had no idea that the number was that high.
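A quick way to check the per-year figure for the last decade is to average the ten most recent counts. The following is a minimal sketch in SAS/IML, not part of the original analysis: the variable names (Publications, avg, range) are chosen here for illustration, and the counts are retyped by hand from the DATA step above so that the snippet runs on its own.

```sas
proc iml;
/* Google Scholar counts for 2005-2014, retyped from the DATA step in this post */
Publications = {543, 568, 526, 611, 633, 580, 589, 512, 603, 552};

avg   = mean(Publications);                       /* mean of the 10 x 1 column vector */
range = min(Publications) || max(Publications);   /* smallest and largest yearly counts */

print avg[label="Mean per year" format=6.1],
      range[label="Range" colname={"Min" "Max"}];
quit;
```

For these ten values the mean is 571.7 and the counts range from 512 to 633. If the IMLPub data set is already in your session, you could instead read the values with the READ statement and a WHERE clause rather than retyping them.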

I do not advocate using internet search engines to rank software based on the number of web sites that mention the software. I have argued that using the number of search results as a proxy for popularity is of dubious value and is fraught with statistical perils.

However, I found it intellectually interesting to read the titles and excerpts of the scholarly publications that mention SAS/IML software. Browse the list yourself, or see the list that includes papers on the sas.com domain for a more complete perspective.

If you are thinking about using SAS/IML software in your next research project, you might want to search Google Scholar first. Someone else might have already written a scholarly paper that solves your problem!

About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.
