This post is rated R

This morning, as I was writing this blog post at the kitchen table, my 5-year-old daughter ran into the room from watching “Sesame Street.” She excitedly announced, “Mommy, the letter of the day is R!” Too true.

The recent NY Times story on the R programming language, which included some quotes from me, has evoked spirited comments and feedback on blogs and message boards.

When a 30-minute interview is boiled down into a quote or two, some detail is understandably lost. I wanted to share some further thoughts and clarification.

First, SAS and I applaud the innovative contributions and passion of the R community, and users who apply R to solve problems. In a very real sense, we are grateful for R, as it provides a freely available venue for bleeding-edge and experimental data analysis methods, and underscores the increasing importance of advanced analytical and graphical methods in this age of massive data volumes.

Over the past three decades, SAS has made and continues to make many noteworthy contributions to advanced data computing. As a trusted supplier to a large and diverse set of organizations, SAS provides analysis software that has been refined over years of customer application and feedback. SAS software is fully supported and used daily in countless ways. SAS is also scalable to very large data sets (multi-threaded, grid-enabled, etc.). These issues remain very important to the organizations we serve.

The analysis community benefits from both SAS and R, and we view them as complementary. Many SAS customers use R. We can and do peacefully co-exist. Interest in analytics continues to grow. And the use of both SAS and R continues to grow as well, reflecting this trend.

As for open source and my airplane quote …

My remark reflects a key difference between R and SAS, that of support, reliability, and validation. Customers value SAS for many things, including our extensive testing, documentation, 24/7 support, and training. In contrast, the quality of proliferating R packages is varied and uneven, especially in complex analytical modules. Mistakes in these packages can lead to misleading results, even for experienced users.

The airplane comment was meant to point out this key difference, not to condemn open source. In fact, SAS values open-source software. Our software runs on Linux. We use some open-source tools in development. And we plan to embrace open source further in the future.

The world has many complex problems. We advocate approaches based on science, on analysis to address these problems. Making more analytic methods readily available is a good thing. From SAS; from R; from the resourceful individuals who innovate with their tools of choice, regardless of the source.

About Author

Anne Milley

Sr Director, Analytic Strategy, JMP

Anne oversees analytic strategy in JMP Product Marketing. She is a contributing faculty member for the International Institute of Analytics. She enjoys organic gardening and spending time with her family.

22 Comments

  1. I also thought the original comment about open source software and jet engines to be disrespectful; however I admire your decision to respond.
    Whenever open source software is discussed, proprietary competitors always seem to resort to fear, uncertainty, doubt. Please remember that customers are capable of making a rational decision FOR THE BEST INTEREST OF THE CUSTOMER without such marketing tactics.

  2. There surely is a role for everyone, but Anne's blog post is inaccurate on several counts.
    First: R processes pretty large data sets in memory. I work routinely with 80M records on a 4GB linux box. The package ff works out of memory on files of unlimited size, very much like SAS.
    Regarding parallelism: sure R is behind SAS here, but naive parallelism (e.g., SNOW) works fine. This is an area of active research.
    Regarding packages: Interestingly, I have yet to find a buggy R package. Maybe I am lucky, or maybe Anne Milley should provide some examples, or better, some statistics.
    I disagree with the statement: "The analysis community benefits from both SAS and R, and we view them as complementary." The inconvenient truth is that, while a niche of users will always use SAS, SAS and R are mostly substitutes. I know many former SAS customers who are now using R.

  3. My experience with R is that it has fewer bugs than similar commercial software, such as Splus and STATGRAPHICS. You seem to be saying that R is less reliable than SAS. Can you give any examples, any evidence, to support this?
    Just as you argue that SAS is likely to be more reliable than R, I could argue the opposite: Many more people contribute to the reliability of R than to the reliability of SAS.

  4. I'm a statistician (newly graduated, working in non-clinical applications) who works in quality management and quality-reliability engineering at a medical device company. I am a SAS user, but I use 'R' as well (along with one or two other packages as the need arises). I'll offer a few more thoughts.
    As I understand it, the FDA employees who review clinical protocols are very familiar with SAS, and may not be so familiar with 'R'. If this is true, then using SAS to design and analyze results provides a familiar and common tool, which in turn may remove unnecessary obstacles to understanding - and hopefully accepting - clinical trial results. Regulatory review prior to commercialization is challenging enough - why make it unnecessarily harder?
    For non-clinical applications (my own area), experimental evidence is reviewed and sometimes challenged by external bodies (ISO, EU MDD, FDA, Health Canada, etc.). One potential challenge is the question "what evidence do we (the company) have that the output of the statistical software's algorithms is valid"? If we use the same package as the FDA, then this challenge might be more easily answered. Admittedly this assumes that the experiment was competently designed, executed, and analyzed with valid conclusions drawn. Where we do use another non-SAS package, we have documented evidence that the package returns the expected result for various statistical methods. Upgrades to statistical software packages are re-validated and documented.
    One more thought that I haven't seen addressed yet by the 'R' community is that of liability and recourse. If SAS were to release an algorithm that works improperly and leads to invalid results that place a company's customers and its business in jeopardy, then is it not true that SAS can be held liable? For open-source packages, who would we go to for recourse?
    A related note is that, while I haven't audited SAS for their quality management systems and software development lifecycle processes, I would expect that they have such systems that are documented and audited. (Think ISO 9001, ISO 12207, IEC 601, SW68, the IEEE SWEBOK, etc.) External audits of a company's processes offer its customers evidence that those processes and the resulting products are effective and remain so. Such vendor audits are often a required part of doing business in the pharmaceutical and medical device industries. What about open source developers? What are their software development processes? Are they documented? Is there documented evidence from internal and external audits that these software development and validation processes are effective? Perhaps something like this exists for 'R' products, and as a novice 'R' user I'm simply unaware. Having said all this, here I'll admit that I don't know that SAS has any type of ISO certification (if they do have it, I was unable to find it on their website).
    If I've gotten any facts askew, I'll happily stand corrected.
    Both packages (SAS and R) are valuable. I appreciate having both to use when the need arises.

  5. Cliff, you might want to read "R: Regulatory Compliance and Validation Issues: A Guidance Document for the Use of R in Regulated Clinical Trial Environments", located at http://www.r-project.org/doc/R-FDA.pdf to correct some of your misconceptions.
    You should also note that, as with R, SAS makes no legal claim about the accuracy of its results - and it cannot be held liable:
    "EXCEPT WHERE EXPRESSLY PROVIDED OTHERWISE IN AN AGREEMENT BETWEEN YOU AND SAS, ALL INFORMATION, SOFTWARE, PRODUCTS AND SERVICES ARE PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND INCLUDING WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT."
    If you are using SAS or R in a critical environment, you should be performing your own testing of both. Anecdotal evidence does not support your belief that SAS has fewer bugs than R - particularly in areas of active research (such as generalised linear mixed models), SAS tends to present its results as the gold-standard, when in fact there is still much disagreement between experts as to what the correct approach is.

  6. Hadley,
    Thanks for the additional information. It's good to know that some of these issues have been considered. (It's obvious that I'm no lawyer.)
    Is there more information on the software development lifecycle practices used by 'R' developers? Both the base package, and any add-on packages? And where are the documented results of their development and verification/validation activities kept? Who audits these, and where are the audit results available?
    I ask these questions not to be difficult, but rather because they represent good business and development practices (referring to the field of software engineering), and are often requirements for customers of a software product. The reference to "R: Regulatory Compliance and Validation Issues: A Guidance Document for the Use of R in Regulated Clinical Trial Environments" seems like a good start. I look forward to reading it.
    Again, many thanks for the information.

  7. It's very much up to the individual package developer. Some have great development practices, some not so great. Generally, all base packages have extensive test suites that you can evaluate and run yourself (of course these are also run as part of the release procedure for R).
    I think I could ask the same questions about SAS - where are the documented results of SAS's development, verification and validation procedures kept?
    One big advantage of open source code is that it's (at least theoretically) possible to inspect the code of every routine and verify it yourself. In practice this isn't so easy, as it requires you to have some familiarity with the code, and to understand how all the pieces fit together, but at least the possibility exists.

  8. While the conversation about accuracy is interesting, we should not lose sight of the reason why this discussion is going on in the first place. As a marketing professional, Anne Milley should know better than to say anything about the competition that would invite a backlash. As the old saying goes, "if you don't have anything nice to say, don't say anything."
    The proper response from her should have been to point out the issues related to accuracy and validation and how SAS itself deals with those issues. Taking a swipe at R with the airplane quote is a cheap shot at best. At worst, she has succeeded in insulting her own clients who are busy using R and SAS in a complementary way. By that I mean, they are using R where SAS falls short and vice versa. She owes them a big apology. Not here on the SAS blog but in a letter to the New York Times.
    As a bit of advice to SAS and its staff, if you want to stay relevant and keep your clients happy, stop being so arrogant.

  9. It’s great to see so many people sharing their thoughts and views regarding data analysis, statistical software development, and open source on this and other blogs and message boards. We are listening with great interest to you and are learning more about your needs and preferences. Thanks for the feedback, and please keep it coming.
    Hindsight is, of course, 20/20. I shouldn't have made the airplane comment and am sorry for it. Several have called SAS arrogant and aloof. That has not been my experience in more than 20 years of working with SAS--as a student, as a customer and now as a SAS employee. One of the best things for me about working at SAS is interacting with so many smart people – customers, experts and colleagues -- who care deeply about the quality of the analysis and the end result. We may not always agree, but the discussion, debate, testing and validation of ideas are enormously valuable.

  10. Let me emphasize that the hurdles to the use of R for clinical trials are shrinking. There has been substantive activity over the past several years, both internally at the FDA and within the R community to increase R's acceptance in this domain.
    At the Joint Statistical Meetings in 2006, Sue Bell from the FDA spoke during a session with a presentation entitled Times 'R' A Changing: FDA Perspectives on Use of "Open Source". A copy of this presentation is available here:
    http://www.fda.gov/cder/Offices/Biostatistics/Bell.pdf
    In 2007, during an FDA committee meeting reviewing the safety profile of Avandia (Rosiglitazone), the internal FDA meta-analysis performed by Joy Mele, the FDA statistician, was done using R. A copy of this presentation is available here:
    http://www.fda.gov/ohrms/dockets/ac/07/slides/2007-4308s1-05-fda-mele.ppt
    Given the high profile nature of drug safety issues today, that R was used for this analysis by the FDA itself speaks volumes.
    Also in 2007, at the annual R user meeting at Iowa State University, I had the pleasure and privilege of Chairing a session on the use of R for clinical trials. The speakers included Frank Harrell (Chair of the Dept of Biostatistics at Vanderbilt University), Tony Rossini and David James (Novartis Pharmaceuticals) and Mat Soukup (FDA statistician). Copies of our presentations are available here, a little more than half way down the page:
    http://user2007.org/program/
    At that meeting, we also introduced a document that has been updated since then and approved formally by the R Foundation for Statistical Computing. The document provides guidance for the use of R in the regulated clinical trials domain, addresses R's compliance with the relevant regulations (e.g., 21 CFR 11), and describes the development, testing and quality processes in place for R, also known as the Software Development Life Cycle.
    That document (which Hadley also referenced) is available here:
    http://www.r-project.org/doc/R-FDA.pdf
    I have heard directly from colleagues in industry that this document has provided significant value in their internal discussions regarding implementing the use of R within their respective environments and assuaging many fears regarding R's use.
    Additionally, presentations regarding the use of open source software and R specifically for clinical trials have been made at DIA and other industry meetings. This fall, there is a session on the use of R scheduled for the FDA's Industry Statistics Workshop in Washington, D.C.
    For those unfamiliar, I would also point out the membership and financial donors to the R Foundation for Statistical Computing and take note of the plethora of large pharma companies and clinical research institutions:
    http://www.r-project.org/foundation/memberlist.html
    The use of R within this domain is increasing and will only continue to progress as R's value becomes increasingly clear to even risk averse industry decision makers.

  11. Great thread and love to see the interactions. For those particularly interested in the life sciences/pharmaceutical dimension of this discussion, I have a post on the SAS HLS blog that discusses this in more detail.
    http://blogs.sas.com/hls
    Best,
    Jason

  12. You're all missing the point. SAS, like most software companies (Wolfram Research, Microsoft, etc.), is having its business model critically threatened by high-quality free software packages such as R.
    So of course, they'll spread plenty of FUD (which they may sincerely believe), especially about quality. They will feed into the widespread public perception that you get what you pay for and that free software is hobbyist and "amateur." And indeed, there are many free software packages that are low-quality and completely unready for prime-time.
    In some creative industries, such as carpentry, home-made tools and mass-produced ones have reached an uneasy truce, where each one wins out in certain niches. For example, most people are not going to build their own table saw for safety purposes... but will build their own sleds and jigs to use with it.
    My recommendation to SAS would be to embrace and extend R. Make the next versions of SAS software able to use community-generated R modules as-is, and provide additional primitives to get at SAS-proprietary features so that module writers start targeting SAS and more R code becomes SAS-specific. This would satisfy the community's need to create custom software modules, tap into the large and growing library of R code, but still protect SAS's core engine and software business model.

  13. R is licensed under the GPL http://www.gnu.org/copyleft/gpl.html so it would be tricky for SAS to embrace and extend the R modules the way you recommend. They would need to completely sandbox the interaction, which even then may not be enough, to quote the GPL FAQ:
    "Where's the line between two separate programs, and one program with two parts? This is a legal question, which ultimately judges will decide. We believe that a proper criterion depends both on the mechanism of communication (exec, pipes, rpc, function calls within a shared address space, etc.) and the semantics of the communication (what kinds of information are interchanged).
    If the modules are included in the same executable file, they are definitely combined in one program. If modules are designed to run linked together in a shared address space, that almost surely means combining them into one program.
    By contrast, pipes, sockets and command-line arguments are communication mechanisms normally used between two separate programs. So when they are used for communication, the modules normally are separate programs. But if the semantics of the communication are intimate enough, exchanging complex internal data structures, that too could be a basis to consider the two parts as combined into a larger program."

  14. Alison Bolen on

    Thanks for the comment, Gabriel. You raise some interesting points. I've looked into it and can assure you there are no plans to intermingle SAS source code and R source code. We are making it possible to combine R and SAS programming statements into a single program, but we are not doing anything to extend or modify the R code and we are not mixing SAS and R executables or source code. The SAS development team has worked closely with our legal team to make sure that we're following all of the guidelines to protect both SAS and R.

  15. It is all about job security. So far the market demand for R developers is smaller than that for SAS programmers. So here I can see many R users are very bitter toward SAS. Meanwhile SAS sees sales drop because of R, so it tries to belittle R.
    Two thoughts:
    Isn't it a good thing for SAS programmers to see schools now teach less SAS and more R? Let's face it: SAS isn't going away any time soon, and schools produce fewer and fewer SAS programmers, so the existing SAS programmers become.....?
    I feel comfortable with R's role in academia. It seems to ask too much of an open source tool to mess with compliance. Why not just have R do all the fancy stuff and let SAS take care of the ugly thing? 🙂

  16. Re: the comments about R developers being less sought after than SAS people.
    I don't know what field you're working in, but when I finished my PhD in physics a couple of years ago I went straight into a hedge fund. The fact that I had experience with R was a reason why I had so many offers even in the midst of a severe recession.
    Perhaps the life and social sciences are different, but if you're a physicist, mathematician, or applied mathematician with lots of experience in R, you're going to be pretty hot property in the job market, particularly if you intend on going into finance.

  17. Pingback: The Many Faces of R - A Shot in the Arm

  18. What about this thought: SAS is a - platform / eco-system - joining all sorts of proprietary and open source solutions. R is a - software - which is part of a solution platform / eco-system. This would help explain the idea of SAS and R being able to exist in harmony.
