Several times a year, I am contacted by a SAS account manager who tells me that a customer has asked whether it is possible to convert a MATLAB program to the SAS/IML language. Often the customer has an existing MATLAB program and wants to include the computation as part of a large SAS job. The customer wants to know if translating the MATLAB code is feasible, and whether it will be easy or hard.
The answer is always "it depends." It depends on the programmer's experience and familiarity with each language. It depends on the complexity of the program. It depends on whether the program calls MATLAB run-time functions or uses language features that do not have counterparts in the SAS/IML language.
Recently, I myself had the opportunity to rewrite a MATLAB program in PROC IML. This project did not involve any customer program or customer data, so I can blog about my experience.
How to Compute Cuzick's Test for Trend in SAS
My manager forwarded me a question that had been posted on an internal SAS discussion list. The question was "How do you compute Cuzick's test for trend in SAS?" My manager asked if I could implement it in PROC IML.
I had never heard of this test, but after a quick internet search I found a description of Cuzick's test. The formulas seemed straightforward, so I told my manager that I would write a SAS/IML program.
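For readers who have not seen the test, the statistic is usually presented along the following lines. The notation is chosen for this sketch and is not taken from the post: there are $k$ ordered groups with scores $\ell_i$ and sample sizes $n_i$, the total sample size is $N = \sum_i n_i$, and $R_i$ is the sum of the ranks in group $i$ when all $N$ observations are ranked together.

```latex
\begin{align*}
T &= \sum_{i=1}^{k} \ell_i R_i, &
L &= \sum_{i=1}^{k} \ell_i n_i, \\
E(T) &= \frac{(N+1)\,L}{2}, &
\operatorname{Var}(T) &= \frac{N+1}{12}\left( N \sum_{i=1}^{k} \ell_i^2 n_i - L^2 \right), \\
z &= \frac{T - E(T)}{\sqrt{\operatorname{Var}(T)}}.
\end{align*}
```

Under the null hypothesis of no trend, $z$ is approximately standard normal. This form assumes there are no ties in the data; tied values require an adjustment to the variance.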
Before I started programming, I did another internet search. I started to type "Cuzick's test SAS" into a search engine to see if someone had already written such a program, but as I typed the first two words, the search engine suggested an autocompletion term. The term it proposed was "MATLAB."
Intrigued, I followed the link and discovered that Giuseppe Cardillo had written a MATLAB program to compute Cuzick's test for trend. Furthermore, Dr. Cardillo had granted permission to redistribute and use this code, with or without modification.
I was thrilled. Although it has been 20 years since I used MATLAB in graduate school, I know that the basic syntax of the MATLAB language is very similar to the SAS/IML language. With the help of MATLAB's online documentation, I thought I could whip out this assignment relatively quickly.
I was right. By the end of the day I had implemented a working version of Cuzick's test in PROC IML. Along the way, I encountered the following issues:
- The bulk of the translation requires a straightforward conversion of syntax. I wrote a very short MATLAB to SAS/IML cheat sheet to help remind myself of the syntactic differences between the languages.
- Some MATLAB functions that appear to have direct counterparts in PROC IML actually behave differently. For example, in MATLAB, the SUM function operates on each column of a matrix, whereas in PROC IML the SUM function returns the sum of all the elements.
- Dr. Cardillo's code called the MATLAB TIEDRANK function, which returns two values. The first contains the averaged ranks for the input data, so I can use the SAS/IML RANKTIE function for that. The MATLAB documentation says that the second returned value contains "an adjustment for ties required by the nonparametric tests," but I couldn't find any formulas for how to compute these adjustments. After an hour of fruitless searching, I finally looked up Cuzick's original paper. I was then able to write a SAS/IML module, TiedRankAdj, that computes the adjustment for ties.
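The tie handling described in the last item can be sketched in a few lines. The example below is written in Python rather than MATLAB or SAS/IML so that it is easy to run anywhere. The function names are hypothetical, and the tie term shown (the sum of t^3 - t over groups of t tied values) is the textbook correction for rank-based tests. That is an assumption about what such an adjustment looks like, not a statement of what TIEDRANK returns or what the TiedRankAdj module computes.

```python
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def tie_term(values):
    """Sum of t**3 - t over groups of t tied values (0 if there are no ties)."""
    return sum(t**3 - t for t in Counter(values).values())


def tie_correction_factor(values):
    """Multiplier for the no-ties variance (assumed form; needs len(values) >= 2)."""
    n = len(values)
    return 1 - tie_term(values) / (n**3 - n)


data = [10, 20, 20, 30]
ranks = average_ranks(data)
term = tie_term(data)
factor = tie_correction_factor(data)
```

For the sample data, the two tied values share the average of ranks 2 and 3, and the single tied pair contributes 2^3 - 2 = 6 to the tie term. When there are no ties, the tie term is 0 and the correction factor is 1, so the variance is unchanged.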
You can download my SAS/IML implementation of Cuzick's test and compare it with Dr. Cardillo's version.
Thoughts and Reflections
The exercise was interesting and worthwhile. Yes, you can convert a MATLAB program into PROC IML, provided that you have sufficient knowledge of each language. Whether the process is "easy" depends on the complexity of the program and on the programmer's knowledge of each language.
I learned several things from this experience:
- Translating a program from one language to another works best when the programmer has mastered the target language. It would be much harder for me to translate a SAS/IML program into MATLAB.
- Don't assume that functions that have the same name have the same behavior. I wasted a lot of time before I realized what the MATLAB SUM function was doing.
- It is useful to be able to step through the original program, or at least understand how it works. I don't have a copy of MATLAB, so I couldn't check my results against Dr. Cardillo's original MATLAB code.
- A dictionary of differences is useful. My version of the MATLAB to SAS/IML cheat sheet is enough to get someone started, but I wish it were better. Unfortunately, I don't have the MATLAB expertise to ensure that the dictionary is correct and complete. If anyone wants to extend the cheat sheet, please do so! Let me know in the comments whether someone has written a better cheat sheet.
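The SUM pitfall from the second lesson is easy to state concretely. The sketch below uses plain Python lists as a neutral stand-in for a matrix (the variable names are illustrative only) and computes both results side by side: one sum per column, which is the behavior described above for MATLAB, and a single total over all elements, which is the behavior described for PROC IML.

```python
# A 2 x 3 matrix stored as a list of rows.
matrix = [[1, 2, 3],
          [4, 5, 6]]

# Column-wise behavior: one sum per column.
column_sums = [sum(col) for col in zip(*matrix)]

# Whole-matrix behavior: a single sum over every element.
total = sum(sum(row) for row in matrix)

print(column_sums)  # [5, 7, 9]
print(total)        # 21
```

A translation that swaps one function for the other without checking will still run, but it produces a vector where a scalar is expected, or the reverse. In SAS/IML, column sums are available through the subscript reduction operator, as in x[+, ].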
Acknowledgements
Although this blog post is about the general process of translating MATLAB programs, I'd like to thank Dr. G. Cardillo for writing and posting the specific MATLAB program that I used. If anyone uses my SAS/IML translation, please reference his work appropriately:
Cardillo G. (2008) Cuzick's test: A Wilcoxon-Type Test for Trend.
http://www.mathworks.com/matlabcentral/fileexchange/22059
1 Comment
Just noticed that this post was referenced in a journal article. Hadley, et al (2014) Clin. Vaccine Immunol., vol 21, no. 2, p 203-211. URL: http://cvi.asm.org/content/21/2/203.full