"Code golf" is a fun programming pastime that challenges you to solve a problem with the least amount of code possible. Like regular golf, the goal is to use the fewest code "strokes" to hit the mark. Here's a recent challenge that was posed to me via Twitter.
@cjdinger @SASJedi got a fun puzzle for you guys, we've been discussing at my office.
You have a character var with the string "000112010302". What's the least amount of code that can be written to determine what is the highest number (3) in the string?
Wes 🇨🇦 (@SigurWes), July 17, 2018
While I feel that I can solve nearly any problem (that I can understand) using SAS, my knowledge of the SAS language is quite limited when compared to that of many experts. And so, I reached out to the SAS Support Communities for help on this one.
The answers were quick, creative, and diverse. I'll share a few of them here.
The winner, in terms of concision, came from FreelanceReinhard. He supplied a macro-function one-liner:
%sysfunc(findc(123456789,000112010302,b));
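With this entry, FreelanceReinhard defied the natural algorithmic instinct to treat this as a numeric comparison problem and instead approached it as a simple pattern-matching problem. The highest digit comes from a finite set (0..9). FINDC searches its first argument, '123456789', for any of the characters in its second argument (the target string), and the b modifier tells it to search backward, from '9' down to '1'. The first match is therefore the highest digit in the target string. Because each digit in '123456789' sits at the position equal to its own value, the position that FINDC returns is the answer itself. And if the target contains only zeros, FINDC finds no match and returns 0, which is also correct.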
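To see the position-equals-digit trick at work on a character variable, here is a minimal DATA step sketch. The variable name MAXDIGIT and the two extra test strings are illustrative choices, not part of the original answer; the all-zeros string shows the "no match returns 0" case.

```sas
data _null_;
  length str $ 20;
  /* Try the original string plus two illustrative extra cases */
  do str = '000112010302', '0000', '98765';
    /* Search '123456789' from right to left (modifier 'b') for the  */
    /* first character that also appears in STR. The position found  */
    /* equals the digit itself; 0 (no match) means STR has only 0s.  */
    maxdigit = findc('123456789', str, 'b');
    put str= maxdigit=;
  end;
run;
```

The log should report MAXDIGIT values of 3, 0, and 9 for the three strings.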
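In a similar vein, novinosrin's approach uses the COMPRESS function to keep, from the list '9876543210', only the digits that also appear in the target string. The result lists those digits in descending order ('3210' here), and the FIRST function then returns the leftmost, and therefore highest, value.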
a=first(compress('9876543210','000112010302','k'));
The COMPRESS function is often used to eliminate matching characters from a string, but the k modifier inverts the action to keep only the matching characters instead.
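A small sketch that stores the intermediate COMPRESS result in a separate variable (KEPT is a hypothetical name used only for illustration) makes the two steps visible:

```sas
data _null_;
  str = '000112010302';
  /* 'k' keeps (rather than removes) the characters of the first   */
  /* argument that also appear in the second argument: here '3210' */
  kept = compress('9876543210', str, 'k');
  /* FIRST returns the leftmost character, which is the highest digit */
  a = first(kept);
  put kept= a=;
run;
```

For the sample string, KEPT should hold '3210' and A should be '3'.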
If you wanted to use the more traditional approach of looping through values, comparing, and keeping just the maximum value, then you can hardly do better than the code offered by hashman.
do j = 1 to length (str) ; d = d <> input (char (str, j), 1.) ; end ;
Experienced SAS programmers will remember that, in a DATA step expression, the <> operator is shorthand for MAX (as opposed to "not equal," as some of us learned in Pascal or SQL). "MAX" might be clearer to read, but it requires an additional character. (Remember, too, that >< is shorthand for the MIN operator in SAS.) One caution: in a WHERE expression, SAS interprets <> as "not equal" instead.
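As a minimal sketch of both operators, the loop below (with hypothetical variables HI, LO, and DIGIT) tracks the largest and smallest digit in a single pass. LO is seeded with 9 rather than left missing because, according to the SAS documentation for the MIN and MAX functions, the MIN operator returns a missing value when either operand is missing, while the MAX operator returns missing only when both operands are.

```sas
data _null_;
  str = '000112010302';
  hi = .;  /* <> returns the nonmissing operand, so HI can start missing */
  lo = 9;  /* seed LO with the largest possible digit: >< would return  */
           /* missing on every pass if LO started out missing           */
  do j = 1 to length(str);
    digit = input(char(str, j), 1.);  /* one character -> one number */
    hi = hi <> digit;                 /* <> is the MAX operator      */
    lo = lo >< digit;                 /* >< is the MIN operator      */
  end;
  put hi= lo=;
run;
```

For the sample string, the log should show hi=3 and lo=0.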
AhmedAl_Attar offered the most dangerous approach, using memory manipulation techniques to populate members of an array:
array ct [20] $1 _temporary_; call pokelong (str,addrlong(ct[1]),length(str)); c=max(of ct{*});
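Here's how it works: ADDRLONG returns the memory address of the first array element, and CALL POKELONG copies LENGTH(str) bytes of STR to that address. The approach relies on the elements of a _temporary_ array sitting next to each other in memory, so each one-byte ($1) element receives one digit; MAX then converts those characters to numbers and returns the largest. CALL POKELONG and ADDRLONG are documented with several cautions, because writing to the wrong address can overwrite something important in your process or system memory. But they are fast-acting.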
And finally, I knew that there would be an elegant matrix-based approach in SAS/IML. ChanceTGardener offered the first variant, and then Rick Wicklin echoed it shortly after.
proc iml; str='000112010302'; maximum=max((substr(str,1:length(str),1))); print maximum; quit;
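Spelled out one step at a time, the IML approach looks like this (the DIGITS and MAXNUM names are only for illustration). Because the characters '0' through '9' sort in the same order as the numbers they represent, MAX on the character vector returns the right digit; NUM converts it if you need a numeric result.

```sas
proc iml;
str = '000112010302';
/* SUBSTR with a vector of start positions (1, 2, ..., 12) and a */
/* length of 1 returns a vector of one-character strings         */
digits = substr(str, 1:length(str), 1);
/* MAX compares the character values; for single digits, the    */
/* character order matches the numeric order                     */
maximum = max(digits);
/* NUM converts the character result to a numeric value          */
maxnum = num(maximum);
print digits, maximum maxnum;
quit;
```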
Code golf does not always produce the most readable, maintainable code. But puzzles like these encourage us to explore new features and nuanced behaviors of our favorite programming language, and thus broaden our understanding of how SAS really works.
Appendix: Code for featured solutions
Want to experiment with these different approaches? Here's a SAS program that combines all of them. Think you can do better (or different)? Visit the communities topic and chime in.
data max;
  str = '000112010302';
  /* novinosrin's approach */
  a=first(compress('9876543210',str,'k'));
  /* FreelanceReinhard's approach */
  b=findc('123456789',str,-9);
  /* AhmedAl_Attar's approach using POKELONG */
  array ct [20] $1 _temporary_;
  call pokelong (str,addrlong(ct[1]),length(str));
  c=max(of ct{*});
  /* loop approach from hashman */
  /* remember that <> is MAX */
  do j = 1 to length (str) ;
    d = d <> input (char (str, j), 1.) ;
  end ;
  drop j;
run;

/* FreelanceReinhard's approach in a one-liner macro function */
%let str=000112010302;
%put max=%sysfunc(findc(123456789,&str.,b));

/* IML approach from ChanceTGardener */
/* Requires SAS/IML to run */
proc iml;
  str='000112010302';
  maximum=max((substr(str,1:length(str),1)));
  print maximum;
quit;
5 Comments
Hi Chris,
To justify my use of CALL POKELONG and ADDRLONG: they allow for comparing 2-, 3-, or 4-digit values (and so on) by simply changing the array element size (array ct [20] $1 _temporary_;) from $1 to $2, $3, $4, and so forth.
While I understand the task was to find the largest digit using the fewest code statements, if I'm not mistaken, my approach is the only one that can achieve multiple-digit comparison with a single code change ;-)
Just my 2 cents,
Ahmed
Thanks for the comment, Ahmed! I hope that you didn't take my labeling of your code as "dangerous" as a slight -- I think it's valuable to have awareness of powerful functions like these, even if they require some care during use. And yes -- your approach is more flexible for multidigit comparisons.
Nice blog post, Chris!
Just for the record: The semicolon after the isolated %SYSFUNC call (the one-liner I had suggested) is useless. I certainly would have deleted it, had I known that this code would be further disseminated. The one-liner doesn't exactly address the original question anyway because it uses a character literal instead of a character variable. And of course it generates an error message when submitted in isolation (with or without the semicolon), but the correct answer (3) is written to the log as well.
Best regards,
Reinhard
In my evaluation (not that I'm the judge), "code golf" is about minimizing the number of moves, not the number of characters. So I don't mind the semicolon which, I think we agree, is a good practice even if not always necessary. And I encourage readers to go check out the original topic in addition to this paraphrased summary of all of the creative solutions.
Well done, Reinhard. Well-deserved win!