Point/Counterpoint: Should a programming language accept misspelled keywords?

Longtime SAS programmers know that the SAS DATA step and SAS procedures are very tolerant of typographical errors. You can misspell most keywords and SAS will "guess" what you mean. For example, if you mistype "PROC" as "PRC," SAS will run the program but write a warning to the log: WARNING 14-169: Assuming the symbol PROC was misspelled as PRC.

This feature provided a big productivity boost in the days before GUI program editors. Imagine submitting a program from a command line in the early 1980s. If you mistyped one keyword you would have to retype the entire statement. As a convenience, SAS implemented an algorithm that checks the "spelling distance" between the tokens that you submit and a list of valid keywords for the procedure that you are calling. DATA step programmers might be familiar with the SPEDIS function, which measures how close two words are to each other in the English language. The SAS language parser uses the same algorithm.
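To get a feel for the spelling distance, here is a minimal sketch that calls SPEDIS from a DATA step. The three-keyword list is a hypothetical stand-in; the list that the parser actually consults depends on the procedure, and the cutoff that the parser uses to accept a match is not shown here.

```sas
/* Sketch only: compare a mistyped token with a few candidate keywords. */
/* The keyword list below is hypothetical and used for illustration.    */
data _null_;
   length keyword $8;
   token = 'PRC';
   do keyword = 'PROC', 'DATA', 'RUN';
      /* SPEDIS(query, keyword): the query comes first; the function is asymmetric */
      dist = spedis(token, trim(keyword));
      put token= keyword= dist=;
   end;
run;
```

A smaller distance means that the token is closer to the keyword. Because SPEDIS is asymmetric, swapping the two arguments can give a different value.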

Not everyone wants this feature. Many companies in regulated industries (such as pharmaceuticals) turn off the autocorrect feature in SAS because they want to force their programmers to type every keyword correctly. You can determine whether the AUTOCORRECT option is enabled on your system by running PROC OPTIONS:

proc options option=AUTOCORRECT value;  run;

The AUTOCORRECT option is turned on by default. You can turn off the option by submitting options NOAUTOCORRECT or by putting -NOAUTOCORRECT in a configuration file.
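As a minimal sketch, the following statements turn the option off for the current session, check the setting, and then restore the default. The comments describe what I would expect to see in the log; they are not captured output.

```sas
options noautocorrect;                   /* turn off autocorrection for this session */
proc options option=AUTOCORRECT value;   /* the log should now report NOAUTOCORRECT  */
run;
options autocorrect;                     /* restore the default setting              */
```

Setting the option in a configuration file makes the choice apply to every session rather than only to the current one.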

Today I've invited two people to argue for and against using this feature. Larry Literal is a programmer who believes that no program should ever accept a syntax error. Annie Intel sees nothing wrong with programs that self-correct. She argues that it is desirable for programs to interpret the intention of the programmer. Which do you agree with? Do you have something to add? Leave a comment.

Point: A program should not allow ambiguity

My name is Larry Literal and I believe that computer programming should be an exact science. There is no room for ambiguity. A program that runs because it is "close to" a correct program is an abomination. I do not want a computer to change the code that I write!

When my system administrator installs a new version of SAS, the first thing I do is turn off the autocorrect feature. (I've also turned off the autocorrect feature on my phone. What a pain!) My main argument against the AUTOCORRECT option is that it makes code unreadable. Take a look at the following program:

/* The correct program is:
   proc freq data=sashelp.class order=freq;
      table sex / chisq;
   run;
*/
prc freq dta=sashelp.class ordor=freqq;
   tble sex / chsq;
runn;

Every keyword in this program is mistyped. The only tokens that are specified correctly are the name of the procedure, the name of the data set, and the name of the variable. The program looks more like the Klingon language than the SAS language, yet this program runs if you use the AUTOCORRECT option!

And what happens if SAS introduces a new keyword that is closer to a mistyped word than a previous keyword? Then the procedure might do something different even though I have not changed the program! The autocorrect feature is an abomination and should never be used!

Counterpoint: Computers should interpret what you say

Really, Larry? "An abomination"? What century are you living in?

My name is Annie Intel, but my friends call me "A.I." I think the SAS autocorrect feature was way ahead of its time. Today we have autocorrecting logic on smartphones and word processors. Applying the same techniques to computer programs is no different. In fact, if you use a modern SAS program editor, the editor will suggest valid keywords and flag any keyword that is not valid.

Let's be real: Larry's example is not realistic. No programmer is going to use that garbled call to PROC FREQ in a production job. The autocorrect feature does not "make code unreadable." It is a convenience while developing a program, not an excuse to write nonsense. Any competent programmer will check the log for warning messages and correct the typos.

Larry claims that he doesn't want a computer munging and altering the code he writes. But optimizing compilers have been doing exactly that for decades! Programmers write instructions in a high-level language and an optimizing compiler maps the code to a set of machine instructions. The compiler will sometimes rearrange the structure of the program to get better performance. If it is okay for a compiler to map a program into an optimal version of itself, why is it not okay for a parser to do the same by correcting misspellings?

I want computers to recognize my intentions. When I give a voice command to my smartphone or personal home device, the audio signal is mapped to an action. I am allowed a certain amount of flexibility. "Turn on the lights" and "turn da light on" are equivalent phrases that should be understood and mapped to the same action. The SAS AUTOCORRECT feature is similar. The interpreter has a context (the name of the procedure) which is used to standardize your input. I think it is very cool. In the future, I think more programming languages will accept ambiguities.

7 Comments

Anders Sköllermo on February 27, 2017 12:23 pm

Good discussion! I can agree with both points! When first developing programs, You want to test them as quickly as possible, so AUTOCORRECT may be good.
Especially if You can get the correct code somewhere.

In production jobs I want REALLY STRICT PROGRAMMING.
(Not only NOAUTOCORRECT, everything!)
A production job should be extremely well written, so You do not have to care about it.

Paul Kaefer on February 27, 2017 12:25 pm

As someone who has studied computer science - both the theory and the application - I initially see red flags in autocorrection. This is partly because as a programmer, I want to know that what I see is what will be compiled or interpreted. I don't want my code changed without knowing exactly what changed. And if I accidentally name a variable using a language keyword, how will autocorrect treat that? What if there are two keywords or options that could be misspelled in the same way? I don't want to have code with any chance of misinterpretation. My code should run as I wrote it, whether it is incorrect or not. It's my job as a good programmer to understand how to write good code.

There was some very interesting discussion on the SAS Communities Forums in regard to the idea of SAS automatically imputing missing semicolons. The result was that this is problematic, as SAS may make some assumptions that go against the actual logic of the program.

I think what matters is for good programmers to exercise caution. While there may be benefits that seem convenient, they come at a potential cost - one that may be very difficult to detect and measure.

There is some interesting further discussion here.

- Rick Wicklin on February 27, 2017 1:58 pm
  
  Thanks for the thoughtful reply. For the sake of others, I want to clarify two topics in Paul's comment:
  1. SAS allows you to use a variable name that is also a keyword. SAS is smart enough to know when a token is a variable and when it is a keyword. Thus you can use IF, THEN, DO, END, PROC, RUN, etc. as variable names, if you want.
  2. SAS does not "impute semicolons." The discussion that Paul refers to was a "ballot suggestion" in which a SAS user requested that feature. However, the discussants quickly concluded that such a feature is neither practical nor feasible.
  
  - Chris Hemedinger on March 1, 2017 8:22 am
    The lack of "reserved" words in SAS allows for crazy programs like this:
    
    data _null_; if = 1; and = 0; if if eq and and and eq if then put 'they are the same'; else put 'they are different'; run;
    
    - Rick Wicklin on March 1, 2017 8:30 am
      
      Yes, more examples like this are contained in light-hearted tongue-in-cheek presentations and papers by Art Carpenter and Tony Payne about "how to achieve job security" by writing undecipherable SAS programs! Of course, the paper is really a description of what a SAS programmer should AVOID doing.
      
    - Eric on March 2, 2017 3:05 am
      
      Hello Chris,
      
      This is great code. It is also possible to add an even number of ands, like:
      
      data _null_;
      if = 1; and = 0;
      if if eq and and and and and eq if then
      put 'they are the same';
      else
      put 'they are different';
      run;
      
      Cheers,
      Eric ("SAS newbie") who did the suggestion (hiding under a blanket :-))
      
Peter Lancashire on February 28, 2017 2:36 am

For me a SAS script must be unambiguous now and in future versions. If autocorrect always makes the same corrections and these can be defined and are stable over time (future versions) then I have no problem with it. In that case, of course, it can be added to the grammar of the SAS language. Then it is not "autocorrect" but a larger defined dictionary of language keyword tokens. Does autocorrect meet these requirements?
By the way, I would really like to see the formal grammar of SAS as one has for other languages. A good way to represent it is railroad diagrams; they are easier to read than BNF.
