How good is an AI chatbot at SAS programming?

A lot of programmers have been impressed by the ability of ChatGPT, GPT-4, and Bing Chat to write computer programs. Recently, I wrote an article that discusses an elementary programming assignment, called FizzBuzz, which is sometimes used as part of a hiring process to assess a candidate's basic knowledge of a programming language. I was curious whether an AI chatbot could write the simple FizzBuzz program in SAS, and how it would respond to some of the more sophisticated variations. Because the chatbots use information that is available on the internet to formulate their answers, I wanted to test the chatbots before I published the article, which provides the correct answers to several FizzBuzz-related programming exercises.

I decided to use Bing Chat, which uses the GPT-4 Large Language Model (LLM) and is integrated into the Bing search engine. I asked my questions on 05MAY2023, which is prior to the publication of my article (on 08MAY2023) that provides the correct solutions. I assume that chatbots will use my article to improve their answers in the future.

Can an AI chatbot write a basic SAS program?

Let's jump to the main conclusion: currently, an AI chatbot is not very proficient at writing a basic SAS program that implements the FizzBuzz algorithm. In many cases, the chatbot responded with invalid syntax that did not even run. In several cases, I was able to modify the response to create a correct program, but some of the modifications required programming knowledge that a junior-level programmer might not possess. In other cases, I did not understand what the chatbot was trying to accomplish, and I was unable to modify the program to make it work.

In this article, I show the results from Bing Chat. Some of the results are not correct. See my previous article about FizzBuzz to obtain the correct SAS programs.

What is the FizzBuzz algorithm?

To begin the conversation, I asked Bing Chat, "Explain the FizzBuzz algorithm." Its response was perfect:

The FizzBuzz algorithm is a simple program that is often used to teach the basics of 
coding and as an interview question for programming jobs. The algorithm works by 
iterating through a range of numbers (usually 1 to 100) and for each number:
 
- If the number is divisible by both 3 and 5 (or 15), the program outputs "FizzBuzz".
- If the number is only divisible by 3, the program outputs "Fizz".
- If the number is only divisible by 5, the program outputs "Buzz".
- If the number is not divisible by either 3 or 5, the program outputs the number itself.
 
The result is a sequence of numbers and words where multiples of 3 are replaced with "Fizz", 
multiples of 5 are replaced with "Buzz", and multiples of both are replaced with "FizzBuzz".

Write FizzBuzz in SAS

Now that I had informed Bing Chat that we are discussing the FizzBuzz algorithm, and it understood the topic, I moved to the first task. I prompted it to "write the FizzBuzz program in SAS." It wrote the following SAS macro program, which I consider to be an abomination:

/* Bing Chat response on 05MAY2023 */
%macro fizzbuzz(n);
    %do i=1 %to &n;
        %if %sysevalf(&i/15) = int(&i/15) %then %put FizzBuzz;
        %else %if %sysevalf(&i/3) = int(&i/3) %then %put Fizz;
        %else %if %sysevalf(&i/5) = int(&i/5) %then %put Buzz;
        %else %put &i;
    %end;
%mend fizzbuzz;
%fizzbuzz(100);

It seems to have confused the SAS DATA step language with the SAS macro language. This is a mistake that I see many novice programmers make: they reach for the macro language when there is no need to do so.

This program does not run. It produces an error in the log. An experienced macro programmer will be able to modify the chatbot's program to make it work. In this case, replace the INT calls with a call to %EVAL. The modified program produces the correct FizzBuzz sequence.
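
One way to apply that modification is sketched below. It assumes that each INT call is replaced by %EVAL, which performs integer division. The %LOCAL statement is an extra safeguard that is not part of the chatbot's response.

```sas
/* Sketch of a repaired macro: each INT(...) call is replaced by %EVAL(...) */
%macro fizzbuzz(n);
    %local i;   /* assumption: keep the loop index local to the macro */
    %do i=1 %to &n;
        /* %SYSEVALF keeps the fractional part of the quotient, whereas %EVAL
           truncates it, so the two agree only when &i is an exact multiple */
        %if %sysevalf(&i/15) = %eval(&i/15) %then %put FizzBuzz;
        %else %if %sysevalf(&i/3) = %eval(&i/3) %then %put Fizz;
        %else %if %sysevalf(&i/5) = %eval(&i/5) %then %put Buzz;
        %else %put &i;
    %end;
%mend fizzbuzz;
%fizzbuzz(100)
```

As in the chatbot's version, %PUT writes the sequence to the SAS log. The macro language is still an awkward tool for this task; the sketch only shows that the response was one small edit away from running.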

Write FizzBuzz in the DATA step

I didn't like the chatbot's first answer, so I tried again: "Use the DATA step to write the FizzBuzz program in SAS." This time it gave an answer that was similar to (but not identical to!) the DATA _NULL_ program on the Rosetta Code site. Bing Chat gave the following program, which works correctly:

/* Bing Chat response on 05MAY2023 */
data _null_;
    do i=1 to 100;
        if mod(i,15)=0 then put 'FizzBuzz';
        else if mod(i,3)=0 then put 'Fizz';
        else if mod(i,5)=0 then put 'Buzz';
        else put i;
    end;
run;

Although the structure of the implementation is very similar to the Rosetta Code entry, the text in the Bing Chat program is put in single quotes (rather than double quotes), and the "else if" statements are in a different order. Interestingly, Bing Chat does not include the Rosetta Code site as a reference.

Write FizzBuzz by using PROC FCMP

Next, I posed the question, "Write the FizzBuzz program in SAS by using a user-defined function that is implemented by using PROC FCMP." It provided the following response, which is almost, but not quite, correct.

/* Bing Chat response on 03MAY2023 */
/* Define the function */
proc fcmp outlib=work.funcs.fizzbuzz;
    function fizzbuzz(num);
        if mod(num, 15) = 0 then return 'FizzBuzz';
        else if mod(num, 3) = 0 then return 'Fizz';
        else if mod(num, 5) = 0 then return 'Buzz';
        else return num;
    endsub;
run;
 
/* Call the function */
data _null_;
    do i = 1 to 100;
        result = fizzbuzz(i);
        put result=;
    end;
run;

The program it gave will not run. There are several problems:

- The declaration of the user-defined function must end with a dollar sign ($) to indicate that the function returns a character value.
- The FizzBuzz function needs to return a variable value, not a literal. You can assign the return value to a local variable, or use parentheses to create a temporary variable.
- The return type of the FizzBuzz function is ambiguous: sometimes it returns a character value, other times it returns a number (num).
- The chatbot failed to include the statement
  options cmplib=work.funcs;
  which is needed to tell the DATA step how to find the FizzBuzz function.

Again, an expert can modify the program to get it to work, but a novice would be confused about how to proceed.
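
The following sketch applies the corrections listed above. The local variable name (result), the length of 8, and the 8. format that converts the number to text are assumptions chosen for illustration; they are not taken from the chatbot's response.

```sas
/* Define the function; the trailing $ declares a character return value */
proc fcmp outlib=work.funcs.fizzbuzz;
    function fizzbuzz(num) $ 8;
        length result $ 8;               /* local variable that holds the answer */
        if mod(num, 15) = 0 then result = 'FizzBuzz';
        else if mod(num, 3) = 0 then result = 'Fizz';
        else if mod(num, 5) = 0 then result = 'Buzz';
        else result = put(num, 8.);      /* convert the number to text so that
                                            every branch returns a character value */
        return (result);                 /* return a variable, not a literal */
    endsub;
run;

/* Tell the DATA step where to find the compiled function */
options cmplib=work.funcs;

/* Call the function */
data _null_;
    length result $ 8;
    do i = 1 to 100;
        result = fizzbuzz(i);
        put result;
    end;
run;
```

Because every branch assigns to the same character variable, the return type is no longer ambiguous. The numbers are right-aligned in the 8-character field; a different conversion could be used if left-aligned output is preferred.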

Write FizzBuzz by using PROC FORMAT

Since Bing Chat did not successfully complete the previous task, I had low expectations for the next assignment, which is to create a user-defined format by using PROC FORMAT and the previous function in PROC FCMP. The prompt I gave was, "Use PROC FORMAT in SAS to create a custom format that will format an integer according to the FizzBuzz algorithm." The output was the following garbled code:

/* Bing Chat response on 03MAY2023 */
proc format;
    value fizzbuzz
        low-high = [fizzbuzz.]
        other = [num.];
    if mod(_I_,15)=0 then put @ _I_ fizzbuzz.;
    else if mod(_I_,3)=0 then put @ _I_ fizz.;
    else if mod(_I_,5)=0 then put @ _I_ buzz.;
    else put @ _I_ _I_;
run;
 
data _null_;
    do i=1 to 100;
        put i fizzbuzz.;
    end;
run;

I honestly don't know what Bing Chat was trying to do here by calling the PUT statement from inside PROC FORMAT. The definition of the FIZZBUZZ. format needs to call the FizzBuzz function. Notice that this program uses the FIZZBUZZ. format as part of the definition of the FIZZBUZZ. format, which is an undefined recursive operation. To its credit, the DATA _NULL_ step would be correct if the FIZZBUZZ. format were properly defined.

I tried various other prompts to try to get a better solution. Another "solution" it gave is the following incorrect syntax:

/* Bing Chat response on 03MAY2023 */
proc format;
    value fizzbuzz
        low - high = [fizzbuzz.]
        ;
    picture fizzbuzz
        other = '%0d' (prefix=' ')
        3 = 'Fizz' (mult=3)
        5 = 'Buzz' (mult=5)
        15 = 'FizzBuzz' (mult=15)
        ;
run;
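
For contrast with the two responses above, here is a minimal sketch of a format that calls a FizzBuzz function. PROC FORMAT lets a VALUE statement delegate to an FCMP function by placing the function name, followed by parentheses, inside square brackets. The function is redefined here so that the example is self-contained; the variable name result and the DEFAULT=8 width are assumptions made for illustration.

```sas
/* Character-valued FizzBuzz function (same idea as the PROC FCMP task) */
proc fcmp outlib=work.funcs.fizzbuzz;
    function fizzbuzz(num) $ 8;
        length result $ 8;
        if mod(num, 15) = 0 then result = 'FizzBuzz';
        else if mod(num, 3) = 0 then result = 'Fizz';
        else if mod(num, 5) = 0 then result = 'Buzz';
        else result = put(num, 8.);
        return (result);
    endsub;
run;

options cmplib=work.funcs;

/* The format passes every value to the function; no recursion is involved */
proc format;
    value fizzbuzz (default=8) other = [fizzbuzz()];
run;

/* Same DATA step that the chatbot wrote */
data _null_;
    do i = 1 to 100;
        put i fizzbuzz.;
    end;
run;
```

The DATA _NULL_ step is unchanged from the chatbot's response. Only the definition of the format differs: it refers to the function rather than to the format itself.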

Advantages and disadvantages of the methods

The chatbot is not proficient at writing SAS programs, but let's see whether it can write prose. I asked Bing Chat, "What are the advantages and disadvantages of the three previous methods that implement the FizzBuzz algorithm in SAS: a DATA step, PROC FCMP, and PROC FORMAT?" It gave me a nicely formatted set of paragraphs:

Each method for implementing the FizzBuzz algorithm in SAS has its own advantages and disadvantages.

Method 1: Using a DATA Step
Advantages: This method is straightforward and easy to understand. It’s also flexible and can be easily modified to handle more complex logic.
Disadvantages: The code can become lengthy and repetitive if there are many conditions to check.

Method 2: Using PROC FCMP
Advantages: This method allows you to define a reusable function that can be called from multiple places in your code. It also makes it easier to maintain and update your code since changes only need to be made in one place.
Disadvantages: This method requires knowledge of PROC FCMP and may be more difficult to understand for those unfamiliar with it.

Method 3: Using PROC FORMAT
Advantages: This method allows you to define a custom format that can be used to format data in a consistent manner throughout your code. It also makes it easy to apply the FizzBuzz logic to large datasets.
Disadvantages: This method requires knowledge of PROC FORMAT and may be more difficult to understand for those unfamiliar with it. It also may not be as flexible as the other methods for handling more complex logic.

In conclusion, each method has its own strengths and weaknesses, and the best method to use will depend on the specific needs of your program.

The paragraphs are correct, if vague. They aren't wrong, but they aren't particularly insightful or creative, either. The statement that defining a function "makes it easier to maintain and update your code since changes only need to be made in one place" is standard boilerplate about the advantages of encapsulation. The statement that a more sophisticated method is "more difficult to understand for those unfamiliar with it" is also a familiar trope. The paragraphs remind me of a test answer written by a student who has a passing familiarity with a topic, and who has learned how to maximize the points he will get on an exam.

A footnote: Implementing FizzBuzz in parallel on SAS Viya

Bing Chat includes footnotes and hyperlinks for its references. Among the footnotes was a link to a 2020 post on the Statmike blog that discusses how to implement the FizzBuzz algorithm in parallel in a SAS Viya environment. I was unfamiliar with that article, so Bing Chat taught me something new about implementing the FizzBuzz algorithm in SAS!

Summary and reflections

In summary, Bing Chat is not very skilled at SAS programming. In many cases, it produced programs that were close to being correct but had a subtle mistake that prevented the program from running. Other times, it produced programs that I could not understand.

One of my frustrations during this experiment is that the chatbot responds authoritatively. If you ask a human a question, you might get an answer that begins with, "Well, I'm not sure, but my best guess is ...." This enables you to assign a degree of confidence to their response. In contrast, a chatbot responds, "The answer is ...," and gives you a response. This confidence might be unwarranted because the answer might not be entirely correct. This is fine when you are an expert in the area and can identify a subtle mistake, but it is problematic when you are a novice and do not have the knowledge to determine whether the response is correct. It would be nice if the chatbot indicated the confidence it has in its response.

Of course, this is not a new problem. The internet contains information and misinformation. Anyone who has ever performed an internet search knows that some results are good and others are bad. It takes critical thinking and prior knowledge to know which is which. When we use the internet, we use our existing knowledge to emphasize (overweight) certain sites and deemphasize (underweight) others. A chatbot does something similar, but we do not know the weights that it used. Accordingly, we must read a chatbot's response with the same skepticism and critical thinking that we use whenever we research a topic on the internet.

6 Comments

Jim LOUGHLIN on May 10, 2023 2:26 pm

Rick,

I remember in the 90's CASE (Computer Aided Software Engineering) was going to replace most programmers. Also, something called 'Expert Systems' was going to replace doctors, lawyers, etc. Well, it's 2023 and programmers, doctors, and lawyers are still here. I guess AI is the new CASE/expert system. I bet programmers, doctors, lawyers, etc. will still be here 30 years from now.

Everyone likes the idea of a fully automated pilotless plane but no one wants to fly on one.

- Rick Wicklin on May 10, 2023 2:47 pm
  
  Thanks for writing. I hear what you are saying. Sometimes hype overpromises the future. I remember predictions that software will one day automatically analyze all data sets and reveal all relationships among the observations and between the variables. But I also remember the hype of personal computers, the internet, e-commerce, e-banking, mobile phones, and other advances that have changed our world.
  
Andrew Ziem on May 16, 2023 11:30 am

I had similar experiences asking ChatGPT (3.5) to generate simple snippets of SAS code, like to log the current timestamp to a file every 5 minutes. It gave an authoritative answer. At first glance, it looked plausible, but it wouldn't run, even with some tweaking.

Because SAS is less popular than Python and JavaScript, the model probably lacks experience with SAS. But as the models become larger and more complex, with more training data and more model parameters, they will master SAS. Even now, GPT-4 may do much better with SAS, but I haven't paid for access to it.

Alex Cuza on May 30, 2023 7:53 am

Hi,
Not sure if it is due to Rick's articles or not, but I received two good answers from Bing Chat to implement FizzBuzz. I suppose it now knows the discussion :-D

Cheers.

Here is my question, with the setting MorePrecise. Q: [Act as an expert SAS programmer] write the most efficient SAS program to solve the FizzBuzz quiz.

First answer:
data _null_;
do i = 1 to 100;
if mod(i, 15) = 0 then put 'FizzBuzz';
else if mod(i, 3) = 0 then put 'Fizz';
else if mod(i, 5) = 0 then put 'Buzz';
else put i;
end;
run;

My additional request: I would like the output to be in the output window not in log. Furthermore, please make it reusable, so I can give another upper limit.
Answer:
%macro FizzBuzz(n);
data FizzBuzz;
do i = 1 to &n;
if mod(i, 15) = 0 then result = 'FizzBuzz';
else if mod(i, 3) = 0 then result = 'Fizz';
else if mod(i, 5) = 0 then result = 'Buzz';
else result = put(i, 8.);
output;
end;
run;
proc print data=FizzBuzz noobs;
var result;
run;
%mend;

%FizzBuzz(100);

- Rick Wicklin on May 30, 2023 8:12 am
  
  Thanks for the update.
  
  - Alex Cuza on May 30, 2023 9:18 am
    
    Thanks for your articles, Rick.
    The macro given by Bing still lacks the length statement for var result.
