As you know, almost every SAS programming problem has many very different solutions. I’m going to solve a very simple problem using two different approaches.
The problem: Compute the sum of integers from 1 to 1,000,000.
I bet most of you thought of a solution almost immediately. Let me guess that you thought of one of the solutions shown below:
Solution 1
options fullstimer; ❶
data _null_; ❷
   do i = 1 to 1000000;
      Sum + i; ❸
   end;
   file print; ❹
   put Sum=;
run;
❶ The option FULLSTIMER gives you more complete timing information.
❷ For efficiency, use DATA _NULL_, which runs the step without creating an output data set.
❸ The SUM statement does several things: 1) the variable Sum is retained (not set back to missing) for each iteration of the DATA step; 2) Sum is initialized to zero; 3) if the expression being added (here, i) were ever missing, the missing value would be ignored.
❹ Use FILE PRINT to send the output to the output window instead of the default location, the LOG.
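To make the behavior described in ❸ concrete, here is a minimal sketch of the same loop written without the SUM statement. It assumes that an explicit RETAIN plus the SUM function is a fair stand-in for what the SUM statement does implicitly; it is meant as an illustration, not as a replacement for Solution 1.

```sas
data _null_;
   retain Sum 0;            /* keep Sum across iterations and start it at 0 */
   do i = 1 to 1000000;
      Sum = sum(Sum, i);    /* the SUM function ignores missing arguments   */
   end;
   put Sum=;                /* no FILE statement, so this goes to the log   */
run;
```

Writing `Sum = Sum + i;` without the RETAIN would not work, because Sum would start out missing and adding anything to a missing value with the + operator yields a missing value.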
Solution 2
data Integers; ❶
   do i = 1 to 10000000;
      output; ❷
   end;
run;

title "Sum of Integers";
proc means data=Integers n sum; ❸
   var i;
run;
❶ Create a data set called Integers.
❷ Output an observation for each iteration of the DO loop. Note that the OUTPUT statement is inside the loop.
❸ Use PROC MEANS to compute the sum.
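Although both programs work, there is a difference in CPU time. (Keep in mind that Solution 2, as run here, loops to 10,000,000 rather than 1,000,000, so the two sets of timings are not a like-for-like comparison.) Does that mean you should always seek a DATA step solution? Not really. It depends on several factors, such as how often the program will be run and which method you feel most comfortable with.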
Here is a partial listing of the SAS Log showing timing information:
NOTE: 1 lines were written to file PRINT.
NOTE: DATA statement used (Total process time):
      real time           1.00 seconds
      user cpu time       0.21 seconds
      system cpu time     0.32 seconds
      memory              7875.03k
      OS Memory           16876.00k
      Timestamp           02/16/2023 08:23:18 AM
      Step Count          1  Switch Count  0

NOTE: The data set WORK.INTEGERS has 10000000 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.21 seconds
      user cpu time       0.20 seconds
      system cpu time     0.00 seconds
      memory              410.03k
      OS Memory           17392.00k
      Timestamp           02/16/2023 08:23:18 AM
      Step Count          2  Switch Count  0

NOTE: There were 10000000 observations read from the data set WORK.INTEGERS.
NOTE: PROCEDURE MEANS used (Total process time):
      real time           0.25 seconds
      user cpu time       1.07 seconds
      system cpu time     0.03 seconds
      memory              8471.84k
      OS Memory           25116.00k
      Timestamp           02/16/2023 08:23:19 AM
      Step Count          3  Switch Count  0
Do you care about CPU time? Unless this is a production program, I think you should program in the way that is most comfortable for you (unless you are a compulsive programmer and want the “best” program). By the way, if you remove the FILE PRINT statement from Solution 1, the system CPU time is 0.0. I guess there is some overhead to sending the results to your output device.
I’m interested in what your first instinct was when you read the problem. Was it one of these two, or something else? Please post your comments below.
3 Comments
Sum of an arithmetic sequence right there.
Wow, I had to Google that one. Never came across that formula before. However, I hope you see where I was going in this blog. Best, Ron
Hello Ron, I would suggest:
data _null_;
sum = 500000*1000001;
put sum=;
run;
But maybe this one is out of competition 🙂
Eric
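For readers who, like Ron, had not met the formula the commenters are using, the constant in Eric's program can be worked out from the standard closed form for the sum of the first n integers. The symbol n is introduced here only for illustration:

```latex
\sum_{i=1}^{n} i = \frac{n(n+1)}{2},
\qquad
n = 1{,}000{,}000:\quad
\frac{1{,}000{,}000 \times 1{,}000{,}001}{2}
  = 500{,}000 \times 1{,}000{,}001
  = 500{,}000{,}500{,}000
```

This is the same value the loop in Solution 1 accumulates one term at a time.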