A reader wrote for help with a computational problem. He has a vector of length N and the vector contains integer values in the range [1, 120], which represent months for which events occurred over a 10-year period. The question is: what is the 24-month period for which the most events occurred?
To be concrete, here's some data with N=50:
data Events; keep y; call streaminit(1); do i = 1 to 50; y = ceil(120*rand("Uniform")); /* random in [1,120] */ output; end; run; |
I suspect that this is a classic SAS programming problem that can be solved efficiently in the DATA step. I can think of two approaches. The first is to use PROC FREQ to generate a frequency vector that contains exactly 120 elements, and the value of each element is the number of events that occurred during the month. A subsequent data step can be used to count the number of events in each rolling 24-month period. The second approach is to sort the data and use the implied AND operator to count how many events occurred for each 24-month period.
But what if the data are in a SAS/IML vector? In this case, I like the second approach. The following statements sort the data and count the number of events in each 24-month rolling window:
proc iml; use Events; read all var {y}; close; Range = 120; /* given range(y)=[1,120] */ call sort(y); /* or sort(y,1) prior to SAS/IML 9.22 */ Window=24; RunningTotal = j(Range-Window+1,1); /* allocate */ do i=1 to Range-Window+1; RunningTotal[i] = sum( y>=i & y<(i+Window) ); end; max = max(RunningTotal); print max; |
For the sample data, the maximum number of events in any 24-month variable is 16. You can use the LOC function to find the indices of RunningTotal for which the maximum was achieved, since there could be more than one occurrence of the maximum.
Can you think of a better approach in SAS/IML? Perhaps one that avoids the DO loop? Post your solution in the comments, including solutions that use the DATA step, PROC SQL, or other SAS routines.
Addendum 27JUN2012: There have been several solutions to this problem posted on the SAS-L discussion list.
1 Comment
Hi ... another approach (I should really learn more about IML !!!), the rolling 24-month window you mentioned in your comment (but no PROC FREQ) ...
data _null_;
array x(120) _temporary_;
array z(24) _temporary_;
do until(done);
set events end=done;
x(y) + 1;
end;
do j=1 to 120;
z(mod(j,24)+1) = x(j);
maxy = max(of maxy, sum(of z(*),0));
end;
put maxy=;
run;