3 gifts to magically transform your data

‘Tis the season to be gifting!  So what could be better than gifting your fellow SAS user or yourself with three gifts to celebrate the holiday season?

As 2012 comes to an end, we think it’s a great idea to be thinking of ringing in the New Year with dynamic techniques.

A student wanted to do away with hard coding. His data has 1000’s of numeric columns with missing values.  And he wants to convert all the missing values to a zero. He’s already read through my previous post on verifying that the data really should be a zero. He doesn’t care about the missing system option since all that does is display missing value as 0, underlying numbers still remain missing. So we have two problems we need to help him with:

1)      Convert actual data values from missing to zero.

2)      Do this dynamically without manual intervention (a.k.a. Let SAS do this). Student is tired of hardcoding the variable names one by one, and honestly I don’t think we can blame him, he can really put his time to better use!

Let SAS come to the rescue in a heroic manner using three languages: You did know that SAS was a linguist, didn’t you? Oh! You didn’t, then let’s see what SAS can do.

*First step: Grab the name, type and position information for all numeric variables from PROC CONTENTS & push that information to a SAS dataset.

proc contents data = miss
out = vars(keep = varnum name type where=(type=1))
noprint;
run;

What did we do? We dumped the Name with PROC CONTENTS to your name, type information to a SAS dataset.  Now how clever is that? Selecting only your numeric variables with the type= option. If you needed to gift character data you’d go with type=2.

I’m sure you’re just as choosy with your gift giving.

*Second step: Load the numeric variables we just collected into a macro variable using PROC SQL;

proc sql noprint;
select distinct name
into :allnumvars separated by ' '
from vars
quit;
%put &allnumvars;

What did we do? We took the variable called NAME and held it in a macro variable called allnumvars using the INTO: syntax. I really didn’t need the %put statement at the end except out of curiosity to see what the allnumvars macro variable is stocking for me.

Now confess, how many of you have peeked into your Christmas stocking before you were supposed to?

*Third Step: Create an array to hold the numeric variables and do array processing to convert all missing values to a 0;

data missing2;
set miss nobs=new;
array miss{*}  &allnumvars;
do i=_n_ to new;
do k=lbound(miss) to hbound(miss);
if miss{k} = .
then miss{k}=0;
end;
end;
run;

What did we do? We created an array to point to the variables being held in our allnumvars macro variable. Then passed the array through the do loop to convert missing values to zero just for those variables.

Voila, there you have it! Every single missing value in all your numeric variables has been converted to zeros.

So what were the three gifts? Well, the three languages that SAS moves easily across – data step, proc sql and the macro language. Oh, and an added bonus of the first PROC CONTENTS step.

Isn’t that a nice gift that you can use? And I would not mind at all if you wanted to re-gift this.

In fact, I would be very glad if others could also benefit from our re-gifting. I wish you and your family a very happy holiday season and would also like to give you another gift.

I hope you liked these gifts just as much as I had preparing it. You may have some gifts for me in the form of Oh–I-can-write-you-better code. And I’d love to hear from you.

tags: sas code, sas programming

15 Comments

  1. Eric Jackson
    Posted December 20, 2012 at 10:34 am | Permalink

    Thank for the gift! I did want to point out to slave off any confusion in the code snipits that the & that might show up in your browser display are html code for an ampersand. Took me a second to realize this until it hit me.

    • Eric Jackson
      Posted December 20, 2012 at 10:36 am | Permalink

      And in my post it converted it to an actually ampersand ... "& amp ;" no spaces is what I was referring too.

    • Posted December 21, 2012 at 4:16 pm | Permalink

      thanks Eric. Between browser and sometimes quirky blog editing tool, the ampersand took a hit. you're right step 2 should really read as
      %put &allnumvars;
      happy holidays.

  2. Posted December 20, 2012 at 12:27 pm | Permalink

    In the case where you want to examine and update all numeric variables in the data set, you might consider using the _Numeric_ keyword in the array definition to include all numeric variables (currently in the program data vector) in the array.

    For example:

    data missing2;
    set miss ;
    array miss{*} _Numeric_;
    NoChange = .; /*Will be left Missing;*/

    do j=1 to dim(Miss);
    if Miss(j) = . then Miss(j) = 0;
    end;
    drop j;
    run;

    Happy Holidays!

    • Dominic
      Posted December 20, 2012 at 6:00 pm | Permalink

      You can refine that syntax still like so:

      data missing2;
      set miss;

      array x _numeric_;
      do over x;
      if x = . then x = 0;
      end;
      run;

    • Posted December 21, 2012 at 4:20 pm | Permalink

      very very nice gift Matt, thoroughly appreciate it! I just wanted to show SAS skills at being multilingual with sql, macro & datastep. I do like your gift & am so glad you posted it so others can also share.. happy holidays!

  3. Divyesh Dave
    Posted December 20, 2012 at 12:38 pm | Permalink

    I think I completely followed your code except for "amp". Where was this macro variable created? At least, we know "allnumvars" was created in Proc SQL. But I am lost on "amp"

    • Posted December 21, 2012 at 4:22 pm | Permalink

      thanks Divyesh for pointing that out.. I realized this typo problem only after all your comments.. only after the blog posted as I mentioned in my reply to Eric. so the last line in step 2 should really be
      %put &allnumvars;
      Have a great holiday season!

  4. Quentin
    Posted December 20, 2012 at 9:19 pm | Permalink

    Hi,
    I'm confused as to why you have 2 do-loops. What is the purpose of the :
    do i=_n_ to new;
    loop? I wouldn't think it is needed.

    Thanks,
    -Q.

    • Charu Shankar Charu Shankar
      Posted January 2, 2013 at 11:01 am | Permalink

      thanks Quentin, nice catch. that was a redundant do loop. .step 3 can easily read:
      data missing2;
      set miss;
      array miss{*} &allnumvars;
      do k=lbound(miss) to hbound(miss);
      if miss{k} = .
      then miss{k}=0;
      end;
      run;

  5. Christophe Z.
    Posted December 21, 2012 at 1:47 pm | Permalink

    All this can be done with a proc stdize as well. Just 3 lines of code:

    proc stdize data=miss reponly MISSING=0 out=new_miss;
    var _numeric_;
    run;

    • Charu Shankar Charu Shankar
      Posted January 2, 2013 at 11:04 am | Permalink

      thanks Christophe & Michelle,
      proc stdize is another great option to get the missing values converted.. I wanted to showcase the ability of proc sql to interface with the macro language in step 2--that allows users to store variables so easily & use & reuse them over and over again..

  6. Posted December 21, 2012 at 3:10 pm | Permalink

    Whilst I appreciate the 3 gifts you offered my preference would be to do this in 1 procedure, proc stdize... All the code above could be replaced with the following:

    proc stdize data=miss reponly missing=0 out=missing2;
    var _numeric_;
    run;

  7. Posted December 22, 2012 at 11:48 am | Permalink

    Divyesh, try removing that $amp and run the code.

    U will get the ans !!

  8. Paulus Mau
    Posted March 1, 2014 at 4:27 am | Permalink

    Thanks for the belated Christmas Present :) Great workshop! Thanks for coming!

Post a Comment

Your email is never published nor shared. Required fields are marked *

*
*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <p> <pre lang="" line="" escaped=""> <q cite=""> <strike> <strong>