3 gifts to magically transform your data

15

‘Tis the season to be gifting!  So what could be better than gifting your fellow SAS user or yourself with three gifts to celebrate the holiday season?

As 2012 comes to an end, we think it’s a great idea to be thinking of ringing in the New Year with dynamic techniques.

A student wanted to do away with hard coding. His data has 1000’s of numeric columns with missing values.  And he wants to convert all the missing values to a zero. He’s already read through my previous post on verifying that the data really should be a zero. He doesn’t care about the missing system option since all that does is display missing value as 0, underlying numbers still remain missing. So we have two problems we need to help him with:

1)      Convert actual data values from missing to zero.

2)      Do this dynamically without manual intervention (a.k.a. Let SAS do this). Student is tired of hardcoding the variable names one by one, and honestly I don’t think we can blame him, he can really put his time to better use!

Let SAS come to the rescue in a heroic manner using three languages: You did know that SAS was a linguist, didn’t you? Oh! You didn’t, then let’s see what SAS can do.

*First step: Grab the name, type and position information for all numeric variables from PROC CONTENTS & push that information to a SAS dataset.

proc contents data = miss
out = vars(keep = varnum name type where=(type=1))
noprint;
run;

What did we do? We dumped the Name with PROC CONTENTS to your name, type information to a SAS dataset.  Now how clever is that? Selecting only your numeric variables with the type= option. If you needed to gift character data you’d go with type=2.

I’m sure you’re just as choosy with your gift giving.

*Second step: Load the numeric variables we just collected into a macro variable using PROC SQL;

proc sql noprint;
select distinct name
into :allnumvars separated by ' '
from vars
quit;
%put &allnumvars;

What did we do? We took the variable called NAME and held it in a macro variable called allnumvars using the INTO: syntax. I really didn’t need the %put statement at the end except out of curiosity to see what the allnumvars macro variable is stocking for me.

Now confess, how many of you have peeked into your Christmas stocking before you were supposed to?

*Third Step: Create an array to hold the numeric variables and do array processing to convert all missing values to a 0;

data missing2;
set miss nobs=new;
array miss{*}  &allnumvars;
do i=_n_ to new;
do k=lbound(miss) to hbound(miss);
if miss{k} = .
then miss{k}=0;
end;
end;
run;

What did we do? We created an array to point to the variables being held in our allnumvars macro variable. Then passed the array through the do loop to convert missing values to zero just for those variables.

Voila, there you have it! Every single missing value in all your numeric variables has been converted to zeros.

So what were the three gifts? Well, the three languages that SAS moves easily across – data step, proc sql and the macro language. Oh, and an added bonus of the first PROC CONTENTS step.

Isn’t that a nice gift that you can use? And I would not mind at all if you wanted to re-gift this.

In fact, I would be very glad if others could also benefit from our re-gifting. I wish you and your family a very happy holiday season and would also like to give you another gift.

I hope you liked these gifts just as much as I had preparing it. You may have some gifts for me in the form of Oh–I-can-write-you-better code. And I’d love to hear from you.

Share.

About Author

Charu Shankar

Technical Training Specialist

Charu Shankar has been a Technical Training Specialist with SAS since 2007. She started as a programmer, and has taught computer languages, business and English Language skills. At SAS, Charu teaches the SAS language, SQL, SAS Enterprise guide and Business Intelligence. She interviews clients to recommend the right SAS training to help them meet their needs. She is helping build a center for special needs kids in this project. http://www.handicareintl.org/pankaja/pankaja.swf

15 Comments

  1. Thank for the gift! I did want to point out to slave off any confusion in the code snipits that the & that might show up in your browser display are html code for an ampersand. Took me a second to realize this until it hit me.

    • Charu Shankar

      thanks Eric. Between browser and sometimes quirky blog editing tool, the ampersand took a hit. you're right step 2 should really read as
      %put &allnumvars;
      happy holidays.

  2. In the case where you want to examine and update all numeric variables in the data set, you might consider using the _Numeric_ keyword in the array definition to include all numeric variables (currently in the program data vector) in the array.

    For example:

    data missing2;
    set miss ;
    array miss{*} _Numeric_;
    NoChange = .; /*Will be left Missing;*/

    do j=1 to dim(Miss);
    if Miss(j) = . then Miss(j) = 0;
    end;
    drop j;
    run;

    Happy Holidays!

    • You can refine that syntax still like so:

      data missing2;
      set miss;

      array x _numeric_;
      do over x;
      if x = . then x = 0;
      end;
      run;

    • Charu Shankar

      very very nice gift Matt, thoroughly appreciate it! I just wanted to show SAS skills at being multilingual with sql, macro & datastep. I do like your gift & am so glad you posted it so others can also share.. happy holidays!

  3. I think I completely followed your code except for "amp". Where was this macro variable created? At least, we know "allnumvars" was created in Proc SQL. But I am lost on "amp"

    • Charu Shankar

      thanks Divyesh for pointing that out.. I realized this typo problem only after all your comments.. only after the blog posted as I mentioned in my reply to Eric. so the last line in step 2 should really be
      %put &allnumvars;
      Have a great holiday season!

  4. Hi,
    I'm confused as to why you have 2 do-loops. What is the purpose of the :
    do i=_n_ to new;
    loop? I wouldn't think it is needed.

    Thanks,
    -Q.

    • Charu Shankar

      thanks Quentin, nice catch. that was a redundant do loop. .step 3 can easily read:
      data missing2;
      set miss;
      array miss{*} &allnumvars;
      do k=lbound(miss) to hbound(miss);
      if miss{k} = .
      then miss{k}=0;
      end;
      run;

  5. All this can be done with a proc stdize as well. Just 3 lines of code:

    proc stdize data=miss reponly MISSING=0 out=new_miss;
    var _numeric_;
    run;

    • Charu Shankar

      thanks Christophe & Michelle,
      proc stdize is another great option to get the missing values converted.. I wanted to showcase the ability of proc sql to interface with the macro language in step 2--that allows users to store variables so easily & use & reuse them over and over again..

  6. Michelle Homes

    Whilst I appreciate the 3 gifts you offered my preference would be to do this in 1 procedure, proc stdize... All the code above could be replaced with the following:

    proc stdize data=miss reponly missing=0 out=missing2;
    var _numeric_;
    run;

Leave A Reply

Back to Top