Although she’s an analyst, Anca Tilea estimates that she spends 80% of her time cleaning data. Tilea and co-author Deanna Chyn shared seven of their favorite methods for checking, cleaning and restructuring data.
Attendees at MWSUG 2013 also got a bonus tip: ask SAS peers for help in the SAS Support Communities. Tilea told the audience that the communities are now a “huge resource” for her. Topics range from basic to very complex, and the discussion is consistently good.
For each tip, the paper Data Cleaning 101: An Analyst’s Perspective describes the processing challenges where the technique is most useful. Side-by-side code examples contrast less efficient but workable coding methods with the following techniques:
- CALL SYMPUT to save values into macro variables that can be recalled later
- PROC SQL SELECT INTO to avoid coding errors when listing variable names
- PROC TRANSPOSE to rearrange data before performing calculations on columns
- %MACRO with %DO i = 1 %TO n loops, or macro arrays, to handle repetitive reads
- PROC SQL JOIN for working with date ranges
- Three SAS character functions for subsetting an existing SAS data set based on a specific rule
- IF-THEN/ELSE statements to avoid unnecessary processing
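The paper's own code is not reproduced here. As a rough illustration only, the minimal SAS sketch below shows four of the listed techniques. It assumes the SASHELP.CLASS sample table that ships with SAS; the WORK.ENROLL and WORK.VISITS data sets and all macro variable names are hypothetical, invented for this example, and are not the authors' code.

```sas
/* Sketch only, not the paper's code. SASHELP.CLASS ships with SAS;    */
/* WORK.ENROLL, WORK.VISITS and the macro variable names are made up.  */

/* CALL SYMPUT: save a computed value in a macro variable for later use. */
data _null_;
  set sashelp.class end=last;
  if sex = 'F' then n_female + 1;   /* sum statement: retained, starts at 0 */
  if last then call symput('n_female', strip(put(n_female, best.)));
run;
%put NOTE: SASHELP.CLASS has &n_female female students.;

/* PROC SQL SELECT INTO: build a variable list from metadata instead of typing it. */
proc sql noprint;
  select name into :num_vars separated by ' '
    from dictionary.columns
    where libname = 'SASHELP' and memname = 'CLASS' and type = 'num'
    order by varnum;
quit;
%put NOTE: Numeric variables: &num_vars;

proc means data=sashelp.class mean;
  var &num_vars;   /* list resolved from metadata, so no hand-typed names */
run;

/* PROC SQL JOIN on a date range: keep only visits inside an enrollment window. */
data work.enroll;
  input id start :date9. stop :date9.;
  format start stop date9.;
  datalines;
1 01JAN2013 31MAR2013
2 15FEB2013 30JUN2013
;
run;

data work.visits;
  input id visit :date9.;
  format visit date9.;
  datalines;
1 15JAN2013
1 10MAY2013
2 01MAR2013
;
run;

proc sql;
  create table work.in_window as
    select v.id, v.visit
    from work.visits as v
         inner join work.enroll as e
         on v.id = e.id and v.visit between e.start and e.stop;
quit;

/* IF-THEN/ELSE: once a condition is true, the remaining ELSE tests are skipped. */
data work.class_groups;
  set sashelp.class;
  length age_group $ 6;
  if age <= 12 then age_group = 'Young';
  else if age <= 14 then age_group = 'Middle';
  else age_group = 'Older';
run;
```

Because the SELECT INTO list is read from DICTIONARY.COLUMNS, adding or renaming a numeric variable changes &num_vars without any edit to the PROC MEANS step. The join keeps the 10MAY2013 visit out of WORK.IN_WINDOW because it falls after that ID's enrollment window ends.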
You can find this paper and more in the MWSUG 2013 Proceedings.