7 tips for creating cleaner data sets

Although she’s an analyst, Anca Tilea estimates that she spends 80% of her time cleaning data. Tilea and co-author Deanna Chyn shared seven of their favorite methods for checking, cleaning and restructuring data.

Attendees at MWSUG 2013 got a bonus tip: ask SAS peers in one of the SAS Support Communities. Anca told the audience that the communities are a “huge resource” for her now. Topics range from basic to really complex, and the discussion is always good.

The description of each tip in Data Cleaning 101: An Analyst’s Perspective suggests the processing challenges where the technique is most useful. Side-by-side code examples show less effective but workable coding methods alongside the following:

  • CALL SYMPUT for saving values into macro variables that can be recalled later
  • PROC SQL SELECT INTO to avoid coding errors when listing variable names
  • PROC TRANSPOSE for rearranging data before performing calculations on columns
  • %MACRO %DO 1 %TO n or macro arrays to handle repetitive reads
  • PROC SQL JOIN for working with date ranges
  • three SAS character functions for subsetting an existing SAS dataset based on some specific rule
  • IF-ELSE statements that avoid unnecessary processing time
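
To give a feel for a few of these, here is a minimal sketch that combines CALL SYMPUT, PROC SQL SELECT INTO and an IF / ELSE IF chain. It is not taken from the paper: the dataset `visits`, the variables `score1`-`score3`, the macro variables `n_obs` and `score_vars`, and the cutoff values are all hypothetical names and numbers chosen for illustration.

```sas
/* Hypothetical sample data: dataset and variable names are assumptions. */
data visits;
    input id score1 score2 score3;
    datalines;
1 10 20 30
2 40 50 60
;
run;

/* CALL SYMPUT: store a value found during a DATA step in a macro variable. */
data _null_;
    set visits end=last;
    if last then call symput('n_obs', strip(put(_n_, 8.)));
run;
%put NOTE: visits has &n_obs rows;

/* PROC SQL SELECT INTO: build the variable list from the table metadata
   instead of typing each name by hand. */
proc sql noprint;
    select name into :score_vars separated by ' '
    from dictionary.columns
    where libname = 'WORK' and memname = 'VISITS'
      and upcase(name) like 'SCORE%';
quit;
%put NOTE: score variables are &score_vars;

/* IF / ELSE IF: once one condition is true, the remaining ones are skipped. */
data graded;
    set visits;
    length band $6;
    total = sum(of &score_vars);
    if total >= 100 then band = 'high';
    else if total >= 50 then band = 'medium';
    else band = 'low';
run;
```

Because the variable list comes from `dictionary.columns`, adding a `score4` column to the hypothetical dataset would not require touching the later DATA step, which is the kind of typo-proofing the SELECT INTO tip is after.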

You can find this paper and more at the MWSUG 2013 Proceedings.

About Author

Christina Harvey

Principal Marketing Specialist

Christina Harvey is an editor for SAS External Communications. She has more than 20 years of experience as a technical writer and communications specialist for SAS.

1 Comment

  1. Barbara Rusnak, PhD on

    Wonderful to see that Anca Tilea's name is getting out there. As is evident from her work, she is a brilliant analyst! Thanks for writing the article.
