Let’s start with a quiz. Which of the following is the Programmer’s Rule # 1?
1. Expert knowledge of multiple languages, like SAS and Java
2. Talent to maneuver with complex algorithms
3. Innate ability to draw flowcharts
4. None of the above
Dear reader, as a savvy programmer, you would have instinctively picked # 4, knowing that the most important, the most fundamental and the most critical rule for programmers is not on this list.
So what’s the answer?
On a recent yoga retreat in the Bahamas (yes, this is the place to be if you are considering yoga: breathing in fresh ocean air and practicing postures just a sea breeze away from the ocean is a dream come true for any fitness practitioner), I recalled what I had learned over years of doing and teaching yoga: there is no one size that fits all. It is a forgiving practice where people of different body types, endurance and stamina come together to work out. So what is the #1 rule for this ancient mind and body workout? KNOW THYSELF, so that you listen to your body and work only with moves that fit the unique you.
That same principle applies to our teaching practice at SAS. Our mantra is KNOW OUR STUDENTS. We encourage our learners to keep practicing even if the results are not perfect just yet! We have course material to rely on, and we do need mastery of the SAS subject matter, but what matters more in our teaching practice is being completely in tune with our students: watching for every “aha!”, every nod of understanding (or lack of one), and observing their learning styles so we can teach in the way they learn best.
Taking it back to the question I posed at the beginning of the post, I’m sure you get my drift. The # 1 rule for programmers is KNOW THY DATA. Much like yoga, where holding a posture is possible only when you know yourself, the same principle holds true in SAS: you have to know your data before you can analyze it. One of the most common mistakes users make is jumping straight into analysis before knowing what it is they want to analyze. First seek answers to questions like: What are the variables? What is the smallest value? What is the largest value? Are there duplicates where values should be unique? Answers to these questions will help you on your quest for data exploration and data analysis.
You already know that good analytics come from good data. I offer you three simple ways to follow the Programmer’s Rule # 1 of KNOW THY DATA:
1) PROC MEANS to examine the range: Find the largest and smallest salary in your staff table. One practical use is in reporting: the largest value tells you how wide the display column needs to be.
2) PROC FREQ to look at unique values: Your customer master table should have only unique values. But you know your data is dirty. How can you scrub it? PROC FREQ lets you examine duplicates. But do you have the time to scroll through pages of output and rely on your eyes to spot the duplicates? Instead try this PROC FREQ step I wrote to write the customer_ids with duplicates to a table.
3) PROC UNIVARIATE to examine outliers: Unusual values are of interest or concern in data analysis as large gaps may indicate an outlier. Identify the five smallest and largest values with PROC UNIVARIATE.
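The three steps above can be sketched as follows. This is a minimal illustration, not the original code from this post: the table names (work.staff, work.customer, work.dup_customers), the column names (employee_id, salary, customer_id) and the sample rows are all made up so that the sketch runs on its own.

```sas
/* Made-up sample data (hypothetical tables and values) */
data work.staff;
   input employee_id salary;
   datalines;
101 42000
102 58500
103 61000
104 250000
105 39500
106 47250
;
run;

data work.customer;
   input customer_id $;
   datalines;
C001
C002
C002
C003
C001
C004
;
run;

/* 1) PROC MEANS: smallest and largest salary */
proc means data=work.staff min max;
   var salary;
run;

/* 2) PROC FREQ: write customer_ids that occur more than once to a table. */
/*    NOPRINT suppresses the printed report; COUNT is the frequency       */
/*    variable that PROC FREQ adds to the OUT= data set.                  */
proc freq data=work.customer noprint;
   tables customer_id / out=work.dup_customers (where=(count > 1));
run;

/* 3) PROC UNIVARIATE: keep only the extreme observations table, */
/*    which lists the five lowest and five highest values by default */
ods select ExtremeObs;
proc univariate data=work.staff;
   var salary;
run;
```

With this sample data, work.dup_customers ends up holding C001 and C002, each with a count of 2, so there is no need to scan pages of output by eye.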
Did you know that these powerful PROCs do double, sometimes triple, duty to work magic on your data? Want more? Check out our Programming 1 class. I’m sure you have your own ways to tackle Programmer’s Rule #1, so drop me a line. I’d love to hear your thoughts.
7 Comments
You need to take part in a contest for one of the best
sites on the net. I'm going to recommend this site!
Thanks, Ward. Good idea to create a format to capture dirty data values. Formats are a powerful way to display data differently without having to recode a column. Thanks for sharing.
And for coded data, or at least where practical, create a format that displays valid values as 'In Code List' or similar and everything else as 'Unknown' or similar. Then look for the formatted value 'Unknown' to find coding errors.
What a great visual tip! Thanks for sharing, Rick.
You bet, Tim! It's just 3 hours away... already planning my next trip! BTW, there's Val Morin in Quebec if you want a Canadian yoga vacation!
Since the max and min of a variable are shown in the UNIVARIATE output, you can actually combine Tips (1) and (3). But since Three Tips are better than Two Tips, I'll add a new Tip: Use the histogram statement in PROC UNIVARIATE to see a graph of the distribution of the data. For example:
ods graphics on;
proc univariate data=staff;   /* staff: assumed name of the staff table */
   histogram salary / kernel;
run;
The Sivananda Ashram, by chance? It's a great place to get away!