Ensuring that key variables are numeric, not character

One of the frustrating outcomes of the data import process is when a variable that you need to be numeric is imported as character. This often happens because the column of data contains non-numeric data, for example, where blanks in a database are exported as “NULL” instead of a true blank. This blog presents an efficient data cleaning solution for this problem.

There are various solutions for such problems in SAS, ranging from complex coding on the import side (e.g. in SQL) to post-import cleaning. Because many users do not have access to or knowledge of all SAS products, in many cases we need to do post-import cleaning. My solution is to use the following code snippet immediately after importing. The user fills in only the first three lines: the name of the data set, the existing target variables, and a list of new variable names. Nothing else needs to change.

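%let Dataset = <insert data set name here>;
%let TargetVariables = <insert actual target variables here>;
%let NewNumericals = <give new variable names – list must have same number as old variables>;
Data &Dataset;
	set &Dataset;
	Length &NewNumericals 8;
	Format &NewNumericals Best12.;
	/* Existing character variables and their new numeric counterparts */
	array TargetVariables (*) &TargetVariables;
	array NewNumericals (*) &NewNumericals;
	do i = 1 to DIM(TargetVariables);
		/* 'v' counts characters NOT in the list and 't' trims trailing blanks, */
		/* so a count of 0 means the value holds only digits, periods, and blanks */
		if CountC(TargetVariables{i},"0 1 2 3 4 5 6 7 8 9 .",'vt') = 0 then
			NewNumericals{i} = TargetVariables{i};
		else NewNumericals{i} = .;
	/* The excerpt ends here; the closing statements below are the minimum needed to run */
	end;
run;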
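
For comparison, here is a minimal sketch of the same clean-up on a tiny made-up data set. It is not the snippet above: it swaps in the INPUT function with the `??` modifier, which returns a missing value for text such as "NULL" without writing invalid-data messages to the log. The data set names (`work.demo`, `work.demo_clean`) and variable names are hypothetical and chosen only for illustration.

```sas
/* Hypothetical sample data: two character columns where blanks arrived as "NULL" */
data work.demo;
    length height_c weight_c $8;
    infile datalines dsd;
    input height_c $ weight_c $;
    datalines;
170,65.5
NULL,72
165,NULL
;
run;

data work.demo_clean;
    set work.demo;
    /* ?? suppresses the invalid-data messages; non-numeric text becomes missing (.) */
    height = input(height_c, ?? 8.);
    weight = input(weight_c, ?? 8.);
run;

proc print data=work.demo_clean;
run;
```

In this sketch the rows that contain "NULL" end up with a missing value in the corresponding numeric column, and the original character columns are left in place so the conversion can be checked.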


Explaining statistical methods to the terrified & disinterested: A focus on metaphors

Many readers in applied areas (business, health, psychology & sociology, education, and several others) are reading statistics texts under duress for a course or project, and are in truth somewhere between disinterested and terrified. In my new SAS Press book Business Statistics Made Easy in SAS® I knew that I had to maximize the use of several classic pedagogical methodologies to facilitate reader understanding, while minimizing the use of mathematical development. These are all well-known and are used in many statistical texts; however, this blog mentions some of these key explanatory techniques and gives a few tips for successful use.

I believe that the key techniques for non-technical explanation are:

  1. Metaphors: we’ll cover this in more detail in this blog
  2. Pictures & diagrams
  3. Organizational cases / vignettes
  4. Storytelling

Making metaphors work

In this technique, we compare a statistical concept to some other concept in life to which readers can relate. For example, in Business Statistics Made Easy in SAS®:

  • The concept of statistical modeling is explained using star signs.
  • The difference between correlation and covariance is illustrated using an adaptation of the classic ‘butterfly effect’ from chaos theory.
  • I employ a courtroom argument to try to anchor p-values.


Who to thank for your Thanksgiving food

Here in the US, we've got the Thanksgiving holiday coming up soon. And the keystone of this holiday is a big dinner, with lots of traditional Thanksgiving foods. But where does all this food come from, and which farmers should we thank for which of the food items? I use SAS analytics to answer those questions!

Let's start with sweet potatoes... My friend Reggie is a huge fan of sweet potato pies (not pumpkin, but sweet potato). It doesn't matter whether they're homemade, store bought, or even the fast-food variety from BoJangles - they all make him happy. Here's a picture of a sweet potato pie from my friend Beth - yum, yum!


But where do sweet potatoes come from? I found a neat Web page set up to answer that question for most all the traditional Thanksgiving foods! They had a huge infographic that showed all the foods together (note that the pdf is a bit overwhelming to try to look at on a computer screen - it's more of a wall/poster thing), and they also let you look at one food item at a time. Here's a snapshot of their sweet potato map:


It's a good/informative map, but there are a few details I would change. The miles scale and north arrow are very prominent and draw the eye ... yet they aren't really needed in this map. The legend looks a bit odd, with the 2nd column being raised one position higher than the first column. And it's difficult to quickly determine which state is the largest producer of sweet potatoes.


A short history of The Little SAS Book

This blog is co-authored by Susan J. Slaughter and Lora D. Delwiche.

SAS Press is now 25 years old. As impressive as that is, a bigger milestone for us personally is that The Little SAS® Book is now 20 years old! We had no idea back then that we would still be writing SAS books all these years later.

We first heard about the Books by Users program (now SAS Press) in 1992. At that time, the typical SAS programmer had a shelf full of fat SAS manuals. The SAS Language: Reference manual, for example, was nearly 2 inches thick and weighed four pounds!  We decided to call our book The Little SAS® Book as a joke since, at the time, the term “little SAS book” was an oxymoron.


We had each worked at help desks and had seen where beginners get stuck, and we quickly discovered a shared vision for the book. We wanted the book to be small and friendly, cover each topic in just two facing pages, contain programs that are complete and executable, use graphics to illustrate the topics, cover debugging, and avoid jargon as much as possible. As an ironic, literary touch, we also decided to put quotations at the beginning of each chapter.


How to build a customized voter registration data viewer

With a major election coming next year, I was wondering if there have been any shifts & changes in the voters in my state.  This seems like an interesting opportunity for some data analysis, eh!?!

To get you into the spirit of elections, here's an "I Voted" sticker from my friend Dee:


I've lived in North Carolina most all my life, and have seen it go through lots of changes as it shifted from being a textile & tobacco state, to being the Silicon Valley of the East Coast. With cities such as Raleigh, Apex, and Cary placing highly on national "best of" lists, and companies like SAS appearing on the "best places to work" lists, lots of new people have been moving to North Carolina from other states. I'm curious how these changes might affect voting preferences, and I thought a good place to start might be the voter registration data.

After a bit of searching I found the NC voter registration data online. Luckily the data was accessible through simple URLs and html pages (rather than an interactive system that exposes only a single url for its interface). For example, here is the page for the Jan 3, 2015 data. The data was not available in a simple format such as csv, but at least it was in straightforward html tables, so I could write some SAS code to scrape the data from the web pages.
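
A rough sketch of what that kind of scraping can look like in SAS, assuming the page is plain html with one table cell per line. The URL, the fileref `vreg`, and the data set name `work.raw_cells` are placeholders rather than the actual voter registration address or the code used for this post.

```sas
/* Placeholder URL: substitute the address of the page to be scraped */
filename vreg url "http://www.example.com/voter_stats.html";

data work.raw_cells;
    length line $32767 cell $200;
    infile vreg lrecl=32767 truncover;
    /* Read each line of html as one long character string */
    input line $char32767.;
    /* Keep only the lines that contain a table cell */
    if index(lowcase(line), '<td') > 0;
    /* Remove every html tag, leaving just the cell text */
    cell = strip(prxchange('s/<[^>]*>//', -1, line));
    drop line;
run;
```

From there the cell text would still need to be reshaped into rows and columns, which depends on how the particular page lays out its tables.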



You’re not using PROC REPORT?

As I visit my clients, it sometimes surprises me when they avoid the use of PROC REPORT.  “It’s too different”.  Even those who do use it often fail to take advantage of the procedure’s power by ignoring the compute block.  Yes, this procedure is different from any other. Yes, using the compute block can be complex. Complex enough that someone could write a book about it. Or two.

I admit there is a learning curve, but you are already mastering SAS – why are you avoiding PROC REPORT?  Learning is an investment.  As you learn more and strengthen your REPORT skills you will ultimately save time and produce better reports.

Recently a client described a table generation process that involved the manual transferring of data summarized by PROC SUMMARY into an Excel® workbook, where it was further manipulated to produce the final table.  The process took a full day.  I had to ask: “Why not use REPORT to build the entire table?”  The response: “I already know how to do it this way.”  Well yes, but at what cost?


Using more of the 1,000,000+ English words

With over 1,000,000 words in the English language, why is it that we tend to use the same words over & over? This blog shows a hierarchical approach to help you branch out and choose more descriptive words.

But first, to get you into the mood for a blog about words, here's a picture of my friend Thelma's dictionary. It's from 1929, and belonged to her Grandma Betty - this book certainly has character, eh?!? How long has it been since you used a paper copy of a dictionary or thesaurus?



And now, on with the blog...

English teacher Kaitlin Robbs came up with a neat tool to help you traverse a hierarchical list of words related to emotional states, to come up with more specific words than just happy, sad, etc. She designed a word wheel, with the more general words at the center, and more specific words radiating towards the outer edges.

Here's a snapshot of her wheel:


This seems like a useful tool, and an interesting layout ... therefore I thought I'd try to create something similar using SAS. I first typed out all the words in a text file, and imported them into a dataset. While I was typing the words, I was suspicious that a few of them might not be spelled correctly, therefore I used SAS' Proc Spell to test them.

proc spell words='emotion_word_wheel_original.txt' verify suggest;

This check was quick & simple, and certainly worth the effort. The analysis flagged five words - of the five, the word 'disrespected' is probably OK, but the other four are definitely misspelled in Kaitlin's wheel. Here is the output of Proc Spell:


Who was the first SAS user to write a SAS book?

SAS Press is now 25 years old!  To commemorate this milestone, I decided to research a question that has fascinated me for years: Who was the first person outside of SAS Institute to write a book about SAS?

I first heard about this controversy at the Western Users of SAS Software conference back in 1994. That year I attended my first BBU (Books by Users, now SAS Press) Authors Dinner where David Baggett, the first BBU Acquisitions Editor, mentioned the competing claims to the title of First BBU Author. Who was first depends on how one defines a book by a user.

In an attempt to find the answer (or answers), I put on my investigative reporter hat to interview authors, editors, and other long-time SAS insiders. Here is what I found:

The First Official BBU Book

The first book officially published through the BBU program was Thomas Miron's SAS® Software Solutions: Basic Data Processing, published in 1993.

The First Book Published by SAS Institute but Written by a SAS User

Before the official BBU program was created, SAS Institute recruited a handful of outside authors to write books.  These notable luminaries include Ramon Littell, Rudolf Freund, David Dickey, H. W. "Barry" Merrill, and Michael Friendly. The earliest such book that I could find is SAS® System for Regression by Rudolf Freund and Staff, 1986.


Analytics claim this is the 20th most used word in English writing

Analytics claim this is the 20th most used word in English writing. What word, you might ask? This word. Which one? This one right here! You might think I'm trying to lead into an Abbott & Costello-style comedy routine, but I literally mean this word ... the word 'this'! As you can see, sometimes it is difficult to use words to talk about words. Therefore, in this blog I will also use graphics!

In 1965, Mark Mayzner performed a study on 20,000 English words randomly selected from a variety of written sources. Back then, he had to use punched cards, which was quite a labor-intensive process.

Mayzner recently wrote Peter Norvig (director of research at Google) and suggested ... "perhaps your group at Google might be interested in using the computing power that is now available to significantly expand and produce such tables as I constructed some 50 years ago, but now using the Google Corpus Data, not the tiny 20,000 word sample that I used." 

Before we get too far into Norvig's analytics, how many is 20,000 words? How about 200,000 or 2 million? How many words would you estimate are in my friend Frank's books in this picture? Food for thought...


Norvig crunched through 23GB of the Google Books Ngram word count summaries, and came up with 97,565 distinct words, which were mentioned 743,842,922,321 times. He came up with the following graph, which shows summary counts of the 50 most frequently used words:



Chasing the Endless Summer

Endless Summer is both a surfing movie and the idea that "if one had enough time and money it would be possible to follow the summer around the world, making it endless." Summer temperatures are fine if you're swimming & surfing, but I prefer slightly cooler temperatures - perhaps an Endless Spring(?). Assuming the perfect springtime day is 70 degrees, this SAS example shows where you could travel each day in the US, to have an Endless Spring ...
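
The core of that idea can be sketched in a few lines: for each day, keep the location whose temperature is closest to 70 degrees. The data set `work.temps`, its columns, and the handful of rows below are made up purely for illustration and are not the weather data behind the map.

```sas
/* Hypothetical daily average temperatures by city */
data work.temps;
    input date :date9. city :$20. avg_temp;
    format date date9.;
    datalines;
01JAN2015 Miami 68
01JAN2015 Raleigh 41
01JUL2015 Miami 84
01JUL2015 Seattle 66
;
run;

proc sql;
    /* For each date, keep the city whose temperature is nearest to 70 degrees */
    create table work.endless_spring as
    select date, city, avg_temp
    from work.temps
    group by date
    having abs(avg_temp - 70) = min(abs(avg_temp - 70));
quit;
```

With these made-up rows, the January date keeps Miami and the July date keeps Seattle. Ties are kept as multiple rows for the same date, so a real version would need a rule for choosing among them.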

To get you in the mood for springtime temperatures, here's a picture from my friend David's backyard showing some of the first flowers peeking out in the spring (can you name these flowers?):


I first saw the idea for this map on justinweather.com, and then found more details on the map author's blog. He set up his animated map as a YouTube video, and I think he did a great job. You can definitely see the trends in the data. Click the screen-capture below to see Brian's animation:
