As far as numbers go, the number zero is rather mysterious for data. Is it something or is it nothing?
What happens when you have missing data but enter 0?
This topic triggered an intriguing discussion in my recent Programming 2: Data Manipulation Techniques class. In this post I’d like to compare various industries to show you their universal take on the 0.
To start off, see if you agree with the analysts in different industries who entered zero in place of missing values below:
Can you have a 0 for blood glucose levels?
A health analyst is missing blood glucose level records for several patients and decides to enter 0 as a value. But health policy makers know that a 0 blood glucose level is highly unlikely as the patient would go into insulin shock and need immediate medical attention.
Can you have a 0% interest rate?
A customer is currently negotiating interest rates and nothing is confirmed yet. Meanwhile a mortgage specialist enters 0% with the intention to come back later and change it to a valid value. But he goes away on extended leave. What we have now is a very happy customer who scans her mortgage statements with glee thanking her stars for the day she met this banker. From the bank’s perspective however, banks might need to declare bankruptcy if they started doling money out without charging any interest.
Can you have a 0 degree Celsius temperature recording in Toronto on July 10th?
A weather specialist doesn’t have records for several cities on July 10th. She goes ahead and enters 0 for all the missing values. What she’s effectively done by entering a 0 value is to possibly mislead the general public. Just picture all the summer Toronto visitors who take this at face value and come prepared with warm clothing for a cold July!
Can you have 0 as the date a customer signed up for a home phone account?
A Telecom Sales Rep fills out a phone order for Joe Blow, very happy about completing an excellent sale that’s going to make Q2 results look terrific, but forgot to input the order date. He later realizes his mistake, but doesn’t remember the transaction date and decides to enter 0. But you know for calculations dates are stored as numbers-0 is really January 1, 1960 for SAS. When order dates are queried for the last quarter, Joe Blow would definitely not qualify in this list, making for a skewed reporting that Execs are not going to be happy about either.
With the questions posed above, I’m sure you can all understand how data wrongly stored as 0 can seriously compromise your reporting. If I were to succinctly sum up the distinction between missing and zero it would be probably look like this:
A missing value occurs when no data is available-just like the missing puzzle piece.
A Zero value occurs when data is available and it’s a valid value like the 0 degree Celsius temperature.
How can you display missing values as 0?
Ok, let’s look at flight data for rows where freight value is missing.
Now let’s modify this with an options statement so that missing values are set to 0.
The second piece of output should confirm that the missing value is just displayed as a zero but stored as missing. That’s why you still get the exact same observation returned in the second query. So this trick is a superficial one, it does nothing to change the internally stored value of missing.
Why, in the world of data can you not substitute 0 for a missing value?
Consider a human resource specialist looking at employee quarterly contributions. John Doe’s contributions look like this: $20, $0, $40, $30 – he didn’t contribute in Q2. What about Joe’s Average donations =90/4=22.5. Is that correct? I’m sure your answer is an emphatic NO as this division by four quarters throws off the count. Since he only contributed for three quarters his average needs to be calculated across three quarters only. So you would calculate 90/3=30.
Count on SAS to help with calculation of missing values. The MEAN function would be handy here, MEAN(q1,q2,q3,q4) ignores missing values in calculating the mean.
I hope this post gave you somewhat of a better appreciation of the nuances between missing and zero. John Mayer sings “Something's Missing” but doesn’t know why. I hope you do at the end of this post! I’d love to hear your comments on treatment of missing values, or anything in this blog that sparked a thought for you.