When I ask what a null value in the data means to your business, I get multiple answers. Some say, mathematically, it means zero. Well, if I'm doing an average I may not want to count that zero if it was input into a quantity or amount field in the database.
Some say "null" means nothing or part of "null and void." I choose to let my business customers define what "null" is to them.
Here are some questions to ask:
- If you see an attribute that has nothing in it, does that bother you and make you think the data is not good?
- Would placing a word like "unknown" in that attribute make you feel more confident in the data quality?
- What about numeric attributes like amounts, totals and quantities – should they be allowed to be null? This would not count them in a calculation that does an average. If you put a "0" in there it will count in the average calculation. Explaining this to the business user sets expectations, and gains their confidence.
The next decision that needs to be made is whether to let your ETL (extract, transform and load) routine input the value suggested (i.e. UNKNOWN) for an attribute that may be NULL, or to let the database do it for you as a DEFAULT (known as NOT NULL with DEFAULT). If given the choice, I choose to let my database work a little harder, and hold all the business decisions.