Welcome to another installment of my blog series Data Preparation and Data Quality on the Road. The main actors here are my two SAS books Data Preparation for Analytics Using SAS (DP) and Data Quality for Analytics Using SAS (DQ), who are allowed to accompany me to many conferences, SAS events, and even private journeys. This blog includes the background of some locations, SAS topics, and last but not least some pictures of my books at various places in the world.
This year, A2013 took place in London after two conferences in Copenhagen and one in Cologne. Obviously, there is great interest in analytics conferences here in Europe. Perfect location, organization and speakers—these are the ingredients that make up SAS conferences.
After promising they would behave properly, my two books, DP and DQ, were allowed to come with me to London. And I have to say they did behave (as you can see from this little photo album) when walking through London, having fish and chips in a traditional pub, and at the conference when speaking to attendees after my talk.
Getting romantic? – “Missing you!”
No worries, this text will not drift into a romantic love story. But “Missing you” was the title of my presentation in London. Obviously related to the content of my books, the full title was Missing You! The Story about the Origin, Reason, Detection, Treatment and Consequences of Missing Values in Analytics. It started with some provocative statements about missing values and the way they are typically handled, and went on to cover the consequences of missing values and methods to profile and treat them in cross-sectional and longitudinal data. I felt honored that the content seemed to appeal to many attendees. I got a lot of questions after my talk and still receive emails from people wanting to refer to my presentation in their work.
The story of my “Aunt Susanne”
The presentation also introduced my Aunt Susanne, an elderly lady who lives in the countryside. Not that I usually bore people with stories about my relatives, but my aunt is a brilliant example of how missing values in a telephone provider's database, in her case in the date of birth variable, usually do not occur randomly but follow systematic patterns. The story is taken from my Data Quality for Analytics book (chapter 5) and illustrates how important it is to know the business and process background of your data. The case study shows that the date of birth (and therefore the age) is missing for older people because they started their telephone contract some decades ago, when storing the customer's date of birth was not foreseen. New customers have to provide their date of birth. So there is a systematic bias in the occurrence of the missing values, and the missing values for date of birth should therefore not be replaced by the mean, but rather by a business rule that takes into account that they occur only with older people.
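For readers who like to see the idea in code, here is a minimal sketch of the Aunt Susanne effect. It is written in Python rather than SAS purely for illustration, and it is not taken from the book: the customer records, the cutoff year, the reference year, and the assumed age at contract start are all made-up values. It first profiles the share of missing ages by contract era and then compares mean imputation with a simple business rule.

```python
from statistics import mean

# Hypothetical customer records: (customer_id, contract_start_year, age or None).
# All values are invented for illustration only.
customers = [
    ("C1", 1975, None),
    ("C2", 1982, None),
    ("C3", 1988, None),
    ("C4", 2005, 34),
    ("C5", 2009, 41),
    ("C6", 2011, 28),
    ("C7", 2012, 52),
]

CUTOFF_YEAR = 1990                  # assumption: date of birth is stored from this year on
REFERENCE_YEAR = 2013               # assumption: year in which the analysis is run
ASSUMED_AGE_AT_CONTRACT_START = 25  # assumption: typical age when signing a contract


def missing_rate(records):
    """Share of records whose age is missing."""
    if not records:
        return 0.0
    return sum(1 for _, _, age in records if age is None) / len(records)


# Step 1: profile the missing values by contract era.
old_contracts = [r for r in customers if r[1] < CUTOFF_YEAR]
new_contracts = [r for r in customers if r[1] >= CUTOFF_YEAR]
rate_old = missing_rate(old_contracts)
rate_new = missing_rate(new_contracts)

# Step 2: the naive replacement value, the mean of the observed ages.
observed_mean = mean(age for _, _, age in customers if age is not None)


# Step 3: a business rule that uses the contract duration instead.
def rule_based_age(start_year):
    """Estimate the age from the contract start year (illustrative rule)."""
    return REFERENCE_YEAR - start_year + ASSUMED_AGE_AT_CONTRACT_START


imputed = {
    cid: age if age is not None else rule_based_age(start)
    for cid, start, age in customers
}

print(f"missing rate, old contracts: {rate_old:.0%}")
print(f"missing rate, new contracts: {rate_new:.0%}")
print(f"mean of observed ages: {observed_mean}")
print(f"rule-based ages: {imputed}")
```

In this toy data every missing age belongs to an old contract, so the mean of the observed ages describes only the newer customers and would make my aunt decades younger than she is. The rule used here is deliberately crude; in a real project it would come from the business and process knowledge discussed above.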
You have to do it twice
The conference chairs introduced a new concept in the Analytics 2013 conference program. They selected some presentations to be repeated so that attendees could see them even if they clashed with competing presentations. I was honored that I could present twice, and I convinced DP and DQ to do the presentation again in the same fresh way as the first time. The video team of SAS interviewed me about my talk. It can be seen here on YouTube.