I am unabashedly a data zealot, but not a data scientist. I believe in the power of data and data analytics, and that the data revolution has started. It will transform all our lives in ways we cannot even imagine. This is one of the things that attracted me to SAS.
I'm also a Harry Potter nerd and not afraid to own it. That's how I know what a Harry Potter Dilemma is when it comes to data. It was a lesson I learned through my younger daughter, Allison.
Data is a wonderful thing, but without analysis, it has little meaning or use. Data analytics provides the insight needed to transform data into intelligence. That intelligence can drive strategic and tactical decisions in almost any field: science, healthcare, manufacturing, banking, education, government and more. The possibilities are limitless, and good data analysis is also critical for fighting fraud.
The Harry Potter Dilemma in data analysis
One day when my daughter Allison was in first grade, my wife got a call from her school. Her teacher, Mrs. Myers, asked us to come in for a parent-teacher conference about Allison's academic performance. This was a surprise.
I have two grown daughters now, and both are incredibly bright. Allison is highly competitive and was always working to match or surpass her older sister, Ashley. So it was unusual to be called to the school about either of them.
We met with Mrs. Myers. She started by praising Allison and telling us she loved having her in class. Then she explained her data problem, which had to do with work groups. Each major subject had work groups at several levels, and students were placed in a level according to their test scores.
Allison was in the top level for Math, Science, and Social Studies, but only one level above the bottom in Reading. Mrs. Myers pointed out that this should be impossible: a student could not place so highly in the other subjects without being able to read.
This became known as the Harry Potter Dilemma: a data conflict.
One type of data conflict occurs when some observations of a variable disagree with the rest. The analyst then knows something is "wrong" with the data and needs to resolve it. For example, a coin toss can come up heads or tails, but never both, and it is unlikely to be neither (landing on its edge).
Is this data outlier something meaningful or a flaw in the process?
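To make the idea concrete, here is a minimal Python sketch that flags a subject whose placement sits far from a student's other placements. The 1 (bottom) to 5 (top) level scale, the hypothetical `flag_conflicts` function, and the two-level threshold are all assumptions for illustration; the school's actual groups and levels aren't described here. The code only detects the conflict. Deciding whether it is meaningful or a flaw in the process still takes a human asking questions.

```python
from statistics import median


def flag_conflicts(levels: dict[str, int], threshold: int = 2) -> list[str]:
    """Return subjects whose level differs from the median of the
    student's *other* subjects by at least `threshold` levels.

    Hypothetical helper: the scale and threshold are illustrative assumptions.
    """
    flagged = []
    for subject, level in levels.items():
        # Compare each subject against the remaining observations only.
        others = [lv for s, lv in levels.items() if s != subject]
        if others and abs(level - median(others)) >= threshold:
            flagged.append(subject)
    return flagged


# Hypothetical placements on an assumed 1 (bottom) to 5 (top) scale.
placements = {"Math": 5, "Science": 5, "Social Studies": 5, "Reading": 2}
print(flag_conflicts(placements))  # ['Reading']
```

Comparing each subject with the median of the *other* subjects, rather than with an average that includes the suspect value, keeps the outlier from pulling its own reference point toward itself.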
Resolving the data conflict
We told Allison what her teacher had said and asked her about the gap between her work group placements. We pointed out that we knew she could read at a high level and that her vocabulary was well developed, so her reading group placement didn't make sense.
Allison started crying, and between heaving sobs, she replied, “I don’t want you to stop reading to me.”
We were caught off guard by her answer and by her heartbreaking reaction. When Allison had calmed down a little, we asked her why she thought we wouldn't read to her anymore.
“You don’t read to Sissy anymore. Ashley can read, and you don’t read to her anymore!”
Allison started to cry again. We pointed out that she was four years younger than her sister, and that we had stopped reading to Ashley only because Ashley no longer wanted us to.
Allison had collected valid data by observation, but she had drawn the wrong conclusion from it. She read the tea leaves about as well as Professor Trelawney.
The root cause of the Harry Potter Dilemma
We found the root cause of our data conflict with just a couple of questions.
At our house, we read to the children almost every night until our older daughter decided otherwise. If you have kids, you likely know that once they find a story that they like, they will request that same book over and over. This is less exciting for the parents.
We had run out of books for Allison and wanted to move on to chapter books. Her sister had received "Harry Potter and the Sorcerer's Stone" by J. K. Rowling as a present. Ashley read one chapter and set it aside. Since the book was handy, we decided to try it with Allison. Every night, my wife and I would both read a chapter or two at bedtime. We loved the book and eventually the series.
Our family reading group liked Harry Potter so much that we wanted more reading time. New books were coming out, and we wanted to catch up before each release. So we created "After Dinner Harry." You can guess when this took place.
Allison loved these Harry Potter reading sessions so much that she was pretending she couldn't read in school, fearing they would stop if we knew she could read. It was her favorite time of day, and I confess it was ours as well; even her older sister joined the reading group. Allison had deliberately skewed her results, creating the data conflict.
We agreed to keep reading to Allison as long as she liked, especially Harry Potter. “After Dinner Harry” became a family tradition, and our family went on to read the whole series together and watch all the movies.
Three weeks later, Mrs. Myers reported that Allison was now at the highest level in Reading. The data conflict was resolved: insight became intelligence, and the Reading placement was no longer an outlier.
If observations of a variable conflict with one another, you have a Harry Potter Dilemma.