Possum Romance and Other Data Analysis Anomalies



Today I took my 3-year-old, Elizabeth, to lunch at the on-site cafeteria. As she enjoyed some mac & cheese, we noticed a spot on the floor.

Elizabeth: What’s that, mommy?

Me: I don’t know, I think it must be where the rug got messed up and they covered it with tiles. Or maybe that’s an access point to something electrical. Or maybe it’s for decoration.

E: No, mommy, I think that when it’s nighttime, and the people are gone, that’s where the possums have their weddings. And when there are no possum weddings, the mice use that spot to make Cinderella’s dress for the ball.

How can this bit of toddlerish wisdom make you a better data analyst? Sometimes you find anomalies in the data. Things that you might not give a second thought to. Outliers, unusual combinations of variables, maybe a funky-looking standard error (how is it that big?). You might brush it off as something mundane.

It might be worthwhile to take a second look at these little anomalies. The gut reaction might be to delete outliers, but how many outliers are deleted before they send up a red flag? What is the meaning of that clump of observations that just don’t behave as the model predicts? Are those anomalies covering a stain in the rug? Or is that a possum wedding hall when you’re not looking?

I had just such a moment recently when analyzing some data for a class. We were looking at what predicts aggression in dogs, using questionnaire ratings completed by the trainers of several thousand dogs. There was missing data; there’s always missing data.

My initial plan was to ignore dogs with a lot of missingness. But someone in the back of my mind kept asking, what’s that? I looked closer. There were three variables consistently missing together:

• Whether the dog chases cats outside,

• Whether the dog chases squirrels outside,

• Whether the dog runs away off-leash.
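One way to spot variables that go missing together is to tabulate the missingness pattern of each record. Here is a minimal sketch in Python, assuming toy records with made-up column names and values (they are hypothetical, not the actual questionnaire data), where `None` marks an unanswered item:

```python
from collections import Counter

# Hypothetical toy records; None marks a missing questionnaire answer.
dogs = [
    {"chases_cats": 1, "chases_squirrels": 0, "runs_away": 0},
    {"chases_cats": None, "chases_squirrels": None, "runs_away": None},
    {"chases_cats": 0, "chases_squirrels": 1, "runs_away": 0},
    {"chases_cats": None, "chases_squirrels": None, "runs_away": None},
    {"chases_cats": 1, "chases_squirrels": None, "runs_away": 1},
]

columns = ["chases_cats", "chases_squirrels", "runs_away"]


def missing_pattern(record, cols):
    """Return the tuple of column names that are missing in this record."""
    return tuple(col for col in cols if record[col] is None)


# Count how often each combination of missing columns occurs.
pattern_counts = Counter(missing_pattern(dog, columns) for dog in dogs)

for pattern, n in pattern_counts.most_common():
    label = ", ".join(pattern) if pattern else "(complete)"
    print(f"{n} record(s) missing: {label}")
```

A pattern that shows up far more often than chance would suggest, like all three outdoor items missing at once, is the kind of spot on the floor worth asking about.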

A closer look revealed that these are dogs that have not been observed outside off-leash. Why? Is there a possum wedding I wasn’t invited to? Do these dogs act out against possums?

Some of these dogs are more aggressive in general, which would explain why they are not released outside—the trainer is not ready to put the dog into those circumstances. That would be an important aspect of missingness, and I’d want to think carefully about it before ignoring this group of dogs. Better to take a different approach: keep the dogs. Analyze what I do know about them.
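A simple first check on whether missingness carries information is to keep every dog, flag whether the outdoor items are missing, and compare what is known about each group. The sketch below assumes hypothetical column names and invented aggression scores purely for illustration; it is not the analysis I ran, and a real comparison would need a proper model.

```python
from statistics import mean

# Hypothetical toy records; the aggression scores are invented for illustration.
dogs = [
    {"runs_away": 0, "aggression": 1.0},
    {"runs_away": None, "aggression": 3.0},
    {"runs_away": 1, "aggression": 2.0},
    {"runs_away": None, "aggression": 4.0},
    {"runs_away": 0, "aggression": 1.5},
]


def summarize_by_missingness(records, flag_col, outcome_col):
    """Return {group: (count, mean outcome)} split by whether flag_col is missing."""
    groups = {"missing": [], "observed": []}
    for record in records:
        if record[outcome_col] is None:
            continue  # cannot summarize an outcome that is itself missing
        key = "missing" if record[flag_col] is None else "observed"
        groups[key].append(record[outcome_col])
    return {
        key: (len(values), mean(values) if values else None)
        for key, values in groups.items()
    }


summary = summarize_by_missingness(dogs, "runs_away", "aggression")
for group, (n, avg) in summary.items():
    print(f"{group}: n={n}, mean aggression={avg}")
```

If the two groups differ noticeably on the outcome, dropping the dogs with missing values would quietly change the population being described.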

Have you missed any possum weddings in your data lately? I’d love to hear about your experiences with anomalies that turned out to be interesting.


About Author

Catherine (Cat) Truxillo

Director of Analytical Education, SAS

Catherine Truxillo, Ph.D. has written or co-written SAS training courses for advanced statistical methods, including: multivariate statistics, linear and generalized linear mixed models, multilevel models, structural equation models, imputation methods for missing data, statistical process control, design and analysis of experiments, and cluster analysis. She also teaches courses on leadership and communication in data science.

4 Comments

  1. Cat Truxillo on

    That's a great question, Dianne. In this case, the questionnaire has aggression scales that are completed by the dogs' trainers, and a number of aggression circumstances are addressed, including people they know, people they don't know, dogs they know, dogs they don't know, and others. It is an established scale for assessing young service dogs-in-training. I hesitate to say a lot more because the data belongs to someone else, but your point is good: I could delve more specifically into what kind of aggression is associated with missingness on these variables. Thanks for your comment, and for helping me think about this question!

  2. Dianne Rhodes on

    Nice article, but I'm curious as to how you define "aggression" in dogs. Are you talking about aggression towards humans or towards other dogs? They are very different. I work with animal rescue, including badly maligned pit bulls, so I care about studies like this.
