Ari Juels over on the CNET Security news site wrote a fascinating article yesterday on data privacy. The basic thrust of the article is that as the world continues to push forward with adopting technologies that generate growing volumes of personally-attributable information, it will become increasingly difficult (if not impossible) to be anonymous in our day-to-day lives. I believe his predictions will prove to be fairly accurate. In a somewhat passing comment he mentions the opportunity to protect privacy in healthcare, but despite the ever-present opportunity, I don't think we are very far down the path of figuring this out.
Back in March, I wrote about the current trends in identity management. As we move slowly towards a common way of expressing an individual's online identity (e.g., OpenID, SAFE), I argued that medical and business decision making via analytics would benefit. The more I know about a patient, the more accurately I can predict the likelihood of treatment efficacy and safety. The more I know about a physician, the more likely I am to detect potential patterns of fraud and abuse. These benefits are only actionable when I know that patient 2475 or Dr. 34212 is really Hugh Hardin and not John Doe. In so many areas, the potential benefits of attributable information can exceed the benefits of confidentiality. And that is the slippery slope.
Ari's article focused more on the day-to-day technologies that are gradually infiltrating our lives: movie rentals, RFID tags, online profile photos, cell phone GPS receivers, public surveillance cameras and facial recognition software. These technologies are also making their way into healthcare in the forms of electronic patient identification, mobile medical devices, personal electronic health records, electronic health diaries, tele-medicine, biometrics, and others. The difference is that many people believe they are safe with the protections offered in HIPAA, doctor-patient confidentiality, and similar concepts.
The problem is that privacy erosion often occurs in tiny steps -- the slippery slope -- each offering a tangible benefit at the time. For example, we already see smartphone applications that can tell you when you are physically near someone in your social network (e.g., Brightkite, Foursquare, Loopt, Blip, Ipoki, Mologogo). That is great for meeting up with friends. But what happens if I'm driving past the abortion clinic or the drug rehabilitation center and my boss's profile lights up?
You may be asking yourself, "Do the technology and telecommunication companies really keep all that data?" It was recently discovered that Palm has been doing exactly that. And although many people think of GPS transmissions as just a bunch of coordinates, many were surprised to learn that, alongside longitude and latitude, an application on their iPhone was transmitting the phone's serial number and the user's gender, birth month, and birth year. Ari's article reminds us that 87% of the American population can be individually identified with only 3 pieces of information: zip code, gender, and date of birth. So John Doe just died.
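To make the "3 pieces of information" point concrete, here is a minimal sketch of how one might measure what share of a dataset is pinned down by a given combination of fields. The function name `unique_fraction` and the five toy records are made up for illustration; they are not from Ari's article or any real dataset.

```python
from collections import Counter


def unique_fraction(records, keys):
    """Return the fraction of records whose combination of `keys` is unique."""
    if not records:
        return 0.0
    # Count how many records share each combination of field values.
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    # A record is singled out when nobody else shares its combination.
    singled_out = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return singled_out / len(records)


# Hypothetical toy records, invented purely for this example.
people = [
    {"zip": "27513", "gender": "M", "dob": "1970-03-14"},
    {"zip": "27513", "gender": "M", "dob": "1971-09-21"},
    {"zip": "27513", "gender": "F", "dob": "1982-11-02"},
    {"zip": "27511", "gender": "F", "dob": "1982-11-02"},
    {"zip": "27511", "gender": "M", "dob": "1965-07-30"},
]

print(unique_fraction(people, ["gender"]))                # 0.0
print(unique_fraction(people, ["zip", "gender"]))         # 0.6
print(unique_fraction(people, ["zip", "gender", "dob"]))  # 1.0
```

In this toy set, no single field identifies anyone, yet the three fields together single out every record. Each field looks harmless on its own; it is the combination that does the damage.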
The slippery slope that is killing John Doe starts its incline not just in the nature of the data being disclosed, but also in how easily the data can be accessed. For example, public records such as land deeds and real estate transactions have been accessible to the public for a very long time. But to exploit them, you needed to get in a car, go down to the courthouse, know what you were looking for, dig around until you found it, and then find a way to copy it. Now, consider that those same public records are available online from your local county government office: I can pull up your full name, current and previous addresses, digital copies of your signatures and initials, your partner's maiden/prior name, taxes you pay...all in one system, online, searchable, for free. When I've shown this to people, their eyes get wide as they realize that privacy through obscurity is not really an option.
When present, privacy also precludes some forms of progress. We have struggled for a long time with the idea of unique-in-man drug experiences. Let's say I run 5 research studies, each with 100 patients, over 5 years in a certain community. When I collect the data, no identifiable patient information is collected (i.e., these are anonymized clinical trials). So how many unique people received my experimental drug? It is probably not 500, because some patients likely enrolled in more than 1 study. So when I summarize the safety profile of my drug, I don't actually know in how many unique individuals the drug has been tested, which means my ability to statistically detect potential problems in a population as diverse as the human species is compromised. Could we develop better patient therapies if we had access to richer, attributable longitudinal health data from patients? If we had more information about patients, could analytics do a better job of selecting the right therapy for the right patient?
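The counting problem in the 5-study example can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the lower bound assumes no patient appears twice within a single study, and the overlapping enrollment lists are invented to show what a linkable per-patient identifier would make possible.

```python
def unique_bounds(study_sizes):
    """Bounds on unique patients when records cannot be linked across studies.

    Assumes each study contains no repeat patients, so the lower bound is the
    largest single study and the upper bound is the sum of all studies.
    """
    return max(study_sizes), sum(study_sizes)


def unique_count(enrollments):
    """Exact unique-patient count when every record carries a linkable ID."""
    return len(set().union(*enrollments))


# Anonymized case: 5 studies of 100 patients each.
print(unique_bounds([100] * 5))  # (100, 500)

# Hypothetical linked case: study i enrolls patient IDs i*80 .. i*80+99,
# so each consecutive pair of studies shares 20 patients.
studies = [range(i * 80, i * 80 + 100) for i in range(5)]
print(unique_count(studies))  # 420
```

Without linkage, all I can say is that somewhere between 100 and 500 people received the drug. With a linkable identifier, the same 500 enrollment records resolve to one exact number, which in this invented scenario is 420.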
Science and the analytics that power it thrive on data, so yes, more is probably better (at least from where we sit today). In the same way that I can offer new types of services when I know more about a consumer, I can improve healthcare if I know more about patients, processes, and practitioners. But make no mistake about it: our current ideas around privacy are not up to the task, and there is considerable work ahead to keep John Doe alive and well.