The Death of John Doe

Ari Juels over on the CNET Security news site wrote a fascinating article yesterday on data privacy.  The basic thrust of the article is that as the world continues to push forward with adopting technologies that generate growing volumes of personally-attributable information, it will become increasingly difficult (if not impossible) to be anonymous in our day-to-day lives.  I believe his predictions will prove to be fairly accurate.  In a somewhat passing comment he mentions the opportunity to protect privacy in healthcare, but despite the ever-present opportunity, I don't think we are very far down the path of figuring this out.

Back in March, I wrote about the current trends in identity management.  As we move slowly towards a common way of expressing an individual's online identity (e.g., OpenID, SAFE), I argued that medical and business decision making via analytics would benefit.  The more I know about a patient, the more accurately I can predict the likelihood of treatment efficacy and safety.  The more I know about a physician, the more likely I am to detect potential patterns of fraud and abuse.  These benefits are only actionable when I know that patient 2475 or Dr. 34212 is really Hugh Hardin and not John Doe.  In so many areas, the potential benefits of attributable information can exceed the benefits of confidentiality.  And that is the slippery slope.

Ari's article focused more on the day-to-day technologies that are gradually infiltrating our lives: movie rentals, RFID tags, online profile photos, cell phone GPS receivers, public surveillance cameras and facial recognition software.  These technologies are also making their way into healthcare in the forms of electronic patient identification, mobile medical devices, personal electronic health records, electronic health diaries, tele-medicine, biometrics, and others.  The difference is that many people believe they are safe with the protections offered in HIPAA, doctor-patient confidentiality, and similar concepts.

The problem is that privacy erosion often occurs in tiny steps -- the slippery slope -- each offering a tangible benefit at the time.  For example, we already see smartphone applications that are able to tell you when you are physically near someone in your social network (e.g., Brightkite, Foursquare, Loopt, Blip, Ipoki, Mologogo)...great for meeting up with friends, etc.  But what happens if I'm driving past the abortion clinic or the drug rehabilitation center and my boss's profile lights up?

You may be asking yourself "do the technology and telecommunication companies really keep all that data?"  It was recently discovered that Palm has been.  And whereas many people think GPS transmissions are just a bunch of coordinates, it was a surprise to many to learn that alongside longitude and latitude, an application on their iPhone was transmitting the phone's serial number, user's gender, user's birth month, and user's birth year.  Ari's article reminds us that 87% of the American population can be individually identified with only 3 pieces of information: zip code, gender, and date of birth.  So John Doe just died.
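To make the re-identification point concrete, here is a minimal sketch of how one might measure what share of a dataset is singled out by zip code, gender, and date of birth alone.  The function name, field names, and sample records are all hypothetical and invented for illustration; this is not the method behind the 87% figure.

```python
from collections import Counter


def unique_fraction(records, keys=("zip", "gender", "dob")):
    """Return the fraction of records whose combination of `keys` is unique."""
    if not records:
        return 0.0
    # Count how many records share each combination of the chosen fields.
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    # A record is individually identifiable if nobody else shares its combination.
    singled_out = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return singled_out / len(records)


# Hypothetical sample data: the first two people share all three fields.
people = [
    {"zip": "27513", "gender": "M", "dob": "1970-03-14"},
    {"zip": "27513", "gender": "M", "dob": "1970-03-14"},
    {"zip": "27513", "gender": "F", "dob": "1982-11-02"},
    {"zip": "27601", "gender": "M", "dob": "1965-07-30"},
]

print(unique_fraction(people))                 # all three fields together
print(unique_fraction(people, keys=("zip",)))  # zip code alone
```

In this toy data, zip code alone singles out fewer people than the three fields combined, which is the general pattern: each added field, harmless on its own, narrows the crowd you can hide in.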

The slippery slope that is killing John Doe starts its incline not just in the nature of the data being disclosed, but also in how easily the data is accessible.  For example, public records such as land deeds and real estate transactions have been accessible to the public for a very long time.  But to exploit them, you needed to get in a car, go down to the courthouse, know what you were looking for, dig around until you found it, and then find a way to copy it.  Now, consider that those same public records are available online from your local county government office: I can pull up your full name, current and previous addresses, digital copies of your signatures and initials, your partner's maiden/prior name, taxes you pay...all in one system, online, searchable, for free.  When I've shown this to people, their eyes get wide as they realize that privacy through obscurity is not really an option.

When present, privacy also precludes some forms of progress.  We have struggled for a long time with the idea of unique-in-man drug experiences.  Let's say I run 5 research studies, each with 100 patients, over 5 years in a certain community.  When I collect the data, no identifiable patient information is collected (i.e., these are anonymized clinical trials).  So how many unique people received my experimental drug?  It is not 500 because some patients likely enrolled in more than 1 study.  So when I summarize the safety profile of my drug, I don't actually know in how many unique individuals the drug has been tested, which means my ability to statistically detect potential problems in a population as diverse as the human species is compromised.  Could we develop better patient therapies if we had access to richer, attributable longitudinal health data from patients?  If we had more information about patients, could analytics do a better job of selecting the right therapy for the right patient?
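The counting problem can be sketched in a few lines.  This assumes something the anonymized trials described above do not have: a persistent patient identifier shared across studies.  The identifiers, the helper name `enrollment_summary`, and the overlap pattern (20 patients carried over between consecutive studies) are all made up for illustration.

```python
def enrollment_summary(studies):
    """Return (total enrollments, unique patients) for a list of per-study ID lists."""
    total = sum(len(study) for study in studies)
    # Only possible when the same person carries the same ID in every study.
    unique = set()
    for study in studies:
        unique.update(study)
    return total, len(unique)


# Hypothetical: 5 studies of 100 patients each, with 20 repeat
# participants between each consecutive pair of studies.
studies = [list(range(i * 80, i * 80 + 100)) for i in range(5)]

total, unique = enrollment_summary(studies)
print(total, unique)
```

Without the shared identifier, only the first number is knowable; the second, which is the one the safety statistics actually depend on, is not.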

Science and the analytics that power it thrive on data, so yes, more is probably better (at least from where we sit today).  In the same way that I can offer new types of services when I know more about a consumer, I can improve healthcare if I know more about patients, processes, and practitioners.  But make no mistake about it: our current ideas around privacy are not up to the task, and there is considerable work ahead to keep John Doe alive and well.


  1. Tammi Kay George
    Posted August 20, 2009 at 10:05 am

    Thanks, Jason, for this great post. I always enjoy your insights - and one phrase that rang (among many) was "privacy through obscurity is not really an option."
    My area of focus has been cross-industry BI, focusing on the presentation layer that helps to convey the right info to the right person at the right time. Recently having switched my focus to the analytics spectrum there have been new issues to consider - and this article hit on concerns that are bubbling up related to other industries as well. It is even changing the perception the public has regarding certain techniques - eg data mining, segmentation, profiling, etc.
    Your post is very timely - in part because the data is there and is growing. The type of data, the volume of data, and the ease of accessing data are all growing, and it will be used. No doubt the data will be used.
    Just a lot to think about - thanks for this post! I think we are going to miss John Doe more than we know.

  2. Mark Wolff
    Posted August 20, 2009 at 1:44 pm

    There is a rather infamous quote from the founder and current Chairman of the Board (then CEO) at Sun Microsystems regarding the internet and privacy. Amazingly the quote is from 1999!
    Wired Magazine - 01.26.99
    Polly Sprenger
    The chief executive officer of Sun Microsystems said Monday that consumer privacy issues are a "red herring."
    "You have zero privacy anyway," Scott McNealy told a group of reporters and analysts Monday night at an event to launch his company's new Jini technology.
    "Get over it."
    McNealy's comments came only hours after competitor Intel (INTC) reversed course under pressure and disabled identification features in its forthcoming Pentium III chip.

  3. Allison Lane
    Posted August 20, 2009 at 2:04 pm

    Indeed, a lot of frightening repercussions to consider. In Jason's blog post, I was waiting for (hoping for) an answer or magic bullet. No doubt, SAS has a solution in the works. But, what can the individual do today?

  4. Jason
    Posted August 20, 2009 at 2:45 pm

    Thanks Tammi, and fair question, Allison!
    This was a tricky blog post to write, and it has been interesting to hear the mix of reactions from people on it. It is a tough topic to talk about because on the one hand you don't want to be scaring people, and on the other hand you want them to understand the significance of the issue.
    One fairly obvious thing an individual can do is be more diligent in monitoring your electronic identities, such as checking your credit score each year. I have averaged some credit-related finding about once every two years now, and I would be surprised if I am unique. On Monday, Mashable ran an article on a guy who stole 130 million credit card numbers. It pays to be diligent.
    But another big one that is really important is to make sure you actually read the terms of use and contracts for anything that you use. What is the company telling you about what it will do with your data? In many cases (including some of those I mention), the company disclosed what they were doing, but no one read it.
    One thing I don't think is realistic is to try and limit your electronic fingerprints; this is the new world, we will have to adapt.

  5. Jason
    Posted August 20, 2009 at 2:46 pm

    Great reminder Mark, I remember that well!

  6. Charles Uzzell
    Posted August 20, 2009 at 4:15 pm

    Nice article, thank you. Genealogists have been concerned about much of this for many years. However, I note with interest that the "three pieces of information" are now changed. It used to be any two of these four: name, address, SS#, and mother's maiden name. A lot of people are still weird about their "home phone number" but that long ago was easily obtained. The mother's maiden name is still difficult to find (for genealogists), but is rarely used for security anymore.

  7. Harlan Shays
    Posted August 21, 2009 at 11:38 am

    Allison, I think there are clues in Jason’s great blog entry about what individuals can do if they are concerned about maintaining privacy. In the example of the 5 research studies each with 100 patients, we see that study participants can unintentionally skew the results by participating in more than one study. Advocates for individual privacy have similar (albeit calculated) opportunities for skewing the data away from reality.
    The most expensive alternative is to have multiple smart devices that are aligned with different providers, so no provider has a complete picture. Once it was impossible to imagine that Americans would have more than one car or TV or credit card per household. A device that sits dormant for most of the week or two thirds of the year can’t provide a terribly useful profile.
    Conversely, one smart device per family, used by household members on an as-needed basis, will provide data about a user with impossibly erratic consumer and travel behavior.
    One wonders how long it will be before organizations (businesses or community groups) are created with a mission to provide smart device “privacy firewalls.” These could be an option for people who are ready to give up thinking of smart devices as personal gear that delivers individual service. Other models are possible. For example, while most of us distinguish public transit from private transportation, there are communities that have set up ways to rent cars by the hour or use public bikes for free. They aren’t for everybody, but neither is privacy….
    You can’t hide, and you may not be able to completely pollute the data about you and your loved ones. But with some extra expense or effort, there are likely to be many ways of making that data far less useful.

  8. Jason
    Posted September 9, 2009 at 1:56 pm

    For those interested, a related story ran in Ars Technica yesterday on this topic:

3 Trackbacks

  1. By Clouding the Issues - A Shot in the Arm on January 13, 2012 at 3:36 pm

    [...] there still be problems?  Undoubtedly, especially in areas related to data privacy (see my post a couple of weeks ago).  But it's not a cloud computing [...]

  2. By Keeping Up - A Shot in the Arm on January 13, 2012 at 3:44 pm

    [...] response prediction) could be tremendous.  Of course, the risk here is loss of privacy (see my blog post from a few weeks ago).  And as I wrote back in March, identity management standards such as OpenID [...]

  3. By Users For Sale - A Shot in the Arm on January 13, 2012 at 5:10 pm

    [...] Twitter messages and 1 billion user relationships, and is now offering the data for sale.  I wrote several weeks ago about the disturbing loss of data privacy we are seeing, and this case is an excellent example.  [...]

  • About this blog

    Welcome to the SAS Health and Life Sciences blog. We explore how the health care ecosystem – providers, payers, pharmaceutical firms, regulators and consumers – can collaboratively use information and analytics to transform health quality, cost and outcomes.