When I ask people what they know about Denmark they often mention Hans Christian Andersen. He was born in Denmark in 1805 and is one of the most adored children’s authors of all time. Many of his fairy tales are known worldwide as they have been translated into more than 125 languages. His writing is colorful and picturesque and often with a hidden moral or criticism of society. He wanted the reader to detect the expected and discover the unexpected in his fairy tales.
In my business career I also work with detecting the expected and discovering the unexpected. I focus on health care, where Denmark is known worldwide for keeping health care data in electronic medical records (EMR). Unfortunately, reading EMRs isn’t like reading fairy tales, even though the language is both exotic – with Latin phrases – and modern, with text message jargon, medical slang, acronyms and abbreviations. The amount of text and data for doctors to manage is increasing from minute to minute, and the content is hard to consume for the clinicians during already busy days.
Highly complex language – combined with ever more laboratory analyses, X-ray descriptions, medications, guidelines, etc. – creates a situation where the clinician’s tight schedule and the limits of human reading speed and comprehension make it impossible to keep up. Therefore, there is a need for advanced methods to extract value from text and data to ensure operational efficiency and reduced patient risk.
Hospital Lillebaelt
Five years ago, Hospital Lillebaelt in Denmark came to the same conclusion. The amount of data was simply too large for a normal person to manage. Especially when it came to patient quality initiatives, it was an impossible task to review every patient’s data and to do it in a consistent way.
With that in mind, management at Hospital Lillebaelt started a text analytics initiative in 2010 together with SAS. As the first hospital in Denmark, Hospital Lillebaelt began a journey to discover hidden insights in the massive amount of structured and unstructured data it had. Just as H.C. Andersen was a pioneer with his colorful fairy tales, Hospital Lillebaelt was a pioneer with text analytics in the health care industry.
Innovative doctors like Chief Orthopedic Surgeon Sten Larsen, Dr.Med., Ulrik Gerdes and microbiology professor Jens Kjølseth Møller have seen the value that text analytics can bring to their field. These solutions have a wide range of uses, including determining diagnostic coding from EMR notes, automating the audit process to identify hospital adverse events in EMR notes, and uncovering which patients have a hospital-acquired infection.
Importance of transparency
These solutions have more than health care in common. They all provide the clinicians with transparency in the results – a type of clinical stewardship that empowers doctors to make decisions based on all the patients’ data. There’s no black box technology. Clinicians can monitor the number of infections, adverse events, etc., on either a hospital or ward level. They can even drill down to the actual findings on a single patient and get both the structured and unstructured data presented in a way that enables them to do fast root-cause analysis without reading pages and pages of patient information.
The simplicity, mobility and reuse of text analytics have been important from the beginning for these projects. When the projects started, we used text mining to explore the structures in the language, word frequency, abbreviations, word association, clusters and variations. This work gave us a fast and deep understanding of two years of EMR notes that we probably never would have accomplished in another way.
Once text mining had let us explore the data and understand the associations between specific words, we decided to switch to a Boolean categorization technique. This was to ensure full transparency in the results.
From the beginning, we decided on an approach with modules and vocabularies/word lists. The word lists contain nothing but the identified words and synonyms – no Boolean logic – which keeps the vocabulary easy to edit. For example, two word lists could be PAIN (pain, painful, hurts, sore, etc.) and KNEE (knee, patella, femoro-patellar, etc.). A module could then be KNEE_PAIN. A simple Boolean rule requiring that a KNEE term and a PAIN term occur in the same sentence and within a distance of five words could look like this: (SENT,(DIST_5,KNEE,PAIN)). As the figure to the right indicates, the modules can become very advanced when negations, word order and time come into play.
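To make the idea concrete outside of the SAS rule syntax, here is a minimal Python sketch of the KNEE_PAIN module. It is not how the hospital solution is implemented; it only mimics the rule's intent. The function names, the naive sentence splitter and the interpretation of "distance" as the difference in token positions are all my own assumptions for illustration.

```python
import re

# Word lists hold plain terms only, no logic (terms taken from the example above).
WORD_LISTS = {
    "PAIN": {"pain", "painful", "hurts", "sore"},
    "KNEE": {"knee", "patella", "femoro-patellar"},
}


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    # Lowercase words, keeping internal hyphens such as "femoro-patellar".
    return re.findall(r"[a-z]+(?:-[a-z]+)*", sentence.lower())


def positions(tokens, word_list):
    # Token indexes at which any term from the word list occurs.
    return [i for i, tok in enumerate(tokens) if tok in word_list]


def knee_pain(text, max_dist=5):
    """Return the sentences in which a KNEE term and a PAIN term
    occur at most max_dist token positions apart."""
    hits = []
    for sentence in split_sentences(text):
        tokens = tokenize(sentence)
        knee = positions(tokens, WORD_LISTS["KNEE"])
        pain = positions(tokens, WORD_LISTS["PAIN"])
        if any(abs(k - p) <= max_dist for k in knee for p in pain):
            hits.append(sentence)
    return hits


note = ("Patient reports that the left knee is sore after walking. "
        "No pain today. The knee was examined.")
print(knee_pain(note))
```

Only the first sentence matches, because the other two contain a term from just one of the lists. Note that the sketch ignores negation, so "no pain in the knee" would also match; handling that is exactly the kind of extra logic that makes real modules more advanced. Because the word lists are kept separate from the logic, adding a synonym means editing a set, not a rule.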
Regular expressions (regex) are another technique that is very convenient in many cases. In health care, they could be used to determine thresholds for fever, blood pressure, etc., and to discover drug doses. Health care business rules composed of a combination of Boolean operators, modules and word lists ensure a solution that is mobile and easy to build upon.
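As a rough illustration of the regex idea, the Python sketch below pulls temperatures and drug doses out of a free-text note. The patterns, the note wording, the list of units and the 38.0 threshold are all assumptions made up for this example; real EMR notes vary far more, and the threshold is not a clinical guideline.

```python
import re

# Assumed pattern: a temperature written like "Temp 38.7" or "Temp: 37,2".
TEMP_RE = re.compile(r"\btemp(?:erature)?\.?:?\s*(\d{2}(?:[.,]\d)?)", re.IGNORECASE)

# Assumed pattern: a dose such as "500 mg" or "2,5 ml".
DOSE_RE = re.compile(r"\b(\d+(?:[.,]\d+)?)\s*(mg|g|ml|mcg)\b", re.IGNORECASE)

FEVER_THRESHOLD_C = 38.0  # illustrative value only


def to_float(number_text):
    # Accept both decimal point and decimal comma.
    return float(number_text.replace(",", "."))


def find_fever(text, threshold=FEVER_THRESHOLD_C):
    """Return every recorded temperature at or above the threshold."""
    temps = (to_float(m.group(1)) for m in TEMP_RE.finditer(text))
    return [t for t in temps if t >= threshold]


def find_doses(text):
    """Return (amount, unit) pairs for every dose mentioned in the text."""
    return [(to_float(m.group(1)), m.group(2).lower())
            for m in DOSE_RE.finditer(text)]


note = "Temp 38.7 this morning, given paracetamol 500 mg. Temp: 37,2 at noon."
print(find_fever(note))
print(find_doses(note))
```

Patterns like these can sit next to the word lists and Boolean rules: the regex finds the number, and a rule decides what the number means in context.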
Simplicity and mobility
This combination and its simplicity are so important because treatment methods and medical slang vary from hospital to hospital, even within the same country. When moving a solution from one country to another (from Denmark to Sweden, for example), simple word lists are more convenient for translation. (Then a lot of other exciting differences can come into play, e.g., morphology or semantics.)
These vocabularies and modules would probably never be translated into hundreds of languages, like H.C. Andersen’s fairy tales. However, this type of innovation leads to new ideas – to new innovation. In my next post, I will share how unstructured text can be transformed into something measurable that can be included in another computer science discipline – machine learning.
If you have ideas or comments about how these vocabularies and modules should be handled and versioned, you are welcome to post a comment or write directly to me.
Detect the expected and discover the unexpected!