My Mum could have been a doctor: most people can’t read her handwriting. I can, but only because I’ve been trained to read it.
The analysis of unstructured data is similar. Text analysts can quickly become overwhelmed when they learn that they have to develop a training corpus manually: reading a sample of documents and marking each one by hand to define the relevant categories for the software.
It’s a bit easier if you already have a starter taxonomy, but it can be hard to find one specific to your needs. And even if you do find one, how do you define new concepts that you don’t even know exist in the materials? Well, you have to read a few documents (and you are back to more manual effort). All this to get to the point of automating categorization (sigh).
There’s also the option of generating a taxonomy from reliable sources, like Wikipedia or DBpedia – and SAS® does that too. Some manual validation still has to be done to ensure the taxonomy covers your document collection.
There’s now an easier way.
SAS® Contextual Analysis is a new, highly intuitive text model development technology. Machine learning algorithms do the initial heavy lifting, removing much of the historic manual burden. The software examines the entire collection, identifying the stems, misspellings, and more. The natural language processing (NLP) is done automatically.
The software also automatically finds relevant topics. You can visually explore the results, then adjust and refine what is discovered.
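To make the idea of automatic topic discovery a little more concrete, here is a minimal Python sketch of one generic approach: reduce words to crude stems, then rank each document's terms by TF-IDF so the distinctive ones surface without any hand labeling. This is not how SAS Contextual Analysis works internally (the post does not say); the sample notes, the stopword list, and the function names `crude_stem`, `tokenize`, `tfidf`, and `top_terms` are all invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical call center notes, invented for illustration only.
NOTES = [
    "Customer reports billing error; billed twice for the same order.",
    "Router keeps dropping the connection, customer wants a replacement router.",
    "Billing dispute: customer was billed after cancelling.",
]

# A tiny, assumed stopword list; a real system would use a much fuller one.
STOPWORDS = {"the", "a", "for", "was", "after", "same", "keeps", "wants"}


def crude_stem(token):
    """Strip a few common English suffixes (far cruder than a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords, and stem the rest."""
    words = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(w) for w in words if w not in STOPWORDS]


def tfidf(docs):
    """Return one {term: score} dict per document, using raw tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append(
            {t: c * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        )
    return scores


def top_terms(score, k=3):
    """Highest-scoring terms first; ties are broken alphabetically."""
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    for note, score in zip(NOTES, tfidf(NOTES)):
        print(top_terms(score), "<-", note)
```

In this toy collection "billing" and "billed" collapse to the same stem, and a word such as "customer" that appears in every note scores zero, so it never crowds out the terms that actually distinguish one note from another.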
You can add concepts – the most common ones are even pre-defined for you. Or write your own.
And as the subject-matter expert, you decide what makes sense as categories – with the help of relevance metrics that the system generates.
For those of you familiar with SAS, SAS Contextual Analysis brings together some of the capabilities of SAS® Text Miner with those of SAS® Enterprise Content Categorization, in one guided web interface.
Take a look for yourself. In this webinar (which features Jared Peterson, the product development manager for SAS Contextual Analysis) you can see how straightforward it now is to get insights from all the dark (text) data you’ve not looked at (possibly because it’s been too hard?).
We’ve found that the narrative in call center notes is almost always more informative about customer issues and concerns than the categories manually selected by call center agents. In fact, customers will often describe how to fix the issues they perceive, giving you the prescription to make them happy.
We’ve seen this help in improving fraud investigations, improving debt collection, creating more efficient operations, creating more satisfied customers, improving patient care, reducing warranty costs, and delivering relevant real-time advertising. The list goes on.
You know, it’s amazing what you can get used to. Chronic pain from text analysis doesn’t have to be one of those things. Sure, some training will always be involved. But you can reduce the burden of manual effort and the ills of inconsistency and error in the process, and get to the new insights sooner.
What would you want to know from your dark text data?