The prescription for unstructured data analysis is now legible


My Mum could have been a doctor: most people can't read her handwriting. It's only because I've been trained to read it that I can.

The analysis of unstructured data is similar: it takes training. Text analysts can quickly feel overwhelmed when they learn that they have to develop a training corpus manually. That means reading a sample of documents and marking each one by hand to define the relevant categories for the software.

It's a bit easier if you already have a starter taxonomy, but finding one specific to your needs can be trying. And even if you do find one, how do you define new concepts that you don't yet know exist in the materials? Well, you have to read a few documents (and you're back to more manual effort). All this just to get to the point of automating categorization (sigh).

There's also the option of generating a taxonomy from reliable sources such as Wikipedia or DBpedia, and SAS® does that too. Even so, you still have to do some manual validation to make sure the taxonomy covers your document collection.

There’s now an easier way.

SAS® Contextual Analysis is a new, highly intuitive technology for developing text models. Machine learning algorithms do the initial heavy lifting, removing much of the manual burden of the past. The software examines the entire collection, identifying word stems, misspellings, and more, so the natural language processing (NLP) is done automatically.

The software also finds relevant topics automatically. You can explore the results visually, then adjust and refine what it discovers.

You can add concepts: the most common ones are even predefined for you, or you can write your own.

And as the subject-matter expert, you decide what makes sense as categories – with the help of relevance metrics that the system generates.

For those of you familiar with SAS, SAS Contextual Analysis brings together some of the capabilities of SAS® Text Miner and SAS® Enterprise Content Categorization in one guided web interface.

Take a look for yourself. In this webinar (which features Jared Peterson, the product development manager for SAS Contextual Analysis), you can see how straightforward it now is to get insights from all the dark (text) data you haven't looked at (possibly because it's been too hard?).

We've found that the narrative in call center notes is almost always more informative about customer issues and concerns than the categories manually selected by call center agents. In fact, customers often describe how to fix the issues they perceive, giving you the prescription to make them happy.

We've seen this in improving fraud investigations, improving debt collection, creating more efficient operations, creating more satisfied customers, improving patient care, reducing warranty costs, and delivering relevant real-time advertising. The list goes on.

You know, it's amazing what you can get used to. Chronic pain from text analysis doesn't have to be one of those things. Sure, some training will always be involved. But you can reduce the burden of manual effort, along with the ills of inconsistency and error in the process, and get to new insights sooner.

What would you want to know from your dark text data?


About Author

Fiona McNeill

Global Product Marketing Manager at SAS

With a background in applying analytics to real-world business scenarios, McNeill focuses on the automation of analytic insight in both business and application processing. Having been at SAS for over 15 years, she has worked with organizations across a variety of industries, understanding their business and helping them derive tangible benefit from their strategic use of technology. She is coauthor of the book Heuristics in Analytics: A Practical Perspective of What Influences Our Analytical World.


  1. Please, I need specific example code and data to show me the benefit of SAS text mining.
    prof Babiker Malik Osman
    saudi arabia

    • Fiona McNeill

      Thank you for your interest in SAS text mining, Professor Osman. As noted in this description, SAS Contextual Analysis includes aspects of SAS Text Miner, the machine learning components that eliminate the need to define a starter taxonomy. SAS Contextual Analysis also includes linguistic rules, which are not present in SAS Text Miner. A great resource for detailed case studies of SAS text mining (and one that distinguishes it from linguistic methods) is a recent book by Dr. Goutam Chakraborty et al., "Text Mining and Analysis: Practical Methods, Examples, and Case Studies Using SAS®", which is available from the SAS Bookstore. While the data isn't included, Dr. Chakraborty may have a sample for you to use. There are also some sample files included with SAS Text Miner (such as an initial stop list). Your initial text data can also be any electronic documents you have on hand. I hope this helps get you started.

      • I am a customer of SAS products, in particular Enterprise Miner/Text Miner, Sentiment analysis and Content Categorization.

        I was fortunate enough to purchase Dr. Goutam Chakraborty's book at the last SAS Global Forum and even more fortunate that he was able to sign it for me. The day I got his book, I stayed up half the night reading it. It is a fantastic book that is easy to read and provides a lot of examples.
