According to research, structured data makes up only a fraction of an organization's data; nearly 80 percent is unstructured data that may come from social media, customer letters, web pages, invoices and freeform survey answers. Getting the information you need from that data can be quick and automated, or it can be long and painful. Alberta Parks is opting for quick and automated.
Each year, 8 million people visit Alberta Parks. They're bird watchers, hikers, bikers and canoers. They're searching for adventure or a romantic firelit evening. Since 2002, Alberta Parks has been using a paper survey to learn about the experiences and expectations of those visitors.
According to Jared Prins, an analyst with Alberta Tourism, Parks and Recreation, the structured data from the surveys is easy to analyze. But what about all of the unstructured data: write-in comments like "Rangers need skimpier uniforms"? The old way of handling those comments was to hand-type them into a spreadsheet and then convert the spreadsheet into a SAS data set for analysis.
This can be a long, painful process. Prins said manually coding the survey comments would take him three weeks, locked in his office so that he could concentrate only on the data. Now the process is largely automated and takes just a few minutes. That is a significant time savings, but Prins says it's only part of the benefit: text mining also gives Alberta Parks a deeper understanding of its data and codes the comments more consistently.
If you only wanted to know how many times a word appears in the data, a simple word count would do the trick. But what happens when you run into homonyms, words that are spelled the same but have different meanings? In SAS Text Miner, the Text Parsing and Text Filter nodes process natural language. The Text Parsing node identifies parts of speech, so it can tell that "park" is a verb in the first sentence below and a noun in the second:
"There was nowhere to park my car."
"This is a fantastic park to visit."
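The difference is easy to see in a small sketch. The Python below is a minimal illustration under stated assumptions, not how the Text Parsing node actually works: it counts the two comments above with a plain word count, then with a hypothetical one-rule tagger that reads an ambiguous word as a verb when it directly follows "to" and as a noun otherwise. The NOUN_OR_VERB lexicon and the tagging rule are invented for this example only.

```python
import re
from collections import Counter

# Hypothetical lexicon of words that can be either a noun or a verb.
NOUN_OR_VERB = {"park"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_count(comments: list[str]) -> Counter:
    """Plain word count: every occurrence of a spelling is one term."""
    return Counter(tok for comment in comments for tok in tokenize(comment))


def tagged_count(comments: list[str]) -> Counter:
    """Count (word, part of speech) pairs for the ambiguous words only.

    Toy rule, assumed for illustration: an ambiguous word directly after
    "to" is read as a verb ("to park"); anywhere else it is read as a noun.
    """
    counts = Counter()
    for comment in comments:
        tokens = tokenize(comment)
        for i, tok in enumerate(tokens):
            if tok in NOUN_OR_VERB:
                pos = "verb" if i > 0 and tokens[i - 1] == "to" else "noun"
                counts[(tok, pos)] += 1
    return counts


comments = [
    "There was nowhere to park my car.",
    "This is a fantastic park to visit.",
]
print(word_count(comments)["park"])  # 2: both uses are lumped together
print(dict(tagged_count(comments)))  # {('park', 'verb'): 1, ('park', 'noun'): 1}
```

A real tagger weighs the whole sentence rather than a single neighboring word; the one rule here happens to fit these two comments and would mislabel many others.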
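Noise is further reduced by stemming, which reduces words to their root form (stems, stemming and stemmed all become stem), and by stop and start lists. A stop list tells the software to ignore common words such as "the," "and" and "is"; a start list does the opposite, limiting the analysis to the terms it contains.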
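Stemming, a stop list and a start list can be sketched in a few lines of Python. This is a toy under loudly stated assumptions, not SAS Text Miner's implementation: the stem function is a crude suffix stripper that borrows the consonant-undoubling rule from Porter's stemmer, the stop list holds only the three example words named above, and the start list in the usage example is hypothetical. The stop list is checked against the raw words, while the start list is checked against the stems.

```python
import re

# Stop list with only the example words from the text; real stop lists
# contain hundreds of common words.
STOP_LIST = {"the", "and", "is"}


def stem(word):
    """Crudely strip one suffix: 'stems', 'stemming', 'stemmed' -> 'stem'.

    A toy stand-in for a real stemmer such as Porter's algorithm.
    """
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least four letters remain ("this" stays "this").
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            word = word[: -len(suffix)]
            # After "-ing"/"-ed", undo a doubled final consonant
            # ("stemm" -> "stem"), except l, s and z, as in Porter's step 1b.
            if suffix != "s" and word[-1] == word[-2] and word[-1] not in "aeioulsz":
                word = word[:-1]
            break
    return word


def terms(comment, stop_list=STOP_LIST, start_list=None):
    """Tokenize a comment, drop stop words, stem, then apply an optional start list."""
    tokens = re.findall(r"[a-z]+", comment.lower())
    kept = [stem(t) for t in tokens if t not in stop_list]
    if start_list is not None:
        # A start list keeps only the (stemmed) terms it names.
        kept = [t for t in kept if t in start_list]
    return kept


print(terms("Stems, stemming and stemmed"))
# ['stem', 'stem', 'stem']
print(terms("The rangers and the park staff", start_list={"ranger", "park"}))
# ['ranger', 'park']
```

Applying the stop list before stemming keeps short function words from being mangled into look-alike stems, while applying the start list after stemming lets one entry such as "ranger" match every inflected form.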
"At Alberta Parks, we're becoming more focused on discovering actionable intelligence from our text data," Prins said. Text mining helps them do that. It also gives them a deeper understanding of customer issues by making it practical to analyze answers to more open-ended questions.
To learn more about how Alberta Parks is using SAS Text Miner, read Jared Prins' SAS Global Forum 2013 paper, "Replace Manual Coding of Customer Survey Comments with Text Mining: A Story of Discovery with Text as Data in the Public Sector." He also offers useful ideas for setting it up yourself.