Three ways to boost text analysis results

In my work at SAS, I meet many people intent on using text analytics to help their organization achieve the next big breakthrough in competitive advantage. The most successful of those do three things well.

#1 – Listen with a business goal in mind

Leading organizations understand the value of tapping into unstructured data sources. They’re monitoring social media, analyzing customer reviews, collecting surveys, capturing call center notes, and much more. If you’re not doing this, you should. And if you are already analyzing the voice of your customers, consider how your text analytics projects are aligned with your business goals.

Text analytics empowers businesses to achieve key goals. Here are some areas where I see text analytics applied most often:

  • Understand and identify topics within thousands of comments, tweets, and blog posts
  • Model and segment customers by integrating unstructured and structured data
  • Pinpoint how your customers feel about your brand, products, and services (and how those compare to your competitors’ brands)
  • Automatically categorize content, extract entities, and disambiguate text
  • Manage and link jargon across business units, companies, or industries

To stay competitive, it's essential for businesses to leverage this new technology. There’s no time like the present to plan for improvement. What are your critical business goals for 2012 and what are your customers saying about your organization that affects those goals?

#2 – Capture high-quality conversation

As we all know, there’s high-quality conversation and there’s casual chat. Typically, the latter adds no real business value and is a source of “noise.” This probably isn’t a big deal in your social life but, in business, such noise is a distraction and a waste of time and money. The same holds true when you’re capturing online conversations, news, and information.

Natural language processing is making it easier to filter and disambiguate relevant content by identifying parts of speech (e.g., noun, verb, adjective) and by stemming terms (e.g., sell, sells, selling, sold). But language is too complex to filter on a single term or phrase. To extract high-quality content from conversations, it is becoming necessary to analyze the context of words and gain a deeper understanding of them.
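
A minimal Python sketch of filtering on context rather than on a single term: a mention of an ambiguous word is kept only when one of a set of cue words appears within a few tokens of it. The cue words, the four-token window, and the function names are illustrative assumptions. A real pipeline would also use part-of-speech tags and stemming, which this sketch leaves out.

```python
import re

def tokenize(text: str) -> list[str]:
    # Lowercase, then keep runs of letters, digits, and apostrophes as tokens.
    return re.findall(r"[a-z0-9']+", text.lower())

def mentions_in_context(text: str, term: str, cues: set[str], window: int = 4) -> bool:
    """True if `term` occurs with at least one cue word within `window` tokens on either side."""
    tokens = tokenize(text)
    for i, tok in enumerate(tokens):
        if tok == term:
            nearby = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            if cues.intersection(nearby):
                return True
    return False

# Hypothetical posts: "apple" the company versus "apple" the fruit.
posts = [
    "Just bought the new Apple laptop and the battery life is great",
    "Baked an apple pie with my grandmother today",
]
company_cues = {"laptop", "iphone", "stock", "store"}  # illustrative cue words
relevant = [p for p in posts if mentions_in_context(p, "apple", company_cues)]
print(relevant)
```

The window is the main tuning knob. If it is too narrow, real mentions are missed. If it is too wide, unrelated cue words leak in from neighboring clauses.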

Having access to a wealth of information is great, but the foremost challenge facing businesses is knowing what information to capture, where to find it, and how to filter it for pertinent information. The goal is to focus on relevant, high-quality content that informs and enables organizations to take strategic action.

#3 – Turn intelligence into action

Intelligence is worthless if it doesn’t enable organizations to take action. The volume of information available to the average business can be overwhelming. My team and I work with leading organizations to extract high-quality conversations and transform the resulting insights into action. Goals include:

  • Identify and respond to unhappy customers on social media
  • Trace potential fraudulent activity within survey comments, insurance claims, and social media
  • Gain a competitive advantage by monitoring your online reputation and that of your competitors
  • Automatically cluster and categorize call center logs to identify high-volume issues for employee training
  • Monitor and forecast sentiment prior to and during a product launch
  • Identify emerging issues before they become costly problems

Software technology does the heavy lifting. Text analytics software delivers the key intelligence that organizations need to gain a competitive advantage. How individuals leverage the resulting insights, and the actions they take, help them determine their goals for future text analysis, the types of conversations they’ll examine next time, and which actions are likely to effect the change they desire.

Will that improve your text analysis results? I continually see SAS users in organizations drawing on that cycle of continuous learning to expand their competitive advantage.

Comment or email me with your recent challenges and successes.

Text analytics: my fantasy football MVP

I love fantasy football. It has nothing to do with liking or understanding football (still baffled by what a tight end does after three years), but fantasy football is awesome. I can download data and write models for it (as I did in one league). Or I can just leave it to auto-draft (as I did in the other two).

Text Miner concept links

Models, however, don't predict spinal injuries well. As my opponent put it over this weekend, "I want to beat you fair and square, not because you've got half your team out with ACL injuries!" That team had Jamaal Charles, and in another I managed to pull Peyton Manning as my QB. Because of these injuries I've been struggling to find replacement players for all but the kicker position.

One of the things that's popping up this year is the use of text to enhance the models. I noticed in one of my leagues that there's an option to look up the most recent Twitter posts about players. In another I can read the blog posts on the individual games and players. My backup for Manning, for example, Kyle Orton, at least seemed to have some positive Twitter sentiment. This makes him seem worth keeping when my other option is Rex Grossman (full disclaimer: I am a Redskins fan, but I'm also a realist).

Text Miner results for NFL tight ends

What's the value of this text-based data? More information, faster, about injuries and other factors that aren't accounted for in the rushing and passing stats. While it's easy enough to find data on football players, it usually doesn't come with a note on whether they'll be benched for being in jail during the season opener (drafted that guy, too, but since getting out he's been OK).

For example, in one league I need a new tight end – Aaron Hernandez is out and Brent Celek is underperforming by any estimate, especially compared to when I had him two years ago. I used some SAS code to scan the tweets for "tight end" over a set of days, then extracted the list of player names through our entity extraction tool. From there I was able to look at which players were mentioned the most, then see if they were available for me to pick up or trade for in my league. I could have used SAS Sentiment Analysis to further see which comments were good or bad, but since I've only got a few TEs to look over I can do that myself in the interface.
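
The workflow above (filter tweets on a phrase, pull out player names, rank players by mentions) can be sketched in Python. This is not the SAS code from the post. A fixed list of player names stands in for the entity extraction tool, and the tweets are made-up sample text.

```python
from collections import Counter

def count_player_mentions(tweets, phrase, players):
    """Among tweets containing `phrase`, count how many mention each player."""
    counts = Counter()
    for tweet in tweets:
        text = tweet.lower()
        if phrase.lower() not in text:
            continue  # skip tweets that are not about the position of interest
        for name in players:
            if name.lower() in text:
                counts[name] += 1
    return counts

# Made-up sample tweets; a real run would cover several days of Twitter data.
tweets = [
    "Jason Witten is a safe tight end start this week",
    "Thinking about picking up Jeremy Shockey at tight end",
    "Witten again? Jason Witten at tight end is the obvious call",
    "What a game from the kicker today",
]
# Stand-in for the entity extraction step: a fixed list of candidate player names.
players = ["Jason Witten", "Jeremy Shockey", "Chris Cooley"]

counts = count_player_mentions(tweets, "tight end", players)
print(counts.most_common())
```

Substring matching counts each tweet at most once per player and misses nicknames or last-name-only mentions. A real entity extraction step is there to handle those cases.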

So who will help lead the Fightin' Midgets to victory this week? Probably not Chris Cooley (I love my Redskins, but no). I'll keep monitoring Twitter all week to see who should be picked up. Maybe I can get Jason Witten (probably not). Jeremy Shockey’s at least available, if not the best option.

The semantic web is here. Is your organization ready?

I’m in between summer holiday breaks, clearing the decks and getting caught up. Two items that have kept me thinking after the workday is done are the recent Kent State Online Open House presentation on “Semantic Technologies and Knowledge Management: What is it and why should we care,” and the August 2011 BeyeNETWORK article “Criticizing ‘Decision Theory’ – The Underlying Problem with Best Practices.”

We’ve learned to speak computer, now they need to learn to understand human. Graphic art from Kent State Online Open House.

In the Open House webinar, Denise Bedford brought together cybertrends, Web 3.0, knowledge-driven business processes, and computable knowledge (i.e., how we can make knowledge computable by translating human knowledge into a form that computers can understand). We’ve learned to speak computer; now computers need to learn to understand human.

Based on her observations, Denise assured the audience that the semantic web is already here, clarifying that it reflects not just the WWW but intranets as well. She taught me a new term: axiology, the study of value, which (according to the cited work of Mills Davis) is also the pinnacle of reasoning capacity and of metadata/content/knowledge representation.

Just around the same time, I read Frank Buytendijk’s BeyeNETWORK article, which aptly concluded that “best practices per definition cannot be ‘best’ because they need to be different.” Now, my background in decision theory is quite tangential to the work mentioned in this article, coming more from the process side. The point is that we can measure the unknown, or at least estimate it, from three sources of variation: heterogeneity (individual differences), nonstationarity (environmental influences), and state dependence (dependence on time), each of which has both observed and unobserved components. We can measure these factors and estimate the unobserved ones, with known limits to those estimates.

My premise is that we can model decisions, even with uncertainty. If the semantic web is already here, and it is gradually moving toward its natural, axiological realization, then how can we articulate the distinctive advantage of computable knowledge for any one organization? Not the best practice, but the innovation enabler that puts it at the top of its market.

Text analytics works well with others

As always, it’s the analytics and, in this case, the intelligence gleaned from the text. Deriving the meaning (semantics) and all the related associations from the analysis of the text is the first step. We begin by mimicking the mind’s interpretation and coding that for computers to process with little or no human intervention. Building on that, the distinctive value will lie in applying those semantic codifications and analyzing the interactions to inform, improve, and predict activity and behavior.

Web 3.0 has enabled people and machines to connect, evolve, share, and use knowledge. Looking even further ahead to Web 4.0, with its self-learning intelligence, the distinctive advantage will come from combining semantic technologies, like text analytics, with other analytical models that extend semantic interoperability. In other words, feedback loops will improve models by using semantic representations along with those from areas such as data mining, forecasting, optimization, simulation, and the like. Using these technologies together, organizations will create a higher-order learning that does not exist when any one of those methods is used in isolation.

Improving internal processes with text analytics

Products have advanced to the point where you don't need to understand programming to design and implement a semantic solution. To get started, identify which internal processes could be improved with the automation and organizational learning that text analytics provides. Examples of such processes range from performance appraisal feedback analysis to modeling terrorist threats. By starting with one or two processes, you’ll not only achieve some of the more immediate benefits but also start creating measurable value by freeing resources to work on higher-order (more strategic) projects. And do this now, instead of waiting to see what everyone else adopts as ‘best’, because there is no distinctive competence in that. Begin the transition to realizing the value that will put you ahead of the pack and prepare you for what is to come.

If you’ve not already started to explore the realities of semantic value, please let me know the challenges that hold you back. Understanding the problem is one of the most important criteria for success.

Which internal processes could you improve with text analysis?

This changes everything! Really?

Contributed by Tom Reamy, KAPS Group

As Chief Knowledge Architect and founder of KAPS Group, I lead a group of knowledge architecture, taxonomy, and text analytics consultants. This is my third year attending and second year as a speaker at the Semantic Technology Conference (#SemTech on Twitter).

Presentation themes revolve around evolution and maturity. The evolution theme is that no really big, revolutionary changes are being touted this year. I like that, as I’m a little tired of, and skeptical of, the “This Changes Everything!” mentality. The biggest buzz is about Schema.org (more about that later). Another sign of evolution is the size of the conference: more people are attending this year than last.

The maturity theme provided most of the excitement: instead of major new technologies or breakthroughs, the excitement was generated by real-life applications with real business value. Three keynote talks were good examples of this:

  • The first application was developed by Amdocs, a provider of telecom software and services, to add intelligence to customer support software.
  • The second was the BBC World Cup website, which used a variety of semantic and ontology elements to drive the site.
  • The third was a DoD presentation that was very slick – a video introduction with all sorts of special effects and a discussion of what they are up to – and they have more of everything than anyone else in the world.

The Amdocs example has a personal and a SAS connection. I did a proof-of-concept (POC) for Amdocs last year around choosing a text analytics solution to add to their intelligence platform. The outcome of that POC was that they chose SAS Enterprise Content Categorization and SAS Sentiment Analysis products.

At the conference, I gave a “Text Analytics Evaluation” talk which presented and argued for the approach that I used with the Amdocs project as well as other, earlier projects. So, the standing room only crowd (OK, it wasn’t a huge room) got to hear how to evaluate text analytics software in general and look into a case study in which SAS Text Analytics was clearly superior to any of the 30 other vendors, including IBM’s offerings.

It was heartening to see so many people attend a text analytics talk at a Semantic Web/Ontology focused conference. More people seem open to the idea that combining ontologies and text analytics gives you the best of both worlds. For a man with 20 years in information architecture, enterprise search, intranet management and text analytics consulting, that’s very good news.

Predicting Changes in Language - Collocations and Mimetics

The science of mimetics (or knowyourmeme.com) provides a framework for understanding and potentially predicting changes in the structures of our world including the structure of the language we use.

Collocations are words that frequently occur together, often in a particular fashion that is subject to the linguistic rules of the language of expression. Collocations often reflect the evolution of a language. For example, “high rise” becomes “high-rise”, which in turn becomes “highrise” or even “HighRise”. In English composition, the hyphen is often used in a transitional phase as two separate words progress mimetically to become one word.
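
One way to watch this transition in a corpus is to count how often each spelling of a collocation appears. The Python sketch below is an illustrative assumption, not a Text Miner feature. It treats “high rise”, “high-rise”, and “highrise” (in any capitalization) as forms of one collocation and tallies the spaced, hyphenated, and closed forms. Running it on texts grouped by year would show which form is gaining ground.

```python
import re
from collections import Counter

# "high rise", "high-rise", or "highrise", in any capitalization (so "HighRise" counts too).
VARIANT = re.compile(r"\bhigh(?:\s+|-)?rise\b", re.IGNORECASE)

def variant_counts(texts):
    """Tally the spaced, hyphenated, and closed forms of the collocation."""
    counts = Counter()
    for text in texts:
        for match in VARIANT.finditer(text):
            form = match.group(0)
            if "-" in form:
                counts["hyphenated"] += 1
            elif re.search(r"\s", form):
                counts["spaced"] += 1
            else:
                counts["closed"] += 1
    return counts

# Hypothetical sample sentences.
texts = [
    "The new high rise downtown is almost finished",
    "She rents a high-rise apartment",
    "HighRise condos and a highrise hotel",
]
print(variant_counts(texts))
```

Because of the trailing word boundary, plurals such as “highrises” are not counted. A fuller version would add an optional plural suffix to the pattern.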

There are often unwritten, and often strict, rules in the development of acceptable collocations in English. For example, we accept “High Rise” but would not accept “Tall Rise”. On the other hand, we would understand a “Tall Drink” but would read a “High Drink” in a completely different way.

SAS Text Miner has a collocation detection mechanism built into its operation. The identification of noun phrases and proper nouns in the text mining linguistic parser is optimized to identify acceptable collocations, as discussed above. The concept link diagram is much more freewheeling: here, collocations are identified simply by calculating the likelihood that the two terms will occur together. This likelihood calculation is a function of the number of documents in which the words occur together, relative to the number of documents in which each word occurs.

It is essential to calculate the joint probability of two words occurring together, since mimetically inspired word combinations occur only in specific conditions. Another way of saying this is: if the co-occurrence of two words is not sufficiently unusual, it is highly unlikely to be a salient term combination.

We used SAS Text Miner to look at the Vaccine Adverse Event Reporting System (VAERS) data. Here we observed that the terms “Tylenol” and “Fever” have a salient collocation probability. When you point to these terms in the concept link diagram, the fever term at the center shows 1620/1620, while the term Tylenol connected to fever shows 173/399. That is, Tylenol appears with fever in 173 of the 399 documents that contain Tylenol. This indicates that Tylenol is highly associated with fever, strongly enough for the two terms to be displayed together in the concept link diagram that illustrates collocations in the Text Miner interface.

The Text Miner software is being exceptionally clever here: neither "Tylenol fever" nor "Fever Tylenol" would likely appear as acceptable noun groups from a parsing point of view. Yet Tylenol and Fever could easily evolve mimetically just as “Xeroxing” and “photocopying” have evolved mimetically (due to the strong marketing of the Xerox brand).

The Text Miner documentation describes the calculation of the collocation probability used in the concept link diagram as follows: For a given pair of terms, say a and b, the strength of association between these terms is computed using the binomial distribution.

Let:

  • u be the number of documents containing term a
  • v be the number of documents in the collection
  • n be the number of documents containing term b
  • k be the number of documents containing both term a and term b

Then p = u/v is the probability that term a occurs in a document containing term b, assuming that the two terms are independent of each other.
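
The excerpt above stops after defining p. Under the stated independence assumption, the number of the n documents containing term b that also contain term a follows a binomial distribution with parameters n and p. The chance of seeing k or more co-occurrences is therefore an upper binomial tail. The Python sketch below computes that tail in log space and reports -ln of it as a strength score. The -ln scoring is an assumption for illustration and is not necessarily the exact formula in the Text Miner documentation.

```python
import math

def log_binom_pmf(r: int, n: int, p: float) -> float:
    """ln P(K = r) for K ~ Binomial(n, p), via lgamma so large n does not overflow."""
    return (math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)
            + r * math.log(p) + (n - r) * math.log1p(-p))

def log_upper_tail(k: int, n: int, p: float) -> float:
    """ln P(K >= k), summed with the log-sum-exp trick for numerical stability."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    logs = [log_binom_pmf(r, n, p) for r in range(k, n + 1)]
    peak = max(logs)
    return peak + math.log(sum(math.exp(x - peak) for x in logs))

def association_strength(u: int, v: int, n: int, k: int) -> float:
    """Assumed score: -ln P(K >= k) with p = u / v; larger means a more surprising pairing."""
    p = u / v  # chance that a document contains term a, if a and b are independent
    return -log_upper_tail(k, n, p)

# Tylenol example: a = fever (u = 1620), b = Tylenol (n = 399), both = 173.
# v is NOT given in the post; 30000 is a hypothetical collection size.
print(association_strength(u=1620, v=30000, n=399, k=173))
```

In the Tylenol example, term a is fever (u = 1620), term b is Tylenol (n = 399), and k = 173. The post does not give v, the size of the vaccine adverse event collection, so the value plugged in above is hypothetical.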

More information on collocation is given in Wikipedia.

A number of references, reproduced below, are cited on the Wikipedia page.

References
Dunning, T. (1993): “Accurate methods for the statistics of surprise and coincidence”. Computational Linguistics 19(1), 61–74.
Gledhill, C. (2000): Collocations in Science Writing. Tübingen: Narr.
Firth, J. R. (1957): Papers in Linguistics 1934–1951. Oxford: Oxford University Press.
Sinclair, J. (1996): “The Search for Units of Meaning”. Textus IX, 75–106.
Smadja, F. A. & McKeown, K. R. (1990): “Automatically extracting and representing collocations for language generation”. Proceedings of ACL’90, 252–259, Pittsburgh, Pennsylvania.
Hunston, S. & Francis, G. (2000): Pattern Grammar: A Corpus-Driven Approach to the Lexical Grammar of English. Amsterdam: John Benjamins.
Hausmann, F. J. (1989): “Le dictionnaire de collocations”. In Hausmann, F. J., Reichmann, O., Wiegand, H. E. & Zgusta, L. (eds), Wörterbücher: ein internationales Handbuch zur Lexikographie. Dictionaries. Dictionnaires. Berlin/New York: De Gruyter, 1010–1019.
Moon, R. (1998): Fixed Expressions and Idioms: A Corpus-Based Approach. Oxford: Oxford University Press.
Frath, P. & Gledhill, C. (2005): “Free-Range Clusters or Frozen Chunks? Reference as a Defining Criterion for Linguistic Units”. Recherches anglaises et nord-américaines 38, 25–43.

Who Cares About Sentiment?

~ Contributed by Tim Trussell ~

It was a great experience attending the Text Analytics Symposium last week in Boston and hearing presentations on the directions of text analytics. Two themes were clear: text analysis operating within predictive modeling efforts, and its integration into overall business analytics. This strongly supported the material I shared on what constitutes successful text analytics and, in my opinion, matches what SAS has been saying all along is critical to organizational text analytics success.

This brings me to my presentation, “Getting the Most out of your Marketing Campaign: Integrating Sentiment with Traditional Marketing”. While the delivery was brief, as I tried to get everyone to lunch on time, the crowd was very engaged when I put up a standard percent-positive sentiment graph and asked the audience, “Who cares?”

This was one of the central points of the presentation: sentiment on its own does not bring actionable business value. So why do it?

Instead of just measuring sentiment, I proposed assessing the following to get real benefit from sentiment analysis:

  • Subject of sentiment (customer service, price, product features).
  • Influence of the authors (measured by weighted number of followers or other metrics, as well as by value).
  • Trend of sentiment (change in subject sentiment relative to overall sentiment).
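
A minimal Python sketch of combining the three assessments above: sentiment is grouped by subject, weighted by author influence, and compared with overall sentiment across two periods. The record layout, the -1 to +1 score range, and the log-scaled follower weighting are all assumptions made for illustration. The post does not specify how influence or trend should be computed.

```python
import math

# Each record: (period, subject, author_followers, sentiment score from -1 to +1).
# All values are hypothetical, for illustration only.
records = [
    ("week1", "price", 150, -0.4),
    ("week1", "customer service", 12_000, 0.7),
    ("week1", "product features", 800, 0.5),
    ("week2", "price", 9_000, -0.9),
    ("week2", "customer service", 300, 0.6),
    ("week2", "product features", 1_500, 0.4),
]

def influence_weight(followers: int) -> float:
    # Assumption: 1 + log-scaled follower count, so huge accounts do not swamp the rest
    # and authors with no followers still count.
    return 1.0 + math.log1p(followers)

def weighted_sentiment(rows) -> float:
    """Influence-weighted average sentiment of the given records."""
    total = sum(influence_weight(r[2]) for r in rows)
    return sum(influence_weight(r[2]) * r[3] for r in rows) / total

def relative_sentiment(rows, subject: str, period: str) -> float:
    """Weighted sentiment on one subject minus weighted overall sentiment, for one period."""
    in_period = [r for r in rows if r[0] == period]
    on_subject = [r for r in in_period if r[1] == subject]
    return weighted_sentiment(on_subject) - weighted_sentiment(in_period)

# Trend: change in price sentiment relative to overall sentiment from week 1 to week 2.
trend = relative_sentiment(records, "price", "week2") - relative_sentiment(records, "price", "week1")
print(f"price trend vs. overall: {trend:+.3f}")
```

The weighting function is the design choice most likely to change in practice. A follower count could be replaced by any influence or customer-value metric without altering the rest of the sketch.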

The net result of understanding the projected impact of sentiment is that traditional marketing can start to use individual or aggregate measures to improve good old-fashioned campaign targeting, often a marketer’s bread and butter. Marketers can even extend that improved targeting with sentiment measures and interact with customers in the best way through an engine such as SAS Real Time Decision Manager.

Some remaining thoughts from conversations with attendees included: how can we best get individualized records for our database from forums that need to remain anonymous, and at what point does a company following individual sentiment move from good targeting to stalking?

While I continue to ponder these questions, I hope to see continued advances in adding true analytics to text analysis. Analytics 2011, October 24–25 in Orlando, provides a forum for more discussion. In the meantime, please comment here to share your experiences in reaping the rewards of text analytics.

Predicting American Idol results based on Twitter sentiment

Last week I participated in the Expanding Your Horizons conference through the Women's Initiative Network, where the goal was to get middle school girls excited about math and science. When I signed up, I said my session would explain forecasting using the pop culture phenomenon of American Idol.

I don't actually watch the show, and I don't know much about it other than that Simon Cowell is no longer a judge. I assumed I could find the weekly voting results online and use those numbers to forecast each remaining contestant's chances of staying on the show. I learned only after the fact that American Idol doesn't actually release these results -- so what was I supposed to use to explain statistics to kids?

While there may not be numeric data on American Idol, there is a plethora of text data on the show and about the singers available on Twitter. The only problem now was: How do I turn all these Tweets into something about math and science? Using text analytics, of course!

These middle school girls got a lesson on how computers can convert things they read every day into meaningful statistics using approaches similar to what they're already learning in math class. (So even their English courses can, theoretically, involve math.)

Generating the scores was easy enough with SAS Sentiment Analysis. I used a SAS macro to copy two weeks of Tweets about the final six contestants. From there, I took an existing Sentiment project (about an actual business) and copied over the list of generic positive and negative keywords. I ran the model and then added a few extra rules to account for the nuances of Twitter-speak (it turns out most Twitter-speak consists of shortenings of "love" and "sucks"). If I'd had a longer time frame, I would have broken the Tweets up by day to see whether any singers were improving over time, and accounted for that trend. If I knew anything about the genre of music each singer prefers, I would have added that as a variable to account for the popularity of certain kinds of music.
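
The keyword-plus-rules approach described above can be sketched in Python. This is not SAS Sentiment Analysis. The keyword lists, the Twitter-speak rules (mapping shortenings such as "luv" and "sux" onto "love" and "sucks"), and the percent-positive summary are all illustrative assumptions.

```python
import re

# Hypothetical generic keyword lists, standing in for lists reused from another project.
POSITIVE = {"love", "amazing", "great", "best"}
NEGATIVE = {"sucks", "boring", "worst", "hate"}

# Hypothetical Twitter-speak rules: map shortenings onto the base keywords.
NORMALIZE = {"luv": "love", "<3": "love", "sux": "sucks", "suxx": "sucks"}

def classify(tweet: str) -> int:
    """Return +1 (positive), -1 (negative), or 0 (neutral) from keyword counts."""
    tokens = re.findall(r"<3|[a-z']+", tweet.lower())
    tokens = [NORMALIZE.get(tok, tok) for tok in tokens]
    score = sum(tok in POSITIVE for tok in tokens) - sum(tok in NEGATIVE for tok in tokens)
    return (score > 0) - (score < 0)

def percent_positive(tweets_by_contestant):
    """Percent of each contestant's non-neutral tweets that are positive (None if no signal)."""
    result = {}
    for name, tweets in tweets_by_contestant.items():
        labels = [classify(t) for t in tweets]
        polar = [label for label in labels if label != 0]
        result[name] = 100 * sum(label > 0 for label in polar) / len(polar) if polar else None
    return result

# Hypothetical sample tweets for two made-up contestants.
sample = {
    "Contestant A": ["luv her voice", "that performance sux", "so amazing <3"],
    "Contestant B": ["boring again", "meh"],
}
print(percent_positive(sample))
```

Neutral tweets are excluded from the percentage rather than counted as negative. Splitting the tweets by day before scoring would give the time series needed to account for a trend.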

Sentiment scores for American Idol contestants

I suspect a similar problem comes up frequently for business users: there's a question you're interested in answering but limited structured data to answer it with. However, sources of textual data abound, both inside and outside the company.

Businesses can use textual analysis to help create some structured data even when it appears there isn't any available on the question at hand. Once that structure is found, then an analyst can use the standard tools for forecasting, prediction, and optimization to make informed decisions.

Oh, and for those who are only reading this to find out some watercooler predictions about pop culture: Scotty and James will make it to the end, and James appears more likely to win. Lauren will be the next to go (and no one seems all that interested in her anyway). And sorry Haley, but even though you seem to be getting better, you just aren't good enough to be the next American Idol.

Feeling good about sentiment

The Sentiment Analysis Symposium (#sas11) in NY this week was even better than last year.

Focusing on practical uses of sentiment analysis, as this field continues to mature, was the overriding theme. The presentations and attendee comments stimulated great conversations.

The show’s customer-experience focus was matched by the interest of the financial investment firms represented in the audience. Both groups need to measure the value of analyzing sentiment.

Attendees and speakers confirmed that sentiment is not always impactful to an organization, nor are its cousins: emotion, trust, feelings, and the like. How someone feels does not always affect what they do, but when it has an important impact, it is worth analyzing. Having done past work in buyer behavior, I see a number of parallels to the classic notions of utility theory in this, and I’m still working through the meaning of the ‘order of the good’ in today’s buying environments.

If I were to classify the overarching types of applications discussed, they (perhaps unsurprisingly) align with one point I shared with the audience, that sentiment analysis can be used for:

  • Validating
  • Learning
  • Changing


Validating data with sentiment analysis, often combined with other text analytics capabilities, can help assess the accuracy of information that has been structured, say by the code a call center representative selects to identify the severity of a problem. Going through this exercise can help ensure that decisions based on reported "high priority items" are in fact focused on the most important issues. Putting that into production can include scoring news feeds, web crawls, and scans of internal file systems with the sentiment models (applying any additional information, say through categorization), and updating search systems to improve retrieval relevancy.
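
As a rough sketch of the validation idea, the Python below compares each call's coded priority with a sentiment score and flags calls where the two disagree. The record layout, the -1 to +1 score range, the 0.5 threshold, and the rule that strongly negative notes imply high priority are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass
class CallRecord:
    call_id: str
    coded_priority: str  # code chosen by the representative: "high" or "low"
    sentiment: float     # score from a sentiment model, -1 (negative) to +1 (positive)

def flag_mismatches(records, threshold=0.5):
    """Return IDs whose coded priority disagrees with a strong sentiment signal.

    Assumption: strongly negative notes suggest a high-priority issue, and
    strongly positive notes suggest a low-priority one.
    """
    flagged = []
    for rec in records:
        if rec.coded_priority == "low" and rec.sentiment <= -threshold:
            flagged.append(rec.call_id)
        elif rec.coded_priority == "high" and rec.sentiment >= threshold:
            flagged.append(rec.call_id)
    return flagged

# Hypothetical call center records.
calls = [
    CallRecord("c1", "low", -0.8),   # angry customer coded as low priority
    CallRecord("c2", "high", -0.7),  # consistent
    CallRecord("c3", "high", 0.9),   # happy customer coded as high priority
    CallRecord("c4", "low", 0.1),    # consistent
]
flagged = flag_mismatches(calls)
print(flagged)
```

Flagged calls are candidates for review, not automatic corrections. The threshold controls how strong the sentiment must be before it overrides the representative's code.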

Learning from sentiment analysis: many of the symposium talks described how new learning is gleaned by applying sentiment analysis. But as @KDPaine pointed out, it is important to focus on the “so what” factor. The reports are informative, but how do you make this information actionable? By integrating with the technologies that you may already have (dashboards, office productivity tools, new elements added to data stores, mobile devices, etc.), you can share this new insight. In a production context, you can modify activities based on this learning by integrating it with customer conversation centers, campaign management systems, or other relevant workflow applications.

Changing behavior with sentiment analysis, from a proactive standpoint, is possible by including the results of sentiment analysis in analytic models. Only through the analytic application can you change behavior that has yet to occur, because you must first know what will most likely happen before you can modify what you do. When sentiment affects that behavior, it is a key ingredient in reaping significant value from sentiment analysis.

I confess the other reason for writing this is to finish saying what I didn’t in my symposium talk: yes, Stevie Wonder said it best, “We all have the ability. The difference is in how we use it.” (Hence the album reference in the bottom corner of the slide above.) The key to creating a competitive advantage with sentiment analysis lies in using it differently from your competitors, since they can scrape the same websites that you can for their sentiment monitoring.

Finally, another big Thank You to @SethGrimes for chairing the Sentiment Analysis Symposium. I hope to see you all at the Text Analytics Summit in May!

Till then, I’m interested in ways you measure the value of sentiment analysis in your organization. Write and tell me your most surprising finding.

From Influence to Persuasion

I have a big prediction to make about text analytics. Maybe it's not that big, but it's something I've been putting a lot of thought into.

The next big thing in text analytics is going to be Persuasion Analysis, which will use Sentiment Analysis in a totally different way.

Persuasion Analysis will move away from pure sentiment analysis and move to a more effective means of understanding how certain words and phrases can influence people, “to influence the influencers.”

Let’s take a look at how this might work.

  • Measured Persuasion is how many people were converted or persuaded to engage with or buy your product; this could be a net lift modeling measurement or some other technique.
  • Perceived Persuasion is what is perceived to be true, based on external measurement.
  • Lines represent the interaction between each of the elements.
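
The simplest quantity behind net lift measurement is the difference in response rate between a group that received a message and a comparable control group that did not. Full net lift models go further and predict that difference for individual customers. This Python sketch computes only the aggregate difference, using made-up counts and hypothetical function names.

```python
def response_rate(buyers: int, total: int) -> float:
    """Share of a group that bought or engaged."""
    return buyers / total

def net_lift(treated_buyers: int, treated_total: int,
             control_buyers: int, control_total: int) -> float:
    """Response rate of the group that saw the message minus that of a control group that did not."""
    return response_rate(treated_buyers, treated_total) - response_rate(control_buyers, control_total)

# Made-up counts: 120 of 2,000 people who saw the message bought, versus 80 of 2,000 who did not.
lift = net_lift(120, 2000, 80, 2000)
print(f"net lift: {lift:.1%}, about {lift * 2000:.0f} extra buyers per 2,000 people reached")
```

The control group is what separates persuasion from customers who would have bought anyway. Without it, the treated group's response rate alone overstates the message's effect.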

You want to know what is really cool? Virtually everything in the Persuasion Diagram is represented at Predictive Analytics World, Conversion Conference and Emetrics Optimization Summit.

If you are going to Predictive Analytics World or any of the other conferences, I hope to see you there.

Please share your comments. How will influence measurements mature over the next few years? Will persuasion be a better measurement than influence?

Text analytics is finding its way to the boardroom

All types of businesses and government agencies are taking advantage of information buried in previously untapped text documents (surveys, product reviews, online forums, emails, instant messages, articles, and social media).

Customers I work with are interested in protecting their brand equity, increasing customer satisfaction and loyalty, and reducing risks by carefully guarding their reputation. Top executives attempt to manage all kinds of risk within their organizations. Many of these risks are financial (credit risk, market risk, liquidity risk, insurance risk), but others are non-financial.

Reputational risk is one of the highest non-financial risks identified in the “Global Risk Management Survey – Sixth Edition” executed by Deloitte in 2009. That survey identified that 81 percent of the respondents were attempting to manage reputational risk.

A strong reputation can be a key competitive advantage. How many organizations really understand what customers, analysts, and key opinion leaders are saying about them and their products and services, so they can address issues? A good reputation can take decades to build, but it can be ruined very quickly. The actions taken directly after an incident are the key to whether those involved effectively protect their reputation. Creating an ongoing dialogue with your customers or stakeholders to listen and learn from their feedback, in good times and bad, will help you maintain your good reputation. This dialogue can enable you to communicate with customers and prospects at a significantly lower cost than traditional marketing -- and with increased speed and effectiveness. It can also enable a more rapid response to perceived customer issues and competitive threats.

Executives responsible for managing their organization's reputation must answer some tough questions.

  • What are the issues that people are discussing about my business compared to the top competitors in my industry?
  • Are my marketing messages resonating with my customers?
  • Am I spending my marketing dollars on the right channel to reach my target audience?
  • What actions should be taken to improve my products, services, or employees based on the feedback I am hearing?
  • What effect will the feedback have on my brand and my reputation in the marketplace?

Your stakeholders have spoken to all of these areas, but are you listening? The answers to these questions are too often buried in massive volumes of stored text. Text is a largely unused asset in many organizations, but it offers far-reaching impact on sales and marketing, customer service, and product development.

It's time for text analytics to give voice to ways for managing reputational risk. Unlike any other technology, text analytics can help organizations interpret, summarize, and report on information contained in virtual mounds of stored documents.

Are you ready to uncover sentiments (positive or negative opinions, attitudes, or feelings) that could affect your reputation and ultimately translate into financial impact?

Let's bring the voice of the market into the boardroom.
